
Benchmark Results - Overview

This page collects the results of the living BioChatter benchmark. For an explanation of how the benchmark works, see the benchmarking documentation; the developer docs provide further reading.

Scores per model

Table sorted by median score in descending order.

Model name Size Median Accuracy SD
gpt-3.5-turbo-0125 175 0.87 0.21
gpt-4-turbo-2024-04-09 Unknown 0.83 0.3
gpt-4-0613 Unknown 0.78 0.18
claude-3-opus-20240229 Unknown 0.77 0.28
gpt-3.5-turbo-0613 175 0.76 0.21
claude-3-5-sonnet-20240620 Unknown 0.76 0.28
llama-3.1-instruct 8 0.74 0.28
gpt-4-0125-preview Unknown 0.73 0.3
llama-3.1-instruct 70 0.73 0.29
gpt-4o-2024-05-13 Unknown 0.73 0.35
openhermes-2.5 7 0.7 0.32
gpt-4o-mini-2024-07-18 Unknown 0.7 0.27
llama-3-instruct 8 0.65 0.36
chatglm3 6 0.44 0.26
llama-2-chat 70 0.42 0.34
mistral-instruct-v0.2 7 0.4 0.33
code-llama-instruct 7 0.4 0.35
code-llama-instruct 13 0.38 0.33
code-llama-instruct 34 0.38 0.35
llama-2-chat 13 0.38 0.33
llama-2-chat 7 0.34 0.31
mixtral-instruct-v0.1 46.7 0.34 0.28
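
To reorder the table offline rather than in the browser, the rows can be sorted in a few lines of Python. The following is a minimal sketch that uses only the first five rows copied from the table above; the `ModelScore` type and its field names are illustrative assumptions and not part of BioChatter.

```python
from typing import NamedTuple


class ModelScore(NamedTuple):
    """One table row; field names are illustrative, not a BioChatter API."""
    name: str
    size: str  # kept as text because some sizes are "Unknown"
    median: float
    sd: float


# First five rows, copied from the "Scores per model" table.
rows = [
    ModelScore("gpt-3.5-turbo-0125", "175", 0.87, 0.21),
    ModelScore("gpt-4-turbo-2024-04-09", "Unknown", 0.83, 0.30),
    ModelScore("gpt-4-0613", "Unknown", 0.78, 0.18),
    ModelScore("claude-3-opus-20240229", "Unknown", 0.77, 0.28),
    ModelScore("gpt-3.5-turbo-0613", "175", 0.76, 0.21),
]

# Default view: median score, highest first.
by_median = sorted(rows, key=lambda r: r.median, reverse=True)

# Alternative view: lowest SD first. sorted() is stable, so rows with
# equal SD keep their original (median-based) relative order.
by_sd = sorted(rows, key=lambda r: r.sd)

for row in by_sd:
    print(f"{row.name:<26} median={row.median:.2f} sd={row.sd:.2f}")
```

Sorting by SD instead of median is one way to see which models score most consistently across tasks, independent of how high their median is.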

[Figures: Scatter Quantisation Name; Boxplot Model]

Scores per quantisation

Table sorted by median score in descending order.

Model name Size Version Quantisation Median Accuracy SD
gpt-3.5-turbo-0125 175 nan nan 0.87 0.21
gpt-4-turbo-2024-04-09 Unknown nan nan 0.83 0.3
gpt-4-0613 Unknown nan nan 0.78 0.18
claude-3-opus-20240229 Unknown nan nan 0.77 0.28
claude-3-5-sonnet-20240620 Unknown nan nan 0.76 0.28
gpt-3.5-turbo-0613 175 nan nan 0.76 0.21
llama-3.1-instruct 8 ggufv2 Q3_K_L 0.75 0.29
llama-3.1-instruct 8 ggufv2 Q8_0 0.75 0.28
llama-3.1-instruct 8 ggufv2 IQ4_XS 0.75 0.3
llama-3.1-instruct 8 ggufv2 Q6_K 0.74 0.28
llama-3.1-instruct 70 ggufv2 IQ2_M 0.74 0.29
llama-3.1-instruct 8 ggufv2 Q5_K_M 0.74 0.28
llama-3.1-instruct 8 ggufv2 Q4_K_M 0.74 0.28
gpt-4-0125-preview Unknown nan nan 0.73 0.3
gpt-4o-2024-05-13 Unknown nan nan 0.73 0.35
llama-3.1-instruct 70 ggufv2 IQ4_XS 0.73 0.29
openhermes-2.5 7 ggufv2 Q5_K_M 0.73 0.32
openhermes-2.5 7 ggufv2 Q8_0 0.71 0.32
openhermes-2.5 7 ggufv2 Q4_K_M 0.71 0.33
gpt-4o-mini-2024-07-18 Unknown nan nan 0.7 0.27
openhermes-2.5 7 ggufv2 Q6_K 0.7 0.33
llama-3.1-instruct 70 ggufv2 Q3_K_S 0.67 0.28
llama-3-instruct 8 ggufv2 Q8_0 0.65 0.35
llama-3-instruct 8 ggufv2 Q4_K_M 0.65 0.38
llama-3-instruct 8 ggufv2 Q6_K 0.65 0.36
llama-3-instruct 8 ggufv2 Q5_K_M 0.62 0.36
openhermes-2.5 7 ggufv2 Q3_K_M 0.59 0.32
openhermes-2.5 7 ggufv2 Q2_K 0.51 0.3
code-llama-instruct 34 ggufv2 Q2_K 0.5 0.33
code-llama-instruct 7 ggufv2 Q3_K_M 0.49 0.31
code-llama-instruct 7 ggufv2 Q4_K_M 0.47 0.39
mistral-instruct-v0.2 7 ggufv2 Q5_K_M 0.46 0.34
mistral-instruct-v0.2 7 ggufv2 Q6_K 0.45 0.34
code-llama-instruct 34 ggufv2 Q3_K_M 0.45 0.31
chatglm3 6 ggmlv3 q4_0 0.44 0.26
llama-2-chat 70 ggufv2 Q5_K_M 0.44 0.35
llama-2-chat 70 ggufv2 Q4_K_M 0.44 0.35
code-llama-instruct 13 ggufv2 Q6_K 0.44 0.35
code-llama-instruct 13 ggufv2 Q8_0 0.44 0.33
code-llama-instruct 13 ggufv2 Q5_K_M 0.43 0.32
llama-2-chat 70 ggufv2 Q3_K_M 0.41 0.33
mistral-instruct-v0.2 7 ggufv2 Q3_K_M 0.41 0.34
mistral-instruct-v0.2 7 ggufv2 Q8_0 0.4 0.33
llama-2-chat 13 ggufv2 Q8_0 0.4 0.34
code-llama-instruct 7 ggufv2 Q8_0 0.4 0.37
code-llama-instruct 7 ggufv2 Q5_K_M 0.39 0.34
llama-2-chat 13 ggufv2 Q3_K_M 0.39 0.33
llama-2-chat 13 ggufv2 Q5_K_M 0.39 0.33
code-llama-instruct 7 ggufv2 Q2_K 0.38 0.29
code-llama-instruct 34 ggufv2 Q4_K_M 0.38 0.35
code-llama-instruct 7 ggufv2 Q6_K 0.38 0.39
code-llama-instruct 34 ggufv2 Q5_K_M 0.38 0.38
llama-2-chat 70 ggufv2 Q2_K 0.38 0.35
llama-2-chat 13 ggufv2 Q6_K 0.37 0.34
code-llama-instruct 34 ggufv2 Q8_0 0.37 0.35
llama-2-chat 7 ggufv2 Q4_K_M 0.37 0.29
mistral-instruct-v0.2 7 ggufv2 Q2_K 0.37 0.29
mistral-instruct-v0.2 7 ggufv2 Q4_K_M 0.37 0.35
code-llama-instruct 34 ggufv2 Q6_K 0.37 0.36
llama-2-chat 13 ggufv2 Q4_K_M 0.36 0.34
llama-2-chat 7 ggufv2 Q3_K_M 0.36 0.34
mixtral-instruct-v0.1 46.7 ggufv2 Q4_K_M 0.35 0.3
llama-2-chat 7 ggufv2 Q8_0 0.35 0.29
mixtral-instruct-v0.1 46.7 ggufv2 Q5_K_M 0.34 0.31
mixtral-instruct-v0.1 46.7 ggufv2 Q6_K 0.34 0.29
mixtral-instruct-v0.1 46.7 ggufv2 Q3_K_M 0.33 0.28
code-llama-instruct 13 ggufv2 Q4_K_M 0.33 0.31
llama-2-chat 7 ggufv2 Q6_K 0.33 0.29
mixtral-instruct-v0.1 46.7 ggufv2 Q8_0 0.33 0.25
llama-2-chat 7 ggufv2 Q5_K_M 0.32 0.29
mixtral-instruct-v0.1 46.7 ggufv2 Q2_K 0.32 0.27
llama-2-chat 13 ggufv2 Q2_K 0.28 0.29
llama-2-chat 7 ggufv2 Q2_K 0.22 0.36
code-llama-instruct 13 ggufv2 Q2_K 0.17 0.34
code-llama-instruct 13 ggufv2 Q3_K_M 0.15 0.34
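
The four identifying columns in this table (model name, size, version, quantisation) appear to correspond to the colon-separated full model names used in the "Scores of all tasks" table, for example `llama-3.1-instruct:8:ggufv2:Q3_K_L`. The sketch below splits such a name into its parts. The `ModelId` class and `parse_full_model_name` function are hypothetical helpers written for illustration; they are not taken from the BioChatter code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelId:
    """Parts of a full model name (hypothetical helper, not a BioChatter type)."""
    name: str
    size: Optional[str] = None          # text, e.g. "8" or "46_7"
    version: Optional[str] = None       # e.g. "ggufv2"
    quantisation: Optional[str] = None  # e.g. "Q3_K_L"


def parse_full_model_name(full_name: str) -> ModelId:
    """Split 'name:size:version:quantisation' into its four parts.

    Names without any colon (such as 'gpt-4-0613') are treated as models
    for which size, version and quantisation are not listed.
    """
    parts = full_name.split(":")
    if len(parts) == 1:
        return ModelId(name=parts[0])
    if len(parts) == 4:
        return ModelId(*parts)
    raise ValueError(f"Unexpected model name format: {full_name!r}")


print(parse_full_model_name("llama-3.1-instruct:8:ggufv2:Q3_K_L"))
print(parse_full_model_name("gpt-4-0613"))
```

Sizes are kept as text on purpose, since identifiers such as `46_7` are not directly parseable as numbers.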

[Figure: Boxplot Quantisation]

Scores of all tasks

Table sorted by median score in descending order.

Full model name multimodal_answer property_selection sourcedata_info_extraction relationship_selection entity_selection api_calling query_generation medical_exam implicit_relevance_of_multiple_fragments naive_query_generation_using_schema end_to_end_query_generation explicit_relevance_of_single_fragments property_exists Mean Accuracy Median Accuracy SD
gpt-3.5-turbo-0125 nan 0.35625 0.510032 1 1 0.746479 0.966667 0.670401 0.9 0.486667 0.926667 1 0.866667 0.785819 0.866667 0.211361
gpt-4-turbo-2024-04-09 0.99 0.303125 0.650369 0 1 nan 0.826667 0.839506 1 0.5 0.6 1 1 0.725806 0.826667 0.301411
gpt-4-0613 nan 0.359375 0.668903 0.65 0.888889 0.619048 0.966667 0.730159 1 0.68 0.88 1 0.888889 0.777661 0.777661 0.177558
claude-3-opus-20240229 nan 0.421875 0.691235 0 1 nan 0.944444 0.805556 1 0.733333 0.655556 0.833333 1 0.73503 0.770293 0.276411
claude-3-5-sonnet-20240620 nan 0.375 0.756088 0 1 nan 0.966667 0.7737 1 0.633333 0.733333 1 0.866667 0.736799 0.764894 0.283502
gpt-3.5-turbo-0613 nan 0.3625 0.575381 0.5 0.888889 nan 0.946667 nan 1 0.5 0.833333 1 0.755556 0.736233 0.755556 0.211926
llama-3.1-instruct:8:ggufv2:Q3_K_L nan 0.421875 0.360379 0 1 nan 0.933333 0.768421 0.833333 0.566667 0.733333 1 0.833333 0.677334 0.750877 0.285133
llama-3.1-instruct:8:ggufv2:Q8_0 nan 0.515625 0.38907 0 1 nan 0.933333 0.765734 1 0.633333 0.733333 1 0.833333 0.709433 0.749534 0.284792
llama-3.1-instruct:8:ggufv2:IQ4_XS nan 0.359375 0.414621 0 1 nan 0.944444 0.756944 0.833333 0.633333 0.733333 1 1 0.697762 0.745139 0.295214
llama-3.1-instruct:8:ggufv2:Q6_K nan 0.46875 0.394469 0 1 nan 0.955556 0.751748 0.833333 0.633333 0.733333 1 0.833333 0.69126 0.742541 0.278015
llama-3.1-instruct:70:ggufv2:IQ2_M nan 0.328125 0.626498 0 1 nan 0.955556 0.772881 1 0.633333 0.6 1 0.916667 0.712096 0.742489 0.293676
llama-3.1-instruct:8:ggufv2:Q5_K_M nan 0.4375 0.380477 0 1 nan 0.933333 0.749117 0.833333 0.7 0.733333 1 0.833333 0.690948 0.741225 0.279278
llama-3.1-instruct:8:ggufv2:Q4_K_M nan 0.453125 0.382027 0 1 nan 0.933333 0.741935 0.833333 0.633333 0.733333 1 0.75 0.67822 0.737634 0.275701
gpt-4-0125-preview nan 0 0.689705 0.75 0.777778 0.793651 0.833333 0.77591 0.5 0.44 0 1 0.733333 0.607809 0.733333 0.295129
gpt-4o-2024-05-13 0.96 0 0.653946 0 1 0.809524 0.8 0.762838 0.7 0.533333 0 1 0.85 0.620742 0.731419 0.351091
llama-3.1-instruct:70:ggufv2:IQ4_XS nan 0.375 0.699238 0 1 nan 0.955556 0.822581 1 0.566667 0.6 1 0.75 0.706276 0.728138 0.285226
openhermes-2.5:7:ggufv2:Q5_K_M nan 0.125 0.579916 1 0.888889 nan 0.913333 0.571429 1 0.586667 0 1 0.777778 0.676637 0.727208 0.318593
openhermes-2.5:7:ggufv2:Q8_0 nan 0.125 0.600829 1 0.888889 nan 0.88 0.577031 1 0.466667 0 1 0.755556 0.663088 0.709322 0.319919
openhermes-2.5:7:ggufv2:Q4_K_M nan 0.046875 0.597281 1 0.888889 nan 0.873333 0.586368 1 0.466667 0 1 0.755556 0.655906 0.705731 0.330932
gpt-4o-mini-2024-07-18 0.98 0.365625 0.684553 0 1 0.714286 0.966667 0.840498 0.5 0.533333 0.66 0.833333 0.925 0.692561 0.703423 0.267084
openhermes-2.5:7:ggufv2:Q6_K nan 0.046875 0.619167 1 1 nan 0.86 0.57423 1 0.533333 0 1 0.733333 0.669722 0.701528 0.334697
llama-3.1-instruct:70:ggufv2:Q3_K_S nan 0.375 0.642336 0 1 nan 0.966667 0.8 1 0.633333 0.6 1 0.625 0.694758 0.668547 0.28438
llama-3-instruct:8:ggufv2:Q8_0 nan 0.28125 0.188555 0 0.875 nan 0.92 0.640669 1 0.666667 0 1 0.725 0.572467 0.653668 0.35454
llama-3-instruct:8:ggufv2:Q4_K_M nan 0.109375 0.116871 0 0.861111 nan 0.92 0.624884 1 0.666667 0 1 0.775 0.552173 0.645775 0.376754
llama-3-instruct:8:ggufv2:Q6_K nan 0.28125 0.162657 0 0.875 nan 0.926667 0.623955 1 0.666667 0 1 0.775 0.573745 0.645311 0.359165
llama-3-instruct:8:ggufv2:Q5_K_M nan 0.1875 0.166434 0 0.875 nan 0.926667 0.635097 1 0.6 0 1 0.65 0.549154 0.617549 0.360565
openhermes-2.5:7:ggufv2:Q3_K_M nan 0.125 0.554488 1 1 nan 0.94 0.563959 0.5 0.466667 0 1 0.72 0.624556 0.594257 0.318981
openhermes-2.5:7:ggufv2:Q2_K nan 0 0.444054 0.5 0.555556 nan 0.94 0.537815 0.5 0.433333 0 1 0.844444 0.5232 0.5116 0.298404
code-llama-instruct:34:ggufv2:Q2_K nan 0 nan 0.5 0 nan 0.686667 nan 1 0.566667 0 0.5 0.75 0.444815 0.5 0.328199
code-llama-instruct:7:ggufv2:Q3_K_M nan 0 nan 0.25 0.5 nan 0.873333 nan 0.7 0.426667 0 0.833333 0.8 0.487037 0.493519 0.307716
code-llama-instruct:7:ggufv2:Q4_K_M nan 0 0.138732 0 0.333333 nan 0.966667 nan 1 0.653333 0 1 0.6 0.469207 0.469207 0.38731
mistral-instruct-v0.2:7:ggufv2:Q5_K_M nan 0 0.385754 0 0.444444 nan 0.826667 0.364146 1 0.466667 0 1 0.688889 0.470597 0.455556 0.34385
mistral-instruct-v0.2:7:ggufv2:Q6_K nan 0.046875 0.367412 0 0.5 nan 0.833333 0.366947 1 0.433333 0 1 0.65 0.472536 0.452935 0.337974
code-llama-instruct:34:ggufv2:Q3_K_M nan 0 nan 0.25 0 nan 0.786667 nan 0.5 0.6 0 0.5 0.875 0.390185 0.445093 0.306514
chatglm3:6:ggmlv3:q4_0 nan 0.2875 0.188284 0.4 0.75 nan 0.553333 0.426704 1 0.48 0 0.733333 0.275 0.463105 0.444905 0.260423
llama-2-chat:70:ggufv2:Q5_K_M nan 0 0.210166 0.25 0.444444 nan 0.906667 nan 0.9 0.36 0 1 0.777778 0.484905 0.444444 0.346535
llama-2-chat:70:ggufv2:Q4_K_M nan 0 0.240936 0.25 0.444444 nan 0.92 nan 1 0.42 0 1 0.755556 0.503094 0.444444 0.354692
code-llama-instruct:13:ggufv2:Q6_K nan 0 nan 0 0 nan 0.793333 nan 0.5 0.54 0 0.833333 0.825 0.387963 0.443981 0.345581
code-llama-instruct:13:ggufv2:Q8_0 nan 0 nan 0 0 nan 0.766667 nan 0.5 0.566667 0 0.833333 0.75 0.37963 0.439815 0.334971
code-llama-instruct:13:ggufv2:Q5_K_M nan 0 nan 0 0 nan 0.78 nan 0.5 0.566667 0 0.666667 0.775 0.36537 0.432685 0.320506
llama-2-chat:70:ggufv2:Q3_K_M nan 0.171875 0.197898 0 0.333333 nan 0.906667 nan 0.5 0.413333 0 1 0.777778 0.430088 0.413333 0.327267
mistral-instruct-v0.2:7:ggufv2:Q3_K_M nan 0.046875 0.368974 0 0.333333 nan 0.773333 0.360411 1 0.466667 0 1 0.666667 0.456024 0.412499 0.335885
mistral-instruct-v0.2:7:ggufv2:Q8_0 nan 0.0375 0.351684 0 0.333333 nan 0.846667 0.366947 0.9 0.433333 0 1 0.644444 0.446719 0.40014 0.330107
llama-2-chat:13:ggufv2:Q8_0 nan 0 0.0762457 0 0 nan 0.786667 0.431373 0.5 0.48 0 1 0.711111 0.362309 0.396841 0.335904
code-llama-instruct:7:ggufv2:Q8_0 nan 0 nan 0 0 nan 0.96 nan 0.5 0.4 0 1 0.666667 0.391852 0.395926 0.37338
code-llama-instruct:7:ggufv2:Q5_K_M nan 0 nan 0 0.111111 nan 0.96 nan 0.5 0.4 0 0.833333 0.688889 0.388148 0.394074 0.340156
llama-2-chat:13:ggufv2:Q3_K_M nan 0 0.112631 0 0 nan 0.68 0.428571 0.5 0.48 0 1 0.733333 0.357685 0.393128 0.325419
llama-2-chat:13:ggufv2:Q5_K_M nan 0 0.0766167 0 0 nan 0.746667 0.431373 0.5 0.433333 0 1 0.644444 0.348403 0.389888 0.32518
code-llama-instruct:7:ggufv2:Q2_K nan 0.0625 nan 0.25 0.25 nan 0.92 nan 0.7 0.533333 0 0.333333 0.8 0.427685 0.380509 0.292686
code-llama-instruct:34:ggufv2:Q4_K_M nan 0 nan 0 0 nan 0.906667 nan 0.4 0.466667 0 0.5 0.975 0.360926 0.380463 0.350483
code-llama-instruct:7:ggufv2:Q6_K nan 0 nan 0 0 nan 0.96 nan 0.9 0.333333 0 0.833333 0.775 0.422407 0.37787 0.391629
code-llama-instruct:34:ggufv2:Q5_K_M nan 0 nan 0 0.125 nan 0.9 nan 1 0.466667 0 0.333333 0.95 0.419444 0.376389 0.384096
llama-2-chat:70:ggufv2:Q2_K nan 0 0.215047 0 0 nan 0.9 nan 0.5 0.473333 0 1 0.666667 0.375505 0.375505 0.352226
llama-2-chat:13:ggufv2:Q6_K nan 0 0.0781337 0 0 nan 0.813333 0.428571 0.5 0.386667 0 1 0.775 0.361973 0.37432 0.342819
code-llama-instruct:34:ggufv2:Q8_0 nan 0 nan 0 0.25 nan 0.86 nan 0.9 0.466667 0 0.333333 0.925 0.415 0.374167 0.353285
llama-2-chat:7:ggufv2:Q4_K_M nan 0 0.0852494 0 0.444444 nan 0.646667 0.40056 0.5 0.24 0 1 0.488889 0.345983 0.373271 0.290686
mistral-instruct-v0.2:7:ggufv2:Q2_K nan 0 0.331261 0 0.222222 nan 0.693333 0.352941 0.5 0.573333 0 1 0.6 0.388463 0.370702 0.294881
mistral-instruct-v0.2:7:ggufv2:Q4_K_M nan 0 0.347025 0 0.333333 nan 0.826667 0.365079 1 0.366667 0 1 0.688889 0.447969 0.365873 0.348328
code-llama-instruct:34:ggufv2:Q6_K nan 0 nan 0 0.125 nan 0.853333 nan 0.9 0.473333 0 0.333333 0.9 0.398333 0.365833 0.356636
llama-2-chat:13:ggufv2:Q4_K_M nan 0 0.0888675 0 0 nan 0.76 0.428571 0.5 0.366667 0 1 0.777778 0.356535 0.361601 0.336686
llama-2-chat:7:ggufv2:Q3_K_M nan 0.1 0.0650717 0 0.333333 nan 0.693333 0.394958 1 0.233333 0 1 0.466667 0.3897 0.361517 0.337204
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M nan 0.1625 0.193786 0 0.333333 nan 0.76 0.368814 1 0.426667 0 0.166667 0.755556 0.378848 0.351074 0.301567
llama-2-chat:7:ggufv2:Q8_0 nan 0 0.0847297 0 0.444444 nan 0.64 0.40056 0.5 0.266667 0 1 0.355556 0.335632 0.345594 0.286246
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M nan 0 0.235659 0.25 0.422222 nan 0.84 0.352941 1 0.333333 0 0 0.711111 0.376842 0.343137 0.313874
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K nan 0 0.225524 0.25 0.475 nan 0.826667 0.34267 0.7 0.333333 0 0 0.85 0.363927 0.338002 0.289705
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M nan 0.065625 0.229622 0.25 0.333333 nan 0.893333 nan 0.5 0.38 0 0 0.777778 0.342969 0.333333 0.278279
code-llama-instruct:13:ggufv2:Q4_K_M nan 0 nan 0 0 nan 0.833333 nan 0.5 0.533333 0 0.333333 0.775 0.330556 0.331944 0.30939
llama-2-chat:7:ggufv2:Q6_K nan 0 0.0614608 0 0.375 nan 0.66 0.406162 0.5 0.266667 0 1 0.333333 0.327511 0.330422 0.288285
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 nan 0 0.189177 0.25 0.311111 nan 0.846667 0.358543 0.6 0.386667 0 0.133333 0.666667 0.340197 0.325654 0.248216
llama-2-chat:7:ggufv2:Q5_K_M nan 0.0375 0.0697591 0 0.444444 nan 0.633333 0.40056 0.6 0.293333 0 1 0.288889 0.342529 0.317931 0.289372
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K nan 0 0.157514 0 0 nan 0.726667 0.329599 0.6 0.48 0 0.333333 0.733333 0.305495 0.317547 0.269925
llama-2-chat:13:ggufv2:Q2_K nan 0 0.0649389 0 0 nan 0.433333 0.414566 0.5 0.366667 0 1 0.288889 0.278945 0.283917 0.285171
llama-2-chat:7:ggufv2:Q2_K nan 0 0.0361865 0 0 nan 0.686667 0.369748 1 0.1 0 0.833333 0.688889 0.337711 0.218856 0.359055
code-llama-instruct:13:ggufv2:Q2_K nan 0 nan 0 0 nan 0.82 nan 0.4 0.566667 0 0.0333333 0.875 0.299444 0.166389 0.336056
code-llama-instruct:13:ggufv2:Q3_K_M nan 0 nan 0 0.45 nan 0.833333 nan 0 0.533333 0 0 0.85 0.296296 0.148148 0.336707
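
The Mean Accuracy column appears consistent with averaging a model's task scores while skipping `nan` entries, which presumably mark tasks without a result for that model. The sketch below checks this reading against the `gpt-3.5-turbo-0125` row using only the Python standard library. It is an illustration under that assumption, not the benchmark's own aggregation code, and it does not attempt to reproduce the Median Accuracy or SD columns, whose exact calculation is not described on this page.

```python
import math
import statistics

# The 13 task scores for gpt-3.5-turbo-0125, copied from its table row
# in column order (multimodal_answer ... property_exists).
raw_scores = (
    "nan 0.35625 0.510032 1 1 0.746479 0.966667 0.670401 "
    "0.9 0.486667 0.926667 1 0.866667"
)

scores = [float(token) for token in raw_scores.split()]

# Drop tasks without a result; NaN would otherwise propagate into the mean.
valid_scores = [s for s in scores if not math.isnan(s)]

mean_accuracy = statistics.fmean(valid_scores)

print(f"tasks with a score: {len(valid_scores)} of {len(scores)}")
print(f"mean accuracy: {mean_accuracy:.6f}")  # 0.785819, as in the table
```

Filtering out `nan` before averaging matters because a single missing task would otherwise turn the whole aggregate into `nan`.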