Benchmark Results - Overview

This page collects the results of the living BioChatter benchmark. For an explanation of the benchmark, see the benchmarking documentation; the developer docs provide further reading.

Scores per model

Table sorted by median score in descending order.

| Model name | Size (billions of parameters) | Median Accuracy | SD |
| --- | --- | --- | --- |
| gpt-3.5-turbo-0125 | 175 | 0.87 | 0.21 |
| gpt-4-turbo-2024-04-09 | Unknown | 0.83 | 0.3 |
| gpt-4-0613 | Unknown | 0.78 | 0.18 |
| claude-3-opus-20240229 | Unknown | 0.77 | 0.28 |
| gpt-3.5-turbo-0613 | 175 | 0.76 | 0.21 |
| claude-3-5-sonnet-20240620 | Unknown | 0.76 | 0.28 |
| llama-3.1-instruct | 8 | 0.74 | 0.28 |
| gpt-4-0125-preview | Unknown | 0.73 | 0.3 |
| llama-3.1-instruct | 70 | 0.73 | 0.29 |
| gpt-4o-2024-05-13 | Unknown | 0.73 | 0.35 |
| openhermes-2.5 | 7 | 0.7 | 0.32 |
| gpt-4o-mini-2024-07-18 | Unknown | 0.7 | 0.27 |
| llama-3-instruct | 8 | 0.65 | 0.36 |
| chatglm3 | 6 | 0.44 | 0.26 |
| llama-2-chat | 70 | 0.42 | 0.34 |
| mistral-instruct-v0.2 | 7 | 0.4 | 0.33 |
| code-llama-instruct | 7 | 0.4 | 0.35 |
| code-llama-instruct | 13 | 0.38 | 0.33 |
| code-llama-instruct | 34 | 0.38 | 0.35 |
| llama-2-chat | 13 | 0.38 | 0.33 |
| llama-2-chat | 7 | 0.34 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | 0.34 | 0.28 |

[Figures: Scatter Quantisation Name; Boxplot Model]

Scores per quantisation

Table sorted by median score in descending order.

| Model name | Size (billions of parameters) | Version | Quantisation | Median Accuracy | SD |
| --- | --- | --- | --- | --- | --- |
| gpt-3.5-turbo-0125 | 175 | nan | nan | 0.87 | 0.21 |
| gpt-4-turbo-2024-04-09 | Unknown | nan | nan | 0.83 | 0.3 |
| gpt-4-0613 | Unknown | nan | nan | 0.78 | 0.18 |
| claude-3-opus-20240229 | Unknown | nan | nan | 0.77 | 0.28 |
| claude-3-5-sonnet-20240620 | Unknown | nan | nan | 0.76 | 0.28 |
| gpt-3.5-turbo-0613 | 175 | nan | nan | 0.76 | 0.21 |
| llama-3.1-instruct | 8 | ggufv2 | Q3_K_L | 0.75 | 0.29 |
| llama-3.1-instruct | 8 | ggufv2 | Q8_0 | 0.75 | 0.28 |
| llama-3.1-instruct | 8 | ggufv2 | IQ4_XS | 0.75 | 0.3 |
| llama-3.1-instruct | 8 | ggufv2 | Q6_K | 0.74 | 0.28 |
| llama-3.1-instruct | 70 | ggufv2 | IQ2_M | 0.74 | 0.29 |
| llama-3.1-instruct | 8 | ggufv2 | Q5_K_M | 0.74 | 0.28 |
| llama-3.1-instruct | 8 | ggufv2 | Q4_K_M | 0.74 | 0.28 |
| gpt-4-0125-preview | Unknown | nan | nan | 0.73 | 0.3 |
| gpt-4o-2024-05-13 | Unknown | nan | nan | 0.73 | 0.35 |
| llama-3.1-instruct | 70 | ggufv2 | IQ4_XS | 0.73 | 0.29 |
| openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.73 | 0.32 |
| openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.71 | 0.32 |
| openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.71 | 0.33 |
| gpt-4o-mini-2024-07-18 | Unknown | nan | nan | 0.7 | 0.27 |
| openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.7 | 0.33 |
| llama-3.1-instruct | 70 | ggufv2 | Q3_K_S | 0.67 | 0.28 |
| llama-3-instruct | 8 | ggufv2 | Q8_0 | 0.65 | 0.35 |
| llama-3-instruct | 8 | ggufv2 | Q4_K_M | 0.65 | 0.38 |
| llama-3-instruct | 8 | ggufv2 | Q6_K | 0.65 | 0.36 |
| llama-3-instruct | 8 | ggufv2 | Q5_K_M | 0.62 | 0.36 |
| openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.59 | 0.32 |
| openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.51 | 0.3 |
| code-llama-instruct | 34 | ggufv2 | Q2_K | 0.5 | 0.33 |
| code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
| code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.47 | 0.39 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.46 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.45 | 0.34 |
| code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
| chatglm3 | 6 | ggmlv3 | q4_0 | 0.44 | 0.26 |
| llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.44 | 0.35 |
| llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
| code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
| llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.41 | 0.33 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.41 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.4 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q8_0 | 0.4 | 0.34 |
| code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.4 | 0.37 |
| code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
| llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.39 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.39 | 0.33 |
| code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
| code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
| code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
| code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
| llama-2-chat | 70 | ggufv2 | Q2_K | 0.38 | 0.35 |
| llama-2-chat | 13 | ggufv2 | Q6_K | 0.37 | 0.34 |
| code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
| llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.37 | 0.29 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.37 | 0.29 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.37 | 0.35 |
| code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
| llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.36 | 0.34 |
| llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.36 | 0.34 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.35 | 0.3 |
| llama-2-chat | 7 | ggufv2 | Q8_0 | 0.35 | 0.29 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.34 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.34 | 0.29 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.33 | 0.28 |
| code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
| llama-2-chat | 7 | ggufv2 | Q6_K | 0.33 | 0.29 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.33 | 0.25 |
| llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.32 | 0.29 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.32 | 0.27 |
| llama-2-chat | 13 | ggufv2 | Q2_K | 0.28 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q2_K | 0.22 | 0.36 |
| code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
| code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |
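
The Size, Version and Quantisation columns correspond to the colon-separated fields of the full model names used in the "Scores of all tasks" table on this page (for example `llama-3.1-instruct:8:ggufv2:Q3_K_L`). The following is a minimal sketch of how such an identifier could be split back into those fields. The `ModelId` class and the `parse_full_model_name` function are hypothetical helpers written for this illustration; they are not part of BioChatter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelId:
    """Hypothetical container for the fields of a full model name."""
    name: str
    size: Optional[float]         # billions of parameters, None if not encoded
    version: Optional[str]        # e.g. "ggufv2"
    quantisation: Optional[str]   # e.g. "Q4_K_M"


def parse_full_model_name(full_name: str) -> ModelId:
    """Split 'name:size:version:quantisation' into its parts.

    Hosted models such as 'gpt-4-0613' carry no colon-separated fields,
    so everything except the name is left as None.
    """
    parts = full_name.split(":")
    if len(parts) == 1:
        return ModelId(parts[0], None, None, None)
    if len(parts) != 4:
        raise ValueError(f"unexpected model identifier: {full_name!r}")
    name, size, version, quantisation = parts
    # '46_7' in the identifier stands for 46.7 in the Size column.
    # The underscore must be replaced before conversion, because
    # float("46_7") would silently parse as 467.0.
    return ModelId(name, float(size.replace("_", ".")), version, quantisation)


print(parse_full_model_name("mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M"))
print(parse_full_model_name("gpt-4-0613"))
```

Returning `None` for the missing fields of hosted models is a design choice of this sketch; the tables on this page show those fields as `nan` or `Unknown` instead.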

[Figure: Boxplot Quantisation]

Scores of all tasks

Wide table; you may need to scroll horizontally to see all columns. Table sorted by median score in descending order.

| Full model name | property_selection | property_exists | sourcedata_info_extraction | explicit_relevance_of_single_fragments | medical_exam | naive_query_generation_using_schema | query_generation | relationship_selection | api_calling | implicit_relevance_of_multiple_fragments | multimodal_answer | entity_selection | end_to_end_query_generation | Mean Accuracy | Median Accuracy | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-3.5-turbo-0125 | 0.35625 | 0.866667 | 0.510032 | 1 | 0.670401 | 0.486667 | 0.966667 | 1 | 0.746479 | 0.9 | nan | 1 | 0.926667 | 0.785819 | 0.866667 | 0.211361 |
| gpt-4-turbo-2024-04-09 | 0.303125 | 1 | 0.650369 | 1 | 0.839506 | 0.5 | 0.826667 | 0 | nan | 1 | 0.99 | 1 | 0.6 | 0.725806 | 0.826667 | 0.301411 |
| gpt-4-0613 | 0.359375 | 0.888889 | 0.668903 | 1 | 0.730159 | 0.68 | 0.966667 | 0.65 | 0.619048 | 1 | nan | 0.888889 | 0.88 | 0.777661 | 0.777661 | 0.177558 |
| claude-3-opus-20240229 | 0.421875 | 1 | 0.691235 | 0.833333 | 0.805556 | 0.733333 | 0.944444 | 0 | nan | 1 | nan | 1 | 0.655556 | 0.73503 | 0.770293 | 0.276411 |
| claude-3-5-sonnet-20240620 | 0.375 | 0.866667 | 0.756088 | 1 | 0.7737 | 0.633333 | 0.966667 | 0 | nan | 1 | nan | 1 | 0.733333 | 0.736799 | 0.764894 | 0.283502 |
| gpt-3.5-turbo-0613 | 0.3625 | 0.755556 | 0.575381 | 1 | nan | 0.5 | 0.946667 | 0.5 | nan | 1 | nan | 0.888889 | 0.833333 | 0.736233 | 0.755556 | 0.211926 |
| llama-3.1-instruct:8:ggufv2:Q3_K_L | 0.421875 | 0.833333 | 0.360379 | 1 | 0.768421 | 0.566667 | 0.933333 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.677334 | 0.750877 | 0.285133 |
| llama-3.1-instruct:8:ggufv2:Q8_0 | 0.515625 | 0.833333 | 0.38907 | 1 | 0.765734 | 0.633333 | 0.933333 | 0 | nan | 1 | nan | 1 | 0.733333 | 0.709433 | 0.749534 | 0.284792 |
| llama-3.1-instruct:8:ggufv2:IQ4_XS | 0.359375 | 1 | 0.414621 | 1 | 0.756944 | 0.633333 | 0.944444 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.697762 | 0.745139 | 0.295214 |
| llama-3.1-instruct:8:ggufv2:Q6_K | 0.46875 | 0.833333 | 0.394469 | 1 | 0.751748 | 0.633333 | 0.955556 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.69126 | 0.742541 | 0.278015 |
| llama-3.1-instruct:70:ggufv2:IQ2_M | 0.328125 | 0.916667 | 0.626498 | 1 | 0.772881 | 0.633333 | 0.955556 | 0 | nan | 1 | nan | 1 | 0.6 | 0.712096 | 0.742489 | 0.293676 |
| llama-3.1-instruct:8:ggufv2:Q5_K_M | 0.4375 | 0.833333 | 0.380477 | 1 | 0.749117 | 0.7 | 0.933333 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.690948 | 0.741225 | 0.279278 |
| llama-3.1-instruct:8:ggufv2:Q4_K_M | 0.453125 | 0.75 | 0.382027 | 1 | 0.741935 | 0.633333 | 0.933333 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.67822 | 0.737634 | 0.275701 |
| gpt-4-0125-preview | 0 | 0.733333 | 0.689705 | 1 | 0.77591 | 0.44 | 0.833333 | 0.75 | 0.793651 | 0.5 | nan | 0.777778 | 0 | 0.607809 | 0.733333 | 0.295129 |
| gpt-4o-2024-05-13 | 0 | 0.85 | 0.653946 | 1 | 0.762838 | 0.533333 | 0.8 | 0 | 0.809524 | 0.7 | 0.96 | 1 | 0 | 0.620742 | 0.731419 | 0.351091 |
| llama-3.1-instruct:70:ggufv2:IQ4_XS | 0.375 | 0.75 | 0.699238 | 1 | 0.822581 | 0.566667 | 0.955556 | 0 | nan | 1 | nan | 1 | 0.6 | 0.706276 | 0.728138 | 0.285226 |
| openhermes-2.5:7:ggufv2:Q5_K_M | 0.125 | 0.777778 | 0.579916 | 1 | 0.571429 | 0.586667 | 0.913333 | 1 | nan | 1 | nan | 0.888889 | 0 | 0.676637 | 0.727208 | 0.318593 |
| openhermes-2.5:7:ggufv2:Q8_0 | 0.125 | 0.755556 | 0.600829 | 1 | 0.577031 | 0.466667 | 0.88 | 1 | nan | 1 | nan | 0.888889 | 0 | 0.663088 | 0.709322 | 0.319919 |
| openhermes-2.5:7:ggufv2:Q4_K_M | 0.046875 | 0.755556 | 0.597281 | 1 | 0.586368 | 0.466667 | 0.873333 | 1 | nan | 1 | nan | 0.888889 | 0 | 0.655906 | 0.705731 | 0.330932 |
| gpt-4o-mini-2024-07-18 | 0.365625 | 0.925 | 0.684553 | 0.833333 | 0.840498 | 0.533333 | 0.966667 | 0 | 0.714286 | 0.5 | 0.98 | 1 | 0.66 | 0.692561 | 0.703423 | 0.267084 |
| openhermes-2.5:7:ggufv2:Q6_K | 0.046875 | 0.733333 | 0.619167 | 1 | 0.57423 | 0.533333 | 0.86 | 1 | nan | 1 | nan | 1 | 0 | 0.669722 | 0.701528 | 0.334697 |
| llama-3.1-instruct:70:ggufv2:Q3_K_S | 0.375 | 0.625 | 0.642336 | 1 | 0.8 | 0.633333 | 0.966667 | 0 | nan | 1 | nan | 1 | 0.6 | 0.694758 | 0.668547 | 0.28438 |
| llama-3-instruct:8:ggufv2:Q8_0 | 0.28125 | 0.725 | 0.188555 | 1 | 0.640669 | 0.666667 | 0.92 | 0 | nan | 1 | nan | 0.875 | 0 | 0.572467 | 0.653668 | 0.35454 |
| llama-3-instruct:8:ggufv2:Q4_K_M | 0.109375 | 0.775 | 0.116871 | 1 | 0.624884 | 0.666667 | 0.92 | 0 | nan | 1 | nan | 0.861111 | 0 | 0.552173 | 0.645775 | 0.376754 |
| llama-3-instruct:8:ggufv2:Q6_K | 0.28125 | 0.775 | 0.162657 | 1 | 0.623955 | 0.666667 | 0.926667 | 0 | nan | 1 | nan | 0.875 | 0 | 0.573745 | 0.645311 | 0.359165 |
| llama-3-instruct:8:ggufv2:Q5_K_M | 0.1875 | 0.65 | 0.166434 | 1 | 0.635097 | 0.6 | 0.926667 | 0 | nan | 1 | nan | 0.875 | 0 | 0.549154 | 0.617549 | 0.360565 |
| openhermes-2.5:7:ggufv2:Q3_K_M | 0.125 | 0.72 | 0.554488 | 1 | 0.563959 | 0.466667 | 0.94 | 1 | nan | 0.5 | nan | 1 | 0 | 0.624556 | 0.594257 | 0.318981 |
| openhermes-2.5:7:ggufv2:Q2_K | 0 | 0.844444 | 0.444054 | 1 | 0.537815 | 0.433333 | 0.94 | 0.5 | nan | 0.5 | nan | 0.555556 | 0 | 0.5232 | 0.5116 | 0.298404 |
| code-llama-instruct:34:ggufv2:Q2_K | 0 | 0.75 | nan | 0.5 | nan | 0.566667 | 0.686667 | 0.5 | nan | 1 | nan | 0 | 0 | 0.444815 | 0.5 | 0.328199 |
| code-llama-instruct:7:ggufv2:Q3_K_M | 0 | 0.8 | nan | 0.833333 | nan | 0.426667 | 0.873333 | 0.25 | nan | 0.7 | nan | 0.5 | 0 | 0.487037 | 0.493519 | 0.307716 |
| code-llama-instruct:7:ggufv2:Q4_K_M | 0 | 0.6 | 0.138732 | 1 | nan | 0.653333 | 0.966667 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.469207 | 0.469207 | 0.38731 |
| mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0 | 0.688889 | 0.385754 | 1 | 0.364146 | 0.466667 | 0.826667 | 0 | nan | 1 | nan | 0.444444 | 0 | 0.470597 | 0.455556 | 0.34385 |
| mistral-instruct-v0.2:7:ggufv2:Q6_K | 0.046875 | 0.65 | 0.367412 | 1 | 0.366947 | 0.433333 | 0.833333 | 0 | nan | 1 | nan | 0.5 | 0 | 0.472536 | 0.452935 | 0.337974 |
| code-llama-instruct:34:ggufv2:Q3_K_M | 0 | 0.875 | nan | 0.5 | nan | 0.6 | 0.786667 | 0.25 | nan | 0.5 | nan | 0 | 0 | 0.390185 | 0.445093 | 0.306514 |
| chatglm3:6:ggmlv3:q4_0 | 0.2875 | 0.275 | 0.188284 | 0.733333 | 0.426704 | 0.48 | 0.553333 | 0.4 | nan | 1 | nan | 0.75 | 0 | 0.463105 | 0.444905 | 0.260423 |
| llama-2-chat:70:ggufv2:Q4_K_M | 0 | 0.755556 | 0.240936 | 1 | nan | 0.42 | 0.92 | 0.25 | nan | 1 | nan | 0.444444 | 0 | 0.503094 | 0.444444 | 0.354692 |
| llama-2-chat:70:ggufv2:Q5_K_M | 0 | 0.777778 | 0.210166 | 1 | nan | 0.36 | 0.906667 | 0.25 | nan | 0.9 | nan | 0.444444 | 0 | 0.484905 | 0.444444 | 0.346535 |
| code-llama-instruct:13:ggufv2:Q6_K | 0 | 0.825 | nan | 0.833333 | nan | 0.54 | 0.793333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.387963 | 0.443981 | 0.345581 |
| code-llama-instruct:13:ggufv2:Q8_0 | 0 | 0.75 | nan | 0.833333 | nan | 0.566667 | 0.766667 | 0 | nan | 0.5 | nan | 0 | 0 | 0.37963 | 0.439815 | 0.334971 |
| code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 0.775 | nan | 0.666667 | nan | 0.566667 | 0.78 | 0 | nan | 0.5 | nan | 0 | 0 | 0.36537 | 0.432685 | 0.320506 |
| llama-2-chat:70:ggufv2:Q3_K_M | 0.171875 | 0.777778 | 0.197898 | 1 | nan | 0.413333 | 0.906667 | 0 | nan | 0.5 | nan | 0.333333 | 0 | 0.430088 | 0.413333 | 0.327267 |
| mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0.046875 | 0.666667 | 0.368974 | 1 | 0.360411 | 0.466667 | 0.773333 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.456024 | 0.412499 | 0.335885 |
| mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0.0375 | 0.644444 | 0.351684 | 1 | 0.366947 | 0.433333 | 0.846667 | 0 | nan | 0.9 | nan | 0.333333 | 0 | 0.446719 | 0.40014 | 0.330107 |
| llama-2-chat:13:ggufv2:Q8_0 | 0 | 0.711111 | 0.0762457 | 1 | 0.431373 | 0.48 | 0.786667 | 0 | nan | 0.5 | nan | 0 | 0 | 0.362309 | 0.396841 | 0.335904 |
| code-llama-instruct:7:ggufv2:Q8_0 | 0 | 0.666667 | nan | 1 | nan | 0.4 | 0.96 | 0 | nan | 0.5 | nan | 0 | 0 | 0.391852 | 0.395926 | 0.37338 |
| code-llama-instruct:7:ggufv2:Q5_K_M | 0 | 0.688889 | nan | 0.833333 | nan | 0.4 | 0.96 | 0 | nan | 0.5 | nan | 0.111111 | 0 | 0.388148 | 0.394074 | 0.340156 |
| llama-2-chat:13:ggufv2:Q3_K_M | 0 | 0.733333 | 0.112631 | 1 | 0.428571 | 0.48 | 0.68 | 0 | nan | 0.5 | nan | 0 | 0 | 0.357685 | 0.393128 | 0.325419 |
| llama-2-chat:13:ggufv2:Q5_K_M | 0 | 0.644444 | 0.0766167 | 1 | 0.431373 | 0.433333 | 0.746667 | 0 | nan | 0.5 | nan | 0 | 0 | 0.348403 | 0.389888 | 0.32518 |
| code-llama-instruct:7:ggufv2:Q2_K | 0.0625 | 0.8 | nan | 0.333333 | nan | 0.533333 | 0.92 | 0.25 | nan | 0.7 | nan | 0.25 | 0 | 0.427685 | 0.380509 | 0.292686 |
| code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 0.975 | nan | 0.5 | nan | 0.466667 | 0.906667 | 0 | nan | 0.4 | nan | 0 | 0 | 0.360926 | 0.380463 | 0.350483 |
| code-llama-instruct:7:ggufv2:Q6_K | 0 | 0.775 | nan | 0.833333 | nan | 0.333333 | 0.96 | 0 | nan | 0.9 | nan | 0 | 0 | 0.422407 | 0.37787 | 0.391629 |
| code-llama-instruct:34:ggufv2:Q5_K_M | 0 | 0.95 | nan | 0.333333 | nan | 0.466667 | 0.9 | 0 | nan | 1 | nan | 0.125 | 0 | 0.419444 | 0.376389 | 0.384096 |
| llama-2-chat:70:ggufv2:Q2_K | 0 | 0.666667 | 0.215047 | 1 | nan | 0.473333 | 0.9 | 0 | nan | 0.5 | nan | 0 | 0 | 0.375505 | 0.375505 | 0.352226 |
| llama-2-chat:13:ggufv2:Q6_K | 0 | 0.775 | 0.0781337 | 1 | 0.428571 | 0.386667 | 0.813333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.361973 | 0.37432 | 0.342819 |
| code-llama-instruct:34:ggufv2:Q8_0 | 0 | 0.925 | nan | 0.333333 | nan | 0.466667 | 0.86 | 0 | nan | 0.9 | nan | 0.25 | 0 | 0.415 | 0.374167 | 0.353285 |
| llama-2-chat:7:ggufv2:Q4_K_M | 0 | 0.488889 | 0.0852494 | 1 | 0.40056 | 0.24 | 0.646667 | 0 | nan | 0.5 | nan | 0.444444 | 0 | 0.345983 | 0.373271 | 0.290686 |
| mistral-instruct-v0.2:7:ggufv2:Q2_K | 0 | 0.6 | 0.331261 | 1 | 0.352941 | 0.573333 | 0.693333 | 0 | nan | 0.5 | nan | 0.222222 | 0 | 0.388463 | 0.370702 | 0.294881 |
| mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0 | 0.688889 | 0.347025 | 1 | 0.365079 | 0.366667 | 0.826667 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.447969 | 0.365873 | 0.348328 |
| code-llama-instruct:34:ggufv2:Q6_K | 0 | 0.9 | nan | 0.333333 | nan | 0.473333 | 0.853333 | 0 | nan | 0.9 | nan | 0.125 | 0 | 0.398333 | 0.365833 | 0.356636 |
| llama-2-chat:13:ggufv2:Q4_K_M | 0 | 0.777778 | 0.0888675 | 1 | 0.428571 | 0.366667 | 0.76 | 0 | nan | 0.5 | nan | 0 | 0 | 0.356535 | 0.361601 | 0.336686 |
| llama-2-chat:7:ggufv2:Q3_K_M | 0.1 | 0.466667 | 0.0650717 | 1 | 0.394958 | 0.233333 | 0.693333 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.3897 | 0.361517 | 0.337204 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0.1625 | 0.755556 | 0.193786 | 0.166667 | 0.368814 | 0.426667 | 0.76 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.378848 | 0.351074 | 0.301567 |
| llama-2-chat:7:ggufv2:Q8_0 | 0 | 0.355556 | 0.0847297 | 1 | 0.40056 | 0.266667 | 0.64 | 0 | nan | 0.5 | nan | 0.444444 | 0 | 0.335632 | 0.345594 | 0.286246 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 0.711111 | 0.235659 | 0 | 0.352941 | 0.333333 | 0.84 | 0.25 | nan | 1 | nan | 0.422222 | 0 | 0.376842 | 0.343137 | 0.313874 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 0.85 | 0.225524 | 0 | 0.34267 | 0.333333 | 0.826667 | 0.25 | nan | 0.7 | nan | 0.475 | 0 | 0.363927 | 0.338002 | 0.289705 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0.065625 | 0.777778 | 0.229622 | 0 | nan | 0.38 | 0.893333 | 0.25 | nan | 0.5 | nan | 0.333333 | 0 | 0.342969 | 0.333333 | 0.278279 |
| code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 0.775 | nan | 0.333333 | nan | 0.533333 | 0.833333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.330556 | 0.331944 | 0.30939 |
| llama-2-chat:7:ggufv2:Q6_K | 0 | 0.333333 | 0.0614608 | 1 | 0.406162 | 0.266667 | 0.66 | 0 | nan | 0.5 | nan | 0.375 | 0 | 0.327511 | 0.330422 | 0.288285 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0 | 0.666667 | 0.189177 | 0.133333 | 0.358543 | 0.386667 | 0.846667 | 0.25 | nan | 0.6 | nan | 0.311111 | 0 | 0.340197 | 0.325654 | 0.248216 |
| llama-2-chat:7:ggufv2:Q5_K_M | 0.0375 | 0.288889 | 0.0697591 | 1 | 0.40056 | 0.293333 | 0.633333 | 0 | nan | 0.6 | nan | 0.444444 | 0 | 0.342529 | 0.317931 | 0.289372 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 0.733333 | 0.157514 | 0.333333 | 0.329599 | 0.48 | 0.726667 | 0 | nan | 0.6 | nan | 0 | 0 | 0.305495 | 0.317547 | 0.269925 |
| llama-2-chat:13:ggufv2:Q2_K | 0 | 0.288889 | 0.0649389 | 1 | 0.414566 | 0.366667 | 0.433333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.278945 | 0.283917 | 0.285171 |
| llama-2-chat:7:ggufv2:Q2_K | 0 | 0.688889 | 0.0361865 | 0.833333 | 0.369748 | 0.1 | 0.686667 | 0 | nan | 1 | nan | 0 | 0 | 0.337711 | 0.218856 | 0.359055 |
| code-llama-instruct:13:ggufv2:Q2_K | 0 | 0.875 | nan | 0.0333333 | nan | 0.566667 | 0.82 | 0 | nan | 0.4 | nan | 0 | 0 | 0.299444 | 0.166389 | 0.336056 |
| code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 0.85 | nan | 0 | nan | 0.533333 | 0.833333 | 0 | nan | 0 | nan | 0.45 | 0 | 0.296296 | 0.148148 | 0.336707 |
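
Each row ends with three summary columns (Mean Accuracy, Median Accuracy, SD) computed across the task scores, with `nan` marking tasks that have no score for that model. The sketch below recomputes them for the gpt-3.5-turbo-0125 row. The page does not state how the median and SD are derived; the rule used here (the median also covers the mean, and the sample SD also covers both the mean and the median) is an assumption inferred from the published numbers, and `summarise` is a hypothetical helper, not BioChatter code.

```python
import math
import statistics

# Task scores of gpt-3.5-turbo-0125, copied from the table above.
# The single nan is the multimodal_answer task, which has no score.
task_scores = [
    0.35625, 0.866667, 0.510032, 1.0, 0.670401, 0.486667, 0.966667,
    1.0, 0.746479, 0.9, math.nan, 1.0, 0.926667,
]


def summarise(scores):
    """Return (mean, median, sd) for one row, skipping nan entries.

    Assumption inferred from the numbers, not documented on the page:
    the median is taken over the task scores plus the mean, and the
    sample SD over the task scores plus the mean and the median.
    """
    valid = [s for s in scores if not math.isnan(s)]
    mean = statistics.fmean(valid)
    median = statistics.median(valid + [mean])
    sd = statistics.stdev(valid + [mean, median])
    return mean, median, sd


mean, median, sd = summarise(task_scores)
print(f"mean={mean:.6f} median={median:.6f} sd={sd:.6f}")
```

Under this assumption the result lands close to the published values for that row (0.785819, 0.866667, 0.211361). A median taken over the twelve task scores alone would be the average of 0.746479 and 0.866667, which differs from the published figure, so the summary columns should be read with that in mind.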