# Benchmark Results - Overview

Here we collect the results of the living BioChatter benchmark. For an explanation of the procedure, see the benchmarking documentation; the developer docs provide further reading.
## Scores per model

Table sorted by median accuracy in descending order. Click the column names to reorder. Model size is given in billions of parameters ("Unknown" for the proprietary OpenAI models); SD is the standard deviation of the accuracy scores.
Model name | Size | Median Accuracy | SD |
---|---|---|---|
gpt-3.5-turbo-0125 | 175 | 0.9 | 0.23 |
gpt-4-0613 | Unknown | 0.88 | 0.19 |
gpt-3.5-turbo-0613 | 175 | 0.76 | 0.21 |
openhermes-2.5 | 7 | 0.74 | 0.33 |
gpt-4-0125-preview | Unknown | 0.69 | 0.31 |
llama-3-instruct | 8 | 0.67 | 0.38 |
gpt-4o-2024-05-13 | Unknown | 0.65 | 0.37 |
chatglm3 | 6 | 0.47 | 0.27 |
mistral-instruct-v0.2 | 7 | 0.45 | 0.34 |
llama-2-chat | 70 | 0.42 | 0.34 |
code-llama-instruct | 7 | 0.4 | 0.35 |
code-llama-instruct | 13 | 0.38 | 0.33 |
code-llama-instruct | 34 | 0.38 | 0.35 |
llama-2-chat | 13 | 0.35 | 0.34 |
llama-2-chat | 7 | 0.34 | 0.32 |
mixtral-instruct-v0.1 | 46.7 | 0.33 | 0.3 |
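The per-model summary above can be reproduced from raw per-task results with a standard pandas aggregation. The following is a minimal sketch, not the benchmark's actual code; the column names and the two example models' raw scores are illustrative assumptions:

```python
import pandas as pd

# Hypothetical raw results: one accuracy score per (model, task) pair.
raw = pd.DataFrame({
    "model": ["gpt-4-0613", "gpt-4-0613", "llama-2-chat:7", "llama-2-chat:7"],
    "accuracy": [1.0, 0.76, 0.5, 0.2],
})

# Aggregate to the per-model summary: median accuracy and standard
# deviation, sorted descending as in the table above.
summary = (
    raw.groupby("model")["accuracy"]
    .agg(median_accuracy="median", sd="std")
    .sort_values("median_accuracy", ascending=False)
)
print(summary)
```

With real data, `raw` would hold one row per benchmark query rather than two rows per model.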
## Scores per quantisation

Table sorted by median accuracy in descending order. Click the column names to reorder.
Model name | Size | Version | Quantisation | Median Accuracy | SD |
---|---|---|---|---|---|
gpt-3.5-turbo-0125 | 175 | – | – | 0.9 | 0.23 |
gpt-4-0613 | Unknown | – | – | 0.88 | 0.19 |
openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.78 | 0.33 |
openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.76 | 0.35 |
gpt-3.5-turbo-0613 | 175 | – | – | 0.76 | 0.21 |
openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.76 | 0.33 |
openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.73 | 0.35 |
gpt-4-0125-preview | Unknown | – | – | 0.69 | 0.31 |
llama-3-instruct | 8 | ggufv2 | Q4_K_M | 0.67 | 0.39 |
llama-3-instruct | 8 | ggufv2 | Q8_0 | 0.67 | 0.37 |
llama-3-instruct | 8 | ggufv2 | Q6_K | 0.67 | 0.38 |
gpt-4o-2024-05-13 | Unknown | – | – | 0.65 | 0.37 |
openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.63 | 0.33 |
llama-3-instruct | 8 | ggufv2 | Q5_K_M | 0.6 | 0.38 |
openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.5 | 0.31 |
code-llama-instruct | 34 | ggufv2 | Q2_K | 0.5 | 0.33 |
code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.48 | 0.35 |
code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.47 | 0.39 |
chatglm3 | 6 | ggmlv3 | q4_0 | 0.47 | 0.27 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.47 | 0.36 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.47 | 0.35 |
code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.44 | 0.35 |
llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.44 | 0.35 |
code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.43 | 0.34 |
code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.41 | 0.33 |
code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.4 | 0.37 |
code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.39 | 0.31 |
llama-2-chat | 13 | ggufv2 | Q6_K | 0.39 | 0.36 |
code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
llama-2-chat | 70 | ggufv2 | Q2_K | 0.38 | 0.35 |
code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.37 | 0.36 |
code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
llama-2-chat | 7 | ggufv2 | Q8_0 | 0.36 | 0.3 |
llama-2-chat | 13 | ggufv2 | Q8_0 | 0.36 | 0.35 |
llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.35 | 0.34 |
llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.35 | 0.35 |
llama-2-chat | 7 | ggufv2 | Q6_K | 0.34 | 0.3 |
llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.34 | 0.3 |
llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.34 | 0.34 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.33 | 0.28 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.33 | 0.32 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.33 | 0.3 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.33 | 0.33 |
llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.33 | 0.35 |
code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.31 | 0.26 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.3 | 0.28 |
llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.29 | 0.3 |
llama-2-chat | 13 | ggufv2 | Q2_K | 0.27 | 0.29 |
code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |
llama-2-chat | 7 | ggufv2 | Q2_K | 0.1 | 0.38 |
## Scores of all tasks

Wide table; you may need to scroll horizontally to see all columns. Table sorted by median accuracy in descending order. Click the column names to reorder. A nan entry means no score is available for that model/task combination; the aggregate columns are computed over the remaining tasks.
Full model name | naive_query_generation_using_schema | entity_selection | end_to_end_query_generation | property_exists | explicit_relevance_of_single_fragments | property_selection | sourcedata_info_extraction | implicit_relevance_of_multiple_fragments | query_generation | relationship_selection | Mean Accuracy | Median Accuracy | SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gpt-3.5-turbo-0125 | 0.486667 | 1 | 0.926667 | 0.866667 | 1 | 0.35625 | 0.510032 | 0.9 | 0.966667 | 1 | 0.801295 | 0.9 | 0.226907 |
gpt-4-0613 | 0.68 | 0.888889 | 0.88 | 0.888889 | 1 | 0.359375 | 0.668903 | 1 | 0.966667 | 0.65 | 0.798272 | 0.88 | 0.186915 |
openhermes-2.5:7:ggufv2:Q5_K_M | 0.586667 | 0.888889 | 0 | 0.777778 | 1 | 0.125 | 0.579916 | 1 | 0.913333 | 1 | 0.687158 | 0.777778 | 0.331801 |
openhermes-2.5:7:ggufv2:Q4_K_M | 0.466667 | 0.888889 | 0 | 0.755556 | 1 | 0.046875 | 0.597281 | 1 | 0.873333 | 1 | 0.66286 | 0.755556 | 0.345683 |
gpt-3.5-turbo-0613 | 0.5 | 0.888889 | 0.833333 | 0.755556 | 1 | 0.3625 | 0.575381 | 1 | 0.946667 | 0.5 | 0.736233 | 0.755556 | 0.211926 |
openhermes-2.5:7:ggufv2:Q8_0 | 0.466667 | 0.888889 | 0 | 0.755556 | 1 | 0.125 | 0.600829 | 1 | 0.88 | 1 | 0.671694 | 0.755556 | 0.333644 |
openhermes-2.5:7:ggufv2:Q6_K | 0.533333 | 1 | 0 | 0.733333 | 1 | 0.046875 | 0.619167 | 1 | 0.86 | 1 | 0.679271 | 0.733333 | 0.3485 |
gpt-4-0125-preview | 0.44 | 0.777778 | 0 | 0.733333 | 1 | 0 | 0.689705 | 0.5 | 0.833333 | 0.75 | 0.572415 | 0.689705 | 0.309688 |
llama-3-instruct:8:ggufv2:Q4_K_M | 0.666667 | 0.875 | 0 | 0.775 | 1 | 0.109375 | 0.116871 | 1 | 0.92 | 0 | 0.546291 | 0.666667 | 0.394468 |
llama-3-instruct:8:ggufv2:Q8_0 | 0.666667 | 0.875 | 0 | 0.725 | 1 | 0.28125 | 0.188555 | 1 | 0.92 | 0 | 0.565647 | 0.666667 | 0.370078 |
llama-3-instruct:8:ggufv2:Q6_K | 0.666667 | 0.875 | 0 | 0.775 | 1 | 0.28125 | 0.162657 | 1 | 0.926667 | 0 | 0.568724 | 0.666667 | 0.375292 |
gpt-4o-2024-05-13 | 0.533333 | 1 | 0 | 0.85 | 1 | 0 | 0.653946 | 0.7 | 0.8 | 0 | 0.553728 | 0.653946 | 0.370215 |
openhermes-2.5:7:ggufv2:Q3_K_M | 0.466667 | 1 | 0 | 0.72 | 1 | 0.125 | 0.554488 | 0.5 | 0.94 | 1 | 0.630615 | 0.630615 | 0.332498 |
llama-3-instruct:8:ggufv2:Q5_K_M | 0.6 | 0.875 | 0 | 0.65 | 1 | 0.1875 | 0.166434 | 1 | 0.926667 | 0 | 0.54056 | 0.6 | 0.375485 |
openhermes-2.5:7:ggufv2:Q2_K | 0.433333 | 0.555556 | 0 | 0.844444 | 1 | 0 | 0.444054 | 0.5 | 0.94 | 0.5 | 0.521739 | 0.5 | 0.311683 |
code-llama-instruct:34:ggufv2:Q2_K | 0.566667 | 0 | 0 | 0.75 | 0.5 | 0 | nan | 1 | 0.686667 | 0.5 | 0.444815 | 0.5 | 0.328199 |
code-llama-instruct:7:ggufv2:Q3_K_M | 0.426667 | 0.5 | 0 | 0.8 | 0.833333 | 0 | nan | 0.7 | 0.873333 | 0.25 | 0.487037 | 0.493519 | 0.307716 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 0.433333 | 0.5 | 0 | 0.65 | 1 | 0.046875 | 0.367412 | 1 | 0.833333 | 0 | 0.483095 | 0.483095 | 0.351374 |
code-llama-instruct:7:ggufv2:Q4_K_M | 0.653333 | 0.333333 | 0 | 0.6 | 1 | 0 | 0.138732 | 1 | 0.966667 | 0 | 0.469207 | 0.469207 | 0.38731 |
chatglm3:6:ggmlv3:q4_0 | 0.48 | 0.75 | 0 | 0.275 | 0.733333 | 0.2875 | 0.188284 | 1 | 0.553333 | 0.4 | 0.466745 | 0.466745 | 0.271708 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0.466667 | 0.444444 | 0 | 0.688889 | 1 | 0 | 0.385754 | 1 | 0.826667 | 0 | 0.481242 | 0.466667 | 0.357557 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0.466667 | 0.333333 | 0 | 0.666667 | 1 | 0.046875 | 0.368974 | 1 | 0.773333 | 0 | 0.465585 | 0.465585 | 0.349288 |
code-llama-instruct:34:ggufv2:Q3_K_M | 0.6 | 0 | 0 | 0.875 | 0.5 | 0 | nan | 0.5 | 0.786667 | 0.25 | 0.390185 | 0.445093 | 0.306514 |
llama-2-chat:70:ggufv2:Q4_K_M | 0.42 | 0.444444 | 0 | 0.755556 | 1 | 0 | 0.240936 | 1 | 0.92 | 0.25 | 0.503094 | 0.444444 | 0.354692 |
llama-2-chat:70:ggufv2:Q5_K_M | 0.36 | 0.444444 | 0 | 0.777778 | 1 | 0 | 0.210166 | 0.9 | 0.906667 | 0.25 | 0.484905 | 0.444444 | 0.346535 |
code-llama-instruct:13:ggufv2:Q6_K | 0.54 | 0 | 0 | 0.825 | 0.833333 | 0 | nan | 0.5 | 0.793333 | 0 | 0.387963 | 0.443981 | 0.345581 |
code-llama-instruct:13:ggufv2:Q8_0 | 0.566667 | 0 | 0 | 0.75 | 0.833333 | 0 | nan | 0.5 | 0.766667 | 0 | 0.37963 | 0.439815 | 0.334971 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0.433333 | 0.333333 | 0 | 0.644444 | 1 | 0.0375 | 0.351684 | 0.9 | 0.846667 | 0 | 0.454696 | 0.433333 | 0.343652 |
code-llama-instruct:13:ggufv2:Q5_K_M | 0.566667 | 0 | 0 | 0.775 | 0.666667 | 0 | nan | 0.5 | 0.78 | 0 | 0.36537 | 0.432685 | 0.320506 |
llama-2-chat:70:ggufv2:Q3_K_M | 0.413333 | 0.333333 | 0 | 0.777778 | 1 | 0.171875 | 0.197898 | 0.5 | 0.906667 | 0 | 0.430088 | 0.413333 | 0.327267 |
code-llama-instruct:7:ggufv2:Q8_0 | 0.4 | 0 | 0 | 0.666667 | 1 | 0 | nan | 0.5 | 0.96 | 0 | 0.391852 | 0.395926 | 0.37338 |
code-llama-instruct:7:ggufv2:Q5_K_M | 0.4 | 0.111111 | 0 | 0.688889 | 0.833333 | 0 | nan | 0.5 | 0.96 | 0 | 0.388148 | 0.394074 | 0.340156 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 0.573333 | 0.222222 | 0 | 0.6 | 1 | 0 | 0.331261 | 0.5 | 0.693333 | 0 | 0.392015 | 0.392015 | 0.307746 |
llama-2-chat:13:ggufv2:Q6_K | 0.386667 | 0 | 0 | 0.775 | 1 | 0 | nan | 0.5 | 0.813333 | 0 | 0.386111 | 0.386389 | 0.363306 |
code-llama-instruct:7:ggufv2:Q2_K | 0.533333 | 0.25 | 0 | 0.8 | 0.333333 | 0.0625 | nan | 0.7 | 0.92 | 0.25 | 0.427685 | 0.380509 | 0.292686 |
code-llama-instruct:34:ggufv2:Q4_K_M | 0.466667 | 0 | 0 | 0.975 | 0.5 | 0 | nan | 0.4 | 0.906667 | 0 | 0.360926 | 0.380463 | 0.350483 |
code-llama-instruct:7:ggufv2:Q6_K | 0.333333 | 0 | 0 | 0.775 | 0.833333 | 0 | nan | 0.9 | 0.96 | 0 | 0.422407 | 0.37787 | 0.391629 |
code-llama-instruct:34:ggufv2:Q5_K_M | 0.466667 | 0.125 | 0 | 0.95 | 0.333333 | 0 | nan | 1 | 0.9 | 0 | 0.419444 | 0.376389 | 0.384096 |
llama-2-chat:70:ggufv2:Q2_K | 0.473333 | 0 | 0 | 0.666667 | 1 | 0 | 0.215047 | 0.5 | 0.9 | 0 | 0.375505 | 0.375505 | 0.352226 |
code-llama-instruct:34:ggufv2:Q8_0 | 0.466667 | 0.25 | 0 | 0.925 | 0.333333 | 0 | nan | 0.9 | 0.86 | 0 | 0.415 | 0.374167 | 0.353285 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0.366667 | 0.333333 | 0 | 0.688889 | 1 | 0 | 0.347025 | 1 | 0.826667 | 0 | 0.456258 | 0.366667 | 0.363014 |
code-llama-instruct:34:ggufv2:Q6_K | 0.473333 | 0.125 | 0 | 0.9 | 0.333333 | 0 | nan | 0.9 | 0.853333 | 0 | 0.398333 | 0.365833 | 0.356636 |
llama-2-chat:7:ggufv2:Q8_0 | 0.266667 | 0.444444 | 0 | 0.355556 | 1 | 0 | nan | 0.5 | 0.64 | 0 | 0.356296 | 0.355926 | 0.302016 |
llama-2-chat:13:ggufv2:Q8_0 | 0.48 | 0 | 0 | 0.711111 | 1 | 0 | 0.0762457 | 0.5 | 0.786667 | 0 | 0.355402 | 0.355402 | 0.350017 |
llama-2-chat:13:ggufv2:Q3_K_M | 0.48 | 0 | 0 | 0.733333 | 1 | 0 | 0.112631 | 0.5 | 0.68 | 0 | 0.350596 | 0.350596 | 0.338994 |
llama-2-chat:13:ggufv2:Q4_K_M | 0.366667 | 0 | 0 | 0.777778 | 1 | 0 | 0.0888675 | 0.5 | 0.76 | 0 | 0.349331 | 0.349331 | 0.350915 |
llama-2-chat:7:ggufv2:Q6_K | 0.266667 | 0.375 | 0 | 0.333333 | 1 | 0 | nan | 0.5 | 0.66 | 0 | 0.348333 | 0.340833 | 0.302733 |
llama-2-chat:7:ggufv2:Q4_K_M | 0.24 | 0.444444 | 0 | 0.488889 | 1 | 0 | 0.0852494 | 0.5 | 0.646667 | 0 | 0.340525 | 0.340525 | 0.303018 |
llama-2-chat:13:ggufv2:Q5_K_M | 0.433333 | 0 | 0 | 0.644444 | 1 | 0 | 0.0766167 | 0.5 | 0.746667 | 0 | 0.340106 | 0.340106 | 0.338412 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0.38 | 0.333333 | 0 | 0.777778 | 0 | 0.065625 | 0.229622 | 0.5 | 0.893333 | 0.25 | 0.342969 | 0.333333 | 0.278279 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0.426667 | 0.333333 | 0 | 0.755556 | 0.166667 | 0.1625 | 0.193786 | 1 | 0.76 | 0 | 0.379851 | 0.333333 | 0.315144 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0.333333 | 0.475 | 0 | 0.85 | 0 | 0 | 0.225524 | 0.7 | 0.826667 | 0.25 | 0.366052 | 0.333333 | 0.302567 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0.333333 | 0.422222 | 0 | 0.711111 | 0 | 0 | 0.235659 | 1 | 0.84 | 0.25 | 0.379233 | 0.333333 | 0.327866 |
llama-2-chat:7:ggufv2:Q3_K_M | 0.233333 | 0.333333 | 0 | 0.466667 | 1 | 0.1 | 0.0650717 | 1 | 0.693333 | 0 | 0.389174 | 0.333333 | 0.352468 |
code-llama-instruct:13:ggufv2:Q4_K_M | 0.533333 | 0 | 0 | 0.775 | 0.333333 | 0 | nan | 0.5 | 0.833333 | 0 | 0.330556 | 0.331944 | 0.30939 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0.386667 | 0.311111 | 0 | 0.666667 | 0.133333 | 0 | 0.189177 | 0.6 | 0.846667 | 0.25 | 0.338362 | 0.311111 | 0.259273 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0.48 | 0 | 0 | 0.733333 | 0.333333 | 0 | 0.157514 | 0.6 | 0.726667 | 0 | 0.303085 | 0.303085 | 0.281803 |
llama-2-chat:7:ggufv2:Q5_K_M | 0.293333 | 0.444444 | 0 | 0.288889 | 1 | 0.0375 | 0.0697591 | 0.6 | 0.633333 | 0 | 0.336726 | 0.293333 | 0.301858 |
llama-2-chat:13:ggufv2:Q2_K | 0.366667 | 0 | 0 | 0.288889 | 1 | 0 | 0.0649389 | 0.5 | 0.433333 | 0 | 0.265383 | 0.265383 | 0.294744 |
code-llama-instruct:13:ggufv2:Q2_K | 0.566667 | 0 | 0 | 0.875 | 0.0333333 | 0 | nan | 0.4 | 0.82 | 0 | 0.299444 | 0.166389 | 0.336056 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0.533333 | 0.45 | 0 | 0.85 | 0 | 0 | nan | 0 | 0.833333 | 0 | 0.296296 | 0.148148 | 0.336707 |
llama-2-chat:7:ggufv2:Q2_K | 0.1 | 0 | 0 | 0.688889 | 0.833333 | 0 | 0.0361865 | 1 | 0.686667 | 0 | 0.334508 | 0.1 | 0.379388 |
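The nan-skipping aggregation behind the Mean and Median Accuracy columns is pandas' default behaviour. A minimal sketch using the code-llama-instruct:34:ggufv2:Q2_K row from the table above (scores copied verbatim; the sourcedata_info_extraction entry is missing):

```python
import numpy as np
import pandas as pd

# Per-task scores for code-llama-instruct:34:ggufv2:Q2_K from the table above.
scores = pd.Series(
    [0.566667, 0, 0, 0.75, 0.5, 0, np.nan, 1, 0.686667, 0.5],
    index=[
        "naive_query_generation_using_schema", "entity_selection",
        "end_to_end_query_generation", "property_exists",
        "explicit_relevance_of_single_fragments", "property_selection",
        "sourcedata_info_extraction", "implicit_relevance_of_multiple_fragments",
        "query_generation", "relationship_selection",
    ],
)

# pandas skips NaN by default (skipna=True), so the aggregates are
# taken over the nine available tasks only.
mean_acc = scores.mean()      # ≈ 0.444815, matching the Mean Accuracy column
median_acc = scores.median()  # 0.5, matching the Median Accuracy column
```

Passing `skipna=False` instead would propagate the NaN into the aggregate, which is why missing tasks do not zero out a model's summary scores.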