# Benchmark Results - Overview

Here we collect the results of the living BioChatter benchmark. For an explanation of the benchmarking process, see the benchmarking documentation; the developer docs provide further reading.

## Scores per model

Table sorted by median accuracy in descending order. Click the column names to reorder.

| Model name | Size (B parameters) | Median Accuracy | SD |
|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | 0.9 | 0.23 |
| gpt-4-0613 | Unknown | 0.88 | 0.19 |
| gpt-3.5-turbo-0613 | 175 | 0.76 | 0.21 |
| openhermes-2.5 | 7 | 0.74 | 0.33 |
| gpt-4-0125-preview | Unknown | 0.69 | 0.31 |
| llama-3-instruct | 8 | 0.67 | 0.38 |
| gpt-4o-2024-05-13 | Unknown | 0.65 | 0.37 |
| chatglm3 | 6 | 0.47 | 0.27 |
| mistral-instruct-v0.2 | 7 | 0.45 | 0.34 |
| llama-2-chat | 70 | 0.42 | 0.34 |
| code-llama-instruct | 7 | 0.4 | 0.35 |
| code-llama-instruct | 34 | 0.38 | 0.35 |
| code-llama-instruct | 13 | 0.38 | 0.33 |
| llama-2-chat | 13 | 0.35 | 0.34 |
| llama-2-chat | 7 | 0.34 | 0.32 |
| mixtral-instruct-v0.1 | 46.7 | 0.33 | 0.3 |
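Each row summarises a model's scores across all benchmark questions by their median and standard deviation. As a minimal sketch of how these two summary statistics are computed (the per-question scores below are illustrative, not taken from the benchmark):

```python
from statistics import median, stdev

# Hypothetical per-question scores for a single model;
# the values are illustrative, not benchmark data.
per_question = [1.0, 0.8, 0.9, 1.0, 0.5, 0.9]

# Each table row reports the median and the (sample) standard
# deviation of a model's scores across all benchmark questions.
median_accuracy = round(median(per_question), 2)
sd = round(stdev(per_question), 2)
print(median_accuracy, sd)  # 0.9 0.19
```

The median is reported rather than the mean because it is less sensitive to the zero scores that some models receive on entire tasks.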

Figure: interactive scatter/boxplot of scores per model and quantisation (not rendered in this static view).

## Scores per quantisation

Table sorted by median accuracy in descending order. Click the column names to reorder.

| Model name | Size (B parameters) | Version | Quantisation | Median Accuracy | SD |
|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | n/a | n/a | 0.9 | 0.23 |
| gpt-4-0613 | Unknown | n/a | n/a | 0.88 | 0.19 |
| openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.78 | 0.33 |
| gpt-3.5-turbo-0613 | 175 | n/a | n/a | 0.76 | 0.21 |
| openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.76 | 0.35 |
| openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.76 | 0.33 |
| openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.73 | 0.35 |
| gpt-4-0125-preview | Unknown | n/a | n/a | 0.69 | 0.31 |
| llama-3-instruct | 8 | ggufv2 | Q8_0 | 0.67 | 0.37 |
| llama-3-instruct | 8 | ggufv2 | Q6_K | 0.67 | 0.38 |
| llama-3-instruct | 8 | ggufv2 | Q4_K_M | 0.67 | 0.39 |
| gpt-4o-2024-05-13 | Unknown | n/a | n/a | 0.65 | 0.37 |
| openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.63 | 0.33 |
| llama-3-instruct | 8 | ggufv2 | Q5_K_M | 0.6 | 0.38 |
| openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.5 | 0.31 |
| code-llama-instruct | 34 | ggufv2 | Q2_K | 0.5 | 0.33 |
| code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.48 | 0.35 |
| code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.47 | 0.39 |
| chatglm3 | 6 | ggmlv3 | q4_0 | 0.47 | 0.27 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.47 | 0.36 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.47 | 0.35 |
| code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
| llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.44 | 0.35 |
| llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.43 | 0.34 |
| code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
| llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.41 | 0.33 |
| code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.4 | 0.37 |
| code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.39 | 0.31 |
| llama-2-chat | 13 | ggufv2 | Q6_K | 0.39 | 0.36 |
| code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
| code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
| code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
| code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
| llama-2-chat | 70 | ggufv2 | Q2_K | 0.38 | 0.35 |
| code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.37 | 0.36 |
| code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
| llama-2-chat | 7 | ggufv2 | Q8_0 | 0.36 | 0.3 |
| llama-2-chat | 13 | ggufv2 | Q8_0 | 0.36 | 0.35 |
| llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.35 | 0.34 |
| llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.35 | 0.35 |
| llama-2-chat | 7 | ggufv2 | Q6_K | 0.34 | 0.3 |
| llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.34 | 0.3 |
| llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.34 | 0.34 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.33 | 0.28 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.33 | 0.3 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.33 | 0.33 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.33 | 0.32 |
| llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.33 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.31 | 0.26 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.3 | 0.28 |
| llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.29 | 0.3 |
| llama-2-chat | 13 | ggufv2 | Q2_K | 0.27 | 0.29 |
| code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
| code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |
| llama-2-chat | 7 | ggufv2 | Q2_K | 0.1 | 0.38 |
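To compare quantisation levels across models, rows like the ones above can be grouped by their quantisation label and averaged. A minimal sketch with illustrative values (the tuples are modelled on the table but are not the benchmark data):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (model, quantisation, median accuracy) rows,
# modelled on the table above; the numbers are illustrative.
rows = [
    ("openhermes-2.5", "Q5_K_M", 0.78),
    ("openhermes-2.5", "Q4_K_M", 0.76),
    ("llama-3-instruct", "Q4_K_M", 0.68),
    ("llama-2-chat", "Q2_K", 0.10),
]

# Collect scores per quantisation level, then average them to see
# how compression level relates to accuracy across models.
by_quant = defaultdict(list)
for _model, quant, score in rows:
    by_quant[quant].append(score)

summary = {q: round(mean(scores), 2) for q, scores in by_quant.items()}
print(summary)
```

In the real table, heavier quantisations (Q2_K, Q3_K_M) generally sit below their Q4-Q8 counterparts of the same model.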

Figure: interactive boxplot of scores per quantisation (not rendered in this static view).

## Scores of all tasks

Wide table; you may need to scroll horizontally to see all columns. Table sorted by median accuracy in descending order. Click the column names to reorder.

| Full model name | end_to_end_query_generation | naive_query_generation_using_schema | relationship_selection | query_generation | property_exists | implicit_relevance_of_multiple_fragments | entity_selection | sourcedata_info_extraction | explicit_relevance_of_single_fragments | property_selection | Mean Accuracy | Median Accuracy | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 0.926667 | 0.486667 | 1 | 0.966667 | 0.866667 | 0.9 | 1 | 0.510032 | 1 | 0.35625 | 0.801295 | 0.9 | 0.226907 |
| gpt-4-0613 | 0.88 | 0.68 | 0.65 | 0.966667 | 0.888889 | 1 | 0.888889 | 0.668903 | 1 | 0.359375 | 0.798272 | 0.88 | 0.186915 |
| openhermes-2.5:7:ggufv2:Q5_K_M | 0 | 0.586667 | 1 | 0.913333 | 0.777778 | 1 | 0.888889 | 0.579916 | 1 | 0.125 | 0.687158 | 0.777778 | 0.331801 |
| gpt-3.5-turbo-0613 | 0.833333 | 0.5 | 0.5 | 0.946667 | 0.755556 | 1 | 0.888889 | 0.575381 | 1 | 0.3625 | 0.736233 | 0.755556 | 0.211926 |
| openhermes-2.5:7:ggufv2:Q4_K_M | 0 | 0.466667 | 1 | 0.873333 | 0.755556 | 1 | 0.888889 | 0.597281 | 1 | 0.046875 | 0.66286 | 0.755556 | 0.345683 |
| openhermes-2.5:7:ggufv2:Q8_0 | 0 | 0.466667 | 1 | 0.88 | 0.755556 | 1 | 0.888889 | 0.600829 | 1 | 0.125 | 0.671694 | 0.755556 | 0.333644 |
| openhermes-2.5:7:ggufv2:Q6_K | 0 | 0.533333 | 1 | 0.86 | 0.733333 | 1 | 1 | 0.619167 | 1 | 0.046875 | 0.679271 | 0.733333 | 0.3485 |
| gpt-4-0125-preview | 0 | 0.44 | 0.75 | 0.833333 | 0.733333 | 0.5 | 0.777778 | 0.689705 | 1 | 0 | 0.572415 | 0.689705 | 0.309688 |
| llama-3-instruct:8:ggufv2:Q8_0 | 0 | 0.666667 | 0 | 0.92 | 0.725 | 1 | 0.875 | 0.188555 | 1 | 0.28125 | 0.565647 | 0.666667 | 0.370078 |
| llama-3-instruct:8:ggufv2:Q6_K | 0 | 0.666667 | 0 | 0.926667 | 0.775 | 1 | 0.875 | 0.162657 | 1 | 0.28125 | 0.568724 | 0.666667 | 0.375292 |
| llama-3-instruct:8:ggufv2:Q4_K_M | 0 | 0.666667 | 0 | 0.92 | 0.775 | 1 | 0.875 | 0.116871 | 1 | 0.109375 | 0.546291 | 0.666667 | 0.394468 |
| gpt-4o-2024-05-13 | 0 | 0.533333 | 0 | 0.8 | 0.85 | 0.7 | 1 | 0.653946 | 1 | 0 | 0.553728 | 0.653946 | 0.370215 |
| openhermes-2.5:7:ggufv2:Q3_K_M | 0 | 0.466667 | 1 | 0.94 | 0.72 | 0.5 | 1 | 0.554488 | 1 | 0.125 | 0.630615 | 0.630615 | 0.332498 |
| llama-3-instruct:8:ggufv2:Q5_K_M | 0 | 0.6 | 0 | 0.926667 | 0.65 | 1 | 0.875 | 0.166434 | 1 | 0.1875 | 0.54056 | 0.6 | 0.375485 |
| openhermes-2.5:7:ggufv2:Q2_K | 0 | 0.433333 | 0.5 | 0.94 | 0.844444 | 0.5 | 0.555556 | 0.444054 | 1 | 0 | 0.521739 | 0.5 | 0.311683 |
| code-llama-instruct:34:ggufv2:Q2_K | 0 | 0.566667 | 0.5 | 0.686667 | 0.75 | 1 | 0 | n/a | 0.5 | 0 | 0.444815 | 0.5 | 0.328199 |
| code-llama-instruct:7:ggufv2:Q3_K_M | 0 | 0.426667 | 0.25 | 0.873333 | 0.8 | 0.7 | 0.5 | n/a | 0.833333 | 0 | 0.487037 | 0.493519 | 0.307716 |
| mistral-instruct-v0.2:7:ggufv2:Q6_K | 0 | 0.433333 | 0 | 0.833333 | 0.65 | 1 | 0.5 | 0.367412 | 1 | 0.046875 | 0.483095 | 0.483095 | 0.351374 |
| code-llama-instruct:7:ggufv2:Q4_K_M | 0 | 0.653333 | 0 | 0.966667 | 0.6 | 1 | 0.333333 | 0.138732 | 1 | 0 | 0.469207 | 0.469207 | 0.38731 |
| chatglm3:6:ggmlv3:q4_0 | 0 | 0.48 | 0.4 | 0.553333 | 0.275 | 1 | 0.75 | 0.188284 | 0.733333 | 0.2875 | 0.466745 | 0.466745 | 0.271708 |
| mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0 | 0.466667 | 0 | 0.826667 | 0.688889 | 1 | 0.444444 | 0.385754 | 1 | 0 | 0.481242 | 0.466667 | 0.357557 |
| mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0 | 0.466667 | 0 | 0.773333 | 0.666667 | 1 | 0.333333 | 0.368974 | 1 | 0.046875 | 0.465585 | 0.465585 | 0.349288 |
| code-llama-instruct:34:ggufv2:Q3_K_M | 0 | 0.6 | 0.25 | 0.786667 | 0.875 | 0.5 | 0 | n/a | 0.5 | 0 | 0.390185 | 0.445093 | 0.306514 |
| llama-2-chat:70:ggufv2:Q5_K_M | 0 | 0.36 | 0.25 | 0.906667 | 0.777778 | 0.9 | 0.444444 | 0.210166 | 1 | 0 | 0.484905 | 0.444444 | 0.346535 |
| llama-2-chat:70:ggufv2:Q4_K_M | 0 | 0.42 | 0.25 | 0.92 | 0.755556 | 1 | 0.444444 | 0.240936 | 1 | 0 | 0.503094 | 0.444444 | 0.354692 |
| code-llama-instruct:13:ggufv2:Q6_K | 0 | 0.54 | 0 | 0.793333 | 0.825 | 0.5 | 0 | n/a | 0.833333 | 0 | 0.387963 | 0.443981 | 0.345581 |
| code-llama-instruct:13:ggufv2:Q8_0 | 0 | 0.566667 | 0 | 0.766667 | 0.75 | 0.5 | 0 | n/a | 0.833333 | 0 | 0.37963 | 0.439815 | 0.334971 |
| mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0 | 0.433333 | 0 | 0.846667 | 0.644444 | 0.9 | 0.333333 | 0.351684 | 1 | 0.0375 | 0.454696 | 0.433333 | 0.343652 |
| code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 0.566667 | 0 | 0.78 | 0.775 | 0.5 | 0 | n/a | 0.666667 | 0 | 0.36537 | 0.432685 | 0.320506 |
| llama-2-chat:70:ggufv2:Q3_K_M | 0 | 0.413333 | 0 | 0.906667 | 0.777778 | 0.5 | 0.333333 | 0.197898 | 1 | 0.171875 | 0.430088 | 0.413333 | 0.327267 |
| code-llama-instruct:7:ggufv2:Q8_0 | 0 | 0.4 | 0 | 0.96 | 0.666667 | 0.5 | 0 | n/a | 1 | 0 | 0.391852 | 0.395926 | 0.37338 |
| code-llama-instruct:7:ggufv2:Q5_K_M | 0 | 0.4 | 0 | 0.96 | 0.688889 | 0.5 | 0.111111 | n/a | 0.833333 | 0 | 0.388148 | 0.394074 | 0.340156 |
| mistral-instruct-v0.2:7:ggufv2:Q2_K | 0 | 0.573333 | 0 | 0.693333 | 0.6 | 0.5 | 0.222222 | 0.331261 | 1 | 0 | 0.392015 | 0.392015 | 0.307746 |
| llama-2-chat:13:ggufv2:Q6_K | 0 | 0.386667 | 0 | 0.813333 | 0.775 | 0.5 | 0 | n/a | 1 | 0 | 0.386111 | 0.386389 | 0.363306 |
| code-llama-instruct:7:ggufv2:Q2_K | 0 | 0.533333 | 0.25 | 0.92 | 0.8 | 0.7 | 0.25 | n/a | 0.333333 | 0.0625 | 0.427685 | 0.380509 | 0.292686 |
| code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 0.466667 | 0 | 0.906667 | 0.975 | 0.4 | 0 | n/a | 0.5 | 0 | 0.360926 | 0.380463 | 0.350483 |
| code-llama-instruct:7:ggufv2:Q6_K | 0 | 0.333333 | 0 | 0.96 | 0.775 | 0.9 | 0 | n/a | 0.833333 | 0 | 0.422407 | 0.37787 | 0.391629 |
| code-llama-instruct:34:ggufv2:Q5_K_M | 0 | 0.466667 | 0 | 0.9 | 0.95 | 1 | 0.125 | n/a | 0.333333 | 0 | 0.419444 | 0.376389 | 0.384096 |
| llama-2-chat:70:ggufv2:Q2_K | 0 | 0.473333 | 0 | 0.9 | 0.666667 | 0.5 | 0 | 0.215047 | 1 | 0 | 0.375505 | 0.375505 | 0.352226 |
| code-llama-instruct:34:ggufv2:Q8_0 | 0 | 0.466667 | 0 | 0.86 | 0.925 | 0.9 | 0.25 | n/a | 0.333333 | 0 | 0.415 | 0.374167 | 0.353285 |
| mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0 | 0.366667 | 0 | 0.826667 | 0.688889 | 1 | 0.333333 | 0.347025 | 1 | 0 | 0.456258 | 0.366667 | 0.363014 |
| code-llama-instruct:34:ggufv2:Q6_K | 0 | 0.473333 | 0 | 0.853333 | 0.9 | 0.9 | 0.125 | n/a | 0.333333 | 0 | 0.398333 | 0.365833 | 0.356636 |
| llama-2-chat:7:ggufv2:Q8_0 | 0 | 0.266667 | 0 | 0.64 | 0.355556 | 0.5 | 0.444444 | n/a | 1 | 0 | 0.356296 | 0.355926 | 0.302016 |
| llama-2-chat:13:ggufv2:Q8_0 | 0 | 0.48 | 0 | 0.786667 | 0.711111 | 0.5 | 0 | 0.0762457 | 1 | 0 | 0.355402 | 0.355402 | 0.350017 |
| llama-2-chat:13:ggufv2:Q3_K_M | 0 | 0.48 | 0 | 0.68 | 0.733333 | 0.5 | 0 | 0.112631 | 1 | 0 | 0.350596 | 0.350596 | 0.338994 |
| llama-2-chat:13:ggufv2:Q4_K_M | 0 | 0.366667 | 0 | 0.76 | 0.777778 | 0.5 | 0 | 0.0888675 | 1 | 0 | 0.349331 | 0.349331 | 0.350915 |
| llama-2-chat:7:ggufv2:Q6_K | 0 | 0.266667 | 0 | 0.66 | 0.333333 | 0.5 | 0.375 | n/a | 1 | 0 | 0.348333 | 0.340833 | 0.302733 |
| llama-2-chat:7:ggufv2:Q4_K_M | 0 | 0.24 | 0 | 0.646667 | 0.488889 | 0.5 | 0.444444 | 0.0852494 | 1 | 0 | 0.340525 | 0.340525 | 0.303018 |
| llama-2-chat:13:ggufv2:Q5_K_M | 0 | 0.433333 | 0 | 0.746667 | 0.644444 | 0.5 | 0 | 0.0766167 | 1 | 0 | 0.340106 | 0.340106 | 0.338412 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0 | 0.38 | 0.25 | 0.893333 | 0.777778 | 0.5 | 0.333333 | 0.229622 | 0 | 0.065625 | 0.342969 | 0.333333 | 0.278279 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 0.333333 | 0.25 | 0.826667 | 0.85 | 0.7 | 0.475 | 0.225524 | 0 | 0 | 0.366052 | 0.333333 | 0.302567 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 0.333333 | 0.25 | 0.84 | 0.711111 | 1 | 0.422222 | 0.235659 | 0 | 0 | 0.379233 | 0.333333 | 0.327866 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0 | 0.426667 | 0 | 0.76 | 0.755556 | 1 | 0.333333 | 0.193786 | 0.166667 | 0.1625 | 0.379851 | 0.333333 | 0.315144 |
| llama-2-chat:7:ggufv2:Q3_K_M | 0 | 0.233333 | 0 | 0.693333 | 0.466667 | 1 | 0.333333 | 0.0650717 | 1 | 0.1 | 0.389174 | 0.333333 | 0.352468 |
| code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 0.533333 | 0 | 0.833333 | 0.775 | 0.5 | 0 | n/a | 0.333333 | 0 | 0.330556 | 0.331944 | 0.30939 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0 | 0.386667 | 0.25 | 0.846667 | 0.666667 | 0.6 | 0.311111 | 0.189177 | 0.133333 | 0 | 0.338362 | 0.311111 | 0.259273 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 0.48 | 0 | 0.726667 | 0.733333 | 0.6 | 0 | 0.157514 | 0.333333 | 0 | 0.303085 | 0.303085 | 0.281803 |
| llama-2-chat:7:ggufv2:Q5_K_M | 0 | 0.293333 | 0 | 0.633333 | 0.288889 | 0.6 | 0.444444 | 0.0697591 | 1 | 0.0375 | 0.336726 | 0.293333 | 0.301858 |
| llama-2-chat:13:ggufv2:Q2_K | 0 | 0.366667 | 0 | 0.433333 | 0.288889 | 0.5 | 0 | 0.0649389 | 1 | 0 | 0.265383 | 0.265383 | 0.294744 |
| code-llama-instruct:13:ggufv2:Q2_K | 0 | 0.566667 | 0 | 0.82 | 0.875 | 0.4 | 0 | n/a | 0.0333333 | 0 | 0.299444 | 0.166389 | 0.336056 |
| code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 0.533333 | 0 | 0.833333 | 0.85 | 0 | 0.45 | n/a | 0 | 0 | 0.296296 | 0.148148 | 0.336707 |
| llama-2-chat:7:ggufv2:Q2_K | 0 | 0.1 | 0 | 0.686667 | 0.688889 | 1 | 0 | 0.0361865 | 0.833333 | 0 | 0.334508 | 0.1 | 0.379388 |
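Some models have no score for certain tasks (the missing cells in the sourcedata_info_extraction column). A minimal sketch of how aggregate statistics can skip such missing values rather than counting them as zero (the per-task scores below are illustrative, not taken from the table):

```python
from math import isnan, nan
from statistics import mean

# Hypothetical per-task scores for one model; nan marks a task the
# model was not evaluated on, mirroring the missing cells above.
task_scores = [0.0, 0.57, 0.5, 0.69, 0.75, 1.0, 0.0, nan, 0.5, 0.0]

# Drop missing tasks before aggregating, so an unevaluated task
# does not drag the mean down like a genuine zero score would.
valid = [s for s in task_scores if not isnan(s)]
mean_accuracy = round(mean(valid), 3)
print(mean_accuracy)  # 0.446
```

Whether missing tasks are skipped or imputed changes the Mean Accuracy column noticeably for models with several unevaluated tasks, which is one reason the median is used for ranking.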