# Benchmark Results - Overview

Here we collect the results of the living BioChatter benchmark. For an explanation, see the benchmarking documentation.

## Scores per model

Table sorted by median accuracy in descending order (SD: standard deviation). Click the column names to reorder.

| Model name | Size (billions of parameters) | Median Accuracy | SD |
|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | 0.91 | 0.22 |
| gpt-4-0613 | Unknown | 0.88 | 0.19 |
| openhermes-2.5 | 7 | 0.80 | 0.35 |
| gpt-3.5-turbo-0613 | 175 | 0.79 | 0.22 |
| gpt-4-0125-preview | Unknown | 0.65 | 0.32 |
| chatglm3 | 6 | 0.49 | 0.27 |
| mistral-instruct-v0.2 | 7 | 0.46 | 0.36 |
| llama-2-chat | 70 | 0.45 | 0.35 |
| code-llama-instruct | 7 | 0.40 | 0.35 |
| llama-2-chat | 13 | 0.40 | 0.34 |
| code-llama-instruct | 13 | 0.38 | 0.33 |
| code-llama-instruct | 34 | 0.38 | 0.35 |
| llama-2-chat | 7 | 0.35 | 0.32 |
| mixtral-instruct-v0.1 | 46.7 | 0.35 | 0.30 |
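
The median and SD columns summarise each model's scores across all benchmark tasks. As a minimal sketch of this kind of aggregation, assuming the raw scores were exported to a long-format CSV (the file name and column names here are hypothetical, not the benchmark's actual API):

```python
import pandas as pd

# Hypothetical long-format results: one row per individual test result,
# with columns model_name, size, task, and score (a value in [0, 1]).
results = pd.read_csv("benchmark_results.csv")

# Collapse to one row per model, mirroring the table above.
per_model = (
    results.groupby(["model_name", "size"])["score"]
    .agg(median_accuracy="median", sd="std")
    .reset_index()
    .sort_values("median_accuracy", ascending=False)
)
print(per_model.to_string(index=False))
```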

*(Interactive figures on the live page: scatter plot of scores by quantisation name; boxplot of scores per model.)*

## Scores per quantisation

Table sorted by median accuracy in descending order. Click the column names to reorder.

| Model name | Size (billions of parameters) | Version | Quantisation | Median Accuracy | SD |
|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | n/a | n/a | 0.91 | 0.22 |
| gpt-4-0613 | Unknown | n/a | n/a | 0.88 | 0.19 |
| openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.83 | 0.35 |
| openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.82 | 0.35 |
| openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.81 | 0.36 |
| openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.80 | 0.37 |
| gpt-3.5-turbo-0613 | 175 | n/a | n/a | 0.79 | 0.22 |
| openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.68 | 0.35 |
| gpt-4-0125-preview | Unknown | n/a | n/a | 0.65 | 0.32 |
| code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.55 | 0.39 |
| openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.52 | 0.33 |
| code-llama-instruct | 34 | ggufv2 | Q2_K | 0.50 | 0.33 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.50 | 0.37 |
| code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
| chatglm3 | 6 | ggmlv3 | q4_0 | 0.49 | 0.27 |
| llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.49 | 0.36 |
| llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.48 | 0.35 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.48 | 0.37 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.47 | 0.36 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.45 | 0.36 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.45 | 0.32 |
| code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
| code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
| llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.43 | 0.33 |
| llama-2-chat | 70 | ggufv2 | Q2_K | 0.43 | 0.37 |
| llama-2-chat | 13 | ggufv2 | Q8_0 | 0.43 | 0.36 |
| code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
| llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.43 | 0.35 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.42 | 0.38 |
| llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.41 | 0.31 |
| llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.40 | 0.34 |
| code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.40 | 0.37 |
| code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
| llama-2-chat | 13 | ggufv2 | Q6_K | 0.39 | 0.36 |
| code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
| code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
| llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.38 | 0.35 |
| code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
| code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
| code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
| llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.37 | 0.36 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.37 | 0.32 |
| code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.36 | 0.34 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.36 | 0.31 |
| llama-2-chat | 7 | ggufv2 | Q8_0 | 0.36 | 0.30 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.34 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q6_K | 0.34 | 0.30 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.33 | 0.27 |
| code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
| llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.33 | 0.30 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.33 | 0.29 |
| llama-2-chat | 13 | ggufv2 | Q2_K | 0.29 | 0.30 |
| llama-2-chat | 7 | ggufv2 | Q2_K | 0.23 | 0.38 |
| code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
| code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |
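
To see how accuracy responds to quantisation within a model family, the same hypothetical long-format results can be pivoted so that each row is a model and each column a quantisation level (again a sketch; the levels follow the GGUF names used in the table above):

```python
import pandas as pd

results = pd.read_csv("benchmark_results.csv")  # hypothetical, as in the sketch above

# Median accuracy per model and quantisation level.
quant_table = results.pivot_table(
    index=["model_name", "size"],
    columns="quantisation",
    values="score",
    aggfunc="median",
)

# Order columns from strongest (Q2_K) to weakest (Q8_0) compression.
levels = ["Q2_K", "Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]
print(quant_table.reindex(columns=levels))
```

Read row-wise, such a view makes patterns like the openhermes-2.5 rows above easy to spot: its median accuracy stays around 0.8 down to Q4_K_M and only drops markedly at Q3_K_M and Q2_K.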

*(Interactive figure on the live page: boxplot of scores per quantisation level.)*

## Scores of all tasks

Wide table; you may need to scroll horizontally to see all columns. Table sorted by median accuracy in descending order. Click the column names to reorder.

| Full model name | explicit_relevance_of_single_fragments | property_selection | query_generation | end_to_end_query_generation | relationship_selection | entity_selection | property_exists | naive_query_generation_using_schema | implicit_relevance_of_multiple_fragments | Mean Accuracy | Median Accuracy | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 1 | 0.35625 | 0.966667 | 0.926667 | 1 | 1 | 0.866667 | 0.486667 | 0.9 | 0.833657 | 0.913333 | 0.216549 |
| gpt-4-0613 | 1 | 0.359375 | 0.966667 | 0.88 | 0.65 | 0.888889 | 0.888889 | 0.68 | 1 | 0.812647 | 0.884444 | 0.190861 |
| openhermes-2.5:7:ggufv2:Q5_K_M | 1 | 0.125 | 0.913333 | 0 | 1 | 0.888889 | 0.777778 | 0.586667 | 1 | 0.699074 | 0.833333 | 0.347432 |
| openhermes-2.5:7:ggufv2:Q8_0 | 1 | 0.125 | 0.88 | 0 | 1 | 0.888889 | 0.755556 | 0.466667 | 1 | 0.679568 | 0.817778 | 0.350691 |
| openhermes-2.5:7:ggufv2:Q4_K_M | 1 | 0.046875 | 0.873333 | 0 | 1 | 0.888889 | 0.755556 | 0.466667 | 1 | 0.670147 | 0.814444 | 0.363419 |
| openhermes-2.5:7:ggufv2:Q6_K | 1 | 0.046875 | 0.86 | 0 | 1 | 1 | 0.733333 | 0.533333 | 1 | 0.685949 | 0.796667 | 0.366119 |
| gpt-3.5-turbo-0613 | 1 | 0.3625 | 0.946667 | 0.833333 | 0.5 | 0.888889 | 0.755556 | 0.5 | 1 | 0.754105 | 0.794444 | 0.215969 |
| openhermes-2.5:7:ggufv2:Q3_K_M | 1 | 0.125 | 0.94 | 0 | 1 | 1 | 0.72 | 0.466667 | 0.5 | 0.639074 | 0.679537 | 0.348016 |
| gpt-4-0125-preview | 1 | 0 | 0.833333 | 0 | 0.75 | 0.777778 | 0.733333 | 0.44 | 0.5 | 0.559383 | 0.646358 | 0.321552 |
| code-llama-instruct:7:ggufv2:Q4_K_M | 1 | 0 | 0.966667 | 0 | 0 | 0.333333 | 0.6 | 0.653333 | 1 | 0.505926 | 0.552963 | 0.39125 |
| openhermes-2.5:7:ggufv2:Q2_K | 1 | 0 | 0.94 | 0 | 0.5 | 0.555556 | 0.844444 | 0.433333 | 0.5 | 0.53037 | 0.515185 | 0.325835 |
| code-llama-instruct:34:ggufv2:Q2_K | 0.5 | 0 | 0.686667 | 0 | 0.5 | 0 | 0.75 | 0.566667 | 1 | 0.444815 | 0.5 | 0.328199 |
| mistral-instruct-v0.2:7:ggufv2:Q6_K | 1 | 0.046875 | 0.833333 | 0 | 0 | 0.5 | 0.65 | 0.433333 | 1 | 0.495949 | 0.497975 | 0.366502 |
| code-llama-instruct:7:ggufv2:Q3_K_M | 0.833333 | 0 | 0.873333 | 0 | 0.25 | 0.5 | 0.8 | 0.426667 | 0.7 | 0.487037 | 0.493519 | 0.307716 |
| chatglm3:6:ggmlv3:q4_0 | 0.733333 | 0.2875 | 0.553333 | 0 | 0.4 | 0.75 | 0.275 | 0.48 | 1 | 0.497685 | 0.488843 | 0.269443 |
| llama-2-chat:70:ggufv2:Q4_K_M | 1 | 0 | 0.92 | 0 | 0.25 | 0.444444 | 0.755556 | 0.42 | 1 | 0.532222 | 0.488333 | 0.3614 |
| llama-2-chat:70:ggufv2:Q5_K_M | 1 | 0 | 0.906667 | 0 | 0.25 | 0.444444 | 0.777778 | 0.36 | 0.9 | 0.515432 | 0.479938 | 0.351671 |
| mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 1 | 0 | 0.826667 | 0 | 0 | 0.444444 | 0.688889 | 0.466667 | 1 | 0.491852 | 0.479259 | 0.373649 |
| mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 1 | 0.046875 | 0.773333 | 0 | 0 | 0.333333 | 0.666667 | 0.466667 | 1 | 0.476319 | 0.471493 | 0.364921 |
| mistral-instruct-v0.2:7:ggufv2:Q8_0 | 1 | 0.0375 | 0.846667 | 0 | 0 | 0.333333 | 0.644444 | 0.433333 | 0.9 | 0.466142 | 0.449738 | 0.358761 |
| mistral-instruct-v0.2:7:ggufv2:Q2_K | 1 | 0 | 0.693333 | 0 | 0 | 0.222222 | 0.6 | 0.573333 | 0.5 | 0.398765 | 0.449383 | 0.322492 |
| code-llama-instruct:34:ggufv2:Q3_K_M | 0.5 | 0 | 0.786667 | 0 | 0.25 | 0 | 0.875 | 0.6 | 0.5 | 0.390185 | 0.445093 | 0.306514 |
| code-llama-instruct:13:ggufv2:Q6_K | 0.833333 | 0 | 0.793333 | 0 | 0 | 0 | 0.825 | 0.54 | 0.5 | 0.387963 | 0.443981 | 0.345581 |
| code-llama-instruct:13:ggufv2:Q8_0 | 0.833333 | 0 | 0.766667 | 0 | 0 | 0 | 0.75 | 0.566667 | 0.5 | 0.37963 | 0.439815 | 0.334971 |
| llama-2-chat:70:ggufv2:Q3_K_M | 1 | 0.171875 | 0.906667 | 0 | 0 | 0.333333 | 0.777778 | 0.413333 | 0.5 | 0.455887 | 0.43461 | 0.334424 |
| llama-2-chat:70:ggufv2:Q2_K | 1 | 0 | 0.9 | 0 | 0 | 0 | 0.666667 | 0.473333 | 0.5 | 0.393333 | 0.433333 | 0.365724 |
| llama-2-chat:13:ggufv2:Q8_0 | 1 | 0 | 0.786667 | 0 | 0 | 0 | 0.711111 | 0.48 | 0.5 | 0.38642 | 0.43321 | 0.355392 |
| code-llama-instruct:13:ggufv2:Q5_K_M | 0.666667 | 0 | 0.78 | 0 | 0 | 0 | 0.775 | 0.566667 | 0.5 | 0.36537 | 0.432685 | 0.320506 |
| llama-2-chat:13:ggufv2:Q3_K_M | 1 | 0 | 0.68 | 0 | 0 | 0 | 0.733333 | 0.48 | 0.5 | 0.377037 | 0.428519 | 0.346926 |
| mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 1 | 0 | 0.826667 | 0 | 0 | 0.333333 | 0.688889 | 0.366667 | 1 | 0.468395 | 0.417531 | 0.378326 |
| llama-2-chat:7:ggufv2:Q4_K_M | 1 | 0 | 0.646667 | 0 | 0 | 0.444444 | 0.488889 | 0.24 | 0.5 | 0.368889 | 0.406667 | 0.306416 |
| llama-2-chat:13:ggufv2:Q5_K_M | 1 | 0 | 0.746667 | 0 | 0 | 0 | 0.644444 | 0.433333 | 0.5 | 0.369383 | 0.401358 | 0.344025 |
| code-llama-instruct:7:ggufv2:Q8_0 | 1 | 0 | 0.96 | 0 | 0 | 0 | 0.666667 | 0.4 | 0.5 | 0.391852 | 0.395926 | 0.37338 |
| code-llama-instruct:7:ggufv2:Q5_K_M | 0.833333 | 0 | 0.96 | 0 | 0 | 0.111111 | 0.688889 | 0.4 | 0.5 | 0.388148 | 0.394074 | 0.340156 |
| llama-2-chat:13:ggufv2:Q6_K | 1 | 0 | 0.813333 | 0 | 0 | 0 | 0.775 | 0.386667 | 0.5 | 0.386111 | 0.386389 | 0.363306 |
| code-llama-instruct:7:ggufv2:Q2_K | 0.333333 | 0.0625 | 0.92 | 0 | 0.25 | 0.25 | 0.8 | 0.533333 | 0.7 | 0.427685 | 0.380509 | 0.292686 |
| code-llama-instruct:34:ggufv2:Q4_K_M | 0.5 | 0 | 0.906667 | 0 | 0 | 0 | 0.975 | 0.466667 | 0.4 | 0.360926 | 0.380463 | 0.350483 |
| llama-2-chat:7:ggufv2:Q3_K_M | 1 | 0.1 | 0.693333 | 0 | 0 | 0.333333 | 0.466667 | 0.233333 | 1 | 0.425185 | 0.379259 | 0.353401 |
| code-llama-instruct:7:ggufv2:Q6_K | 0.833333 | 0 | 0.96 | 0 | 0 | 0 | 0.775 | 0.333333 | 0.9 | 0.422407 | 0.37787 | 0.391629 |
| code-llama-instruct:34:ggufv2:Q5_K_M | 0.333333 | 0 | 0.9 | 0 | 0 | 0.125 | 0.95 | 0.466667 | 1 | 0.419444 | 0.376389 | 0.384096 |
| code-llama-instruct:34:ggufv2:Q8_0 | 0.333333 | 0 | 0.86 | 0 | 0 | 0.25 | 0.925 | 0.466667 | 0.9 | 0.415 | 0.374167 | 0.353285 |
| llama-2-chat:13:ggufv2:Q4_K_M | 1 | 0 | 0.76 | 0 | 0 | 0 | 0.777778 | 0.366667 | 0.5 | 0.378272 | 0.372469 | 0.35766 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0.166667 | 0.1625 | 0.76 | 0 | 0 | 0.333333 | 0.755556 | 0.426667 | 1 | 0.400525 | 0.366929 | 0.324507 |
| code-llama-instruct:34:ggufv2:Q6_K | 0.333333 | 0 | 0.853333 | 0 | 0 | 0.125 | 0.9 | 0.473333 | 0.9 | 0.398333 | 0.365833 | 0.356636 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 0 | 0.84 | 0 | 0.25 | 0.422222 | 0.711111 | 0.333333 | 1 | 0.395185 | 0.364259 | 0.340366 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 0 | 0.826667 | 0 | 0.25 | 0.475 | 0.85 | 0.333333 | 0.7 | 0.381667 | 0.3575 | 0.313787 |
| llama-2-chat:7:ggufv2:Q8_0 | 1 | 0 | 0.64 | 0 | 0 | 0.444444 | 0.355556 | 0.266667 | 0.5 | 0.356296 | 0.355926 | 0.302016 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0 | 0.065625 | 0.893333 | 0 | 0.25 | 0.333333 | 0.777778 | 0.38 | 0.5 | 0.355563 | 0.344448 | 0.289411 |
| llama-2-chat:7:ggufv2:Q6_K | 1 | 0 | 0.66 | 0 | 0 | 0.375 | 0.333333 | 0.266667 | 0.5 | 0.348333 | 0.340833 | 0.302733 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0.133333 | 0 | 0.846667 | 0 | 0.25 | 0.311111 | 0.666667 | 0.386667 | 0.6 | 0.354938 | 0.333025 | 0.267296 |
| code-llama-instruct:13:ggufv2:Q4_K_M | 0.333333 | 0 | 0.833333 | 0 | 0 | 0 | 0.775 | 0.533333 | 0.5 | 0.330556 | 0.331944 | 0.30939 |
| llama-2-chat:7:ggufv2:Q5_K_M | 1 | 0.0375 | 0.633333 | 0 | 0 | 0.444444 | 0.288889 | 0.293333 | 0.6 | 0.366389 | 0.329861 | 0.303743 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0.333333 | 0 | 0.726667 | 0 | 0 | 0 | 0.733333 | 0.48 | 0.6 | 0.319259 | 0.326296 | 0.291554 |
| llama-2-chat:13:ggufv2:Q2_K | 1 | 0 | 0.433333 | 0 | 0 | 0 | 0.288889 | 0.366667 | 0.5 | 0.287654 | 0.288272 | 0.301824 |
| llama-2-chat:7:ggufv2:Q2_K | 0.833333 | 0 | 0.686667 | 0 | 0 | 0 | 0.688889 | 0.1 | 1 | 0.367654 | 0.233827 | 0.380825 |
| code-llama-instruct:13:ggufv2:Q2_K | 0.0333333 | 0 | 0.82 | 0 | 0 | 0 | 0.875 | 0.566667 | 0.4 | 0.299444 | 0.166389 | 0.336056 |
| code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 0 | 0.833333 | 0 | 0 | 0.45 | 0.85 | 0.533333 | 0 | 0.296296 | 0.148148 | 0.336707 |
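
A natural follow-up question for this wide table is which model leads on each individual task. A sketch, assuming the table has been exported as a CSV (file name hypothetical):

```python
import pandas as pd

# Hypothetical export of the wide table above.
wide = pd.read_csv("benchmark_results_wide.csv", index_col="Full model name")

# Keep only the per-task score columns, dropping the aggregate ones.
task_cols = wide.columns.difference(["Mean Accuracy", "Median Accuracy", "SD"])

# Best-scoring model per task (ties resolve to the first row encountered).
print(wide[task_cols].idxmax())
```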