Benchmark Results - Overview

Here we collect the results of the living BioChatter benchmark. For an explanation of the benchmark design and tasks, see the benchmarking documentation; the developer docs provide further reading.

Scores per model

Table sorted by median score in descending order. Click the column names to reorder.

| Model name | Size (B parameters) | Median Accuracy | SD |
|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | 0.79 | 0.2 |
| gpt-4o-2024-08-06 | Unknown | 0.78 | 0.24 |
| claude-3-opus-20240229 | Unknown | 0.77 | 0.28 |
| gpt-3.5-turbo-0613 | 175 | 0.76 | 0.21 |
| claude-3-5-sonnet-20240620 | Unknown | 0.76 | 0.28 |
| llama-3.1-instruct | 70 | 0.73 | 0.29 |
| gpt-4-0613 | Unknown | 0.73 | 0.17 |
| llama-3.1-instruct | 8 | 0.72 | 0.28 |
| gpt-4-turbo-2024-04-09 | Unknown | 0.71 | 0.26 |
| gpt-4-0125-preview | Unknown | 0.69 | 0.27 |
| gpt-4o-mini-2024-07-18 | Unknown | 0.69 | 0.23 |
| gpt-4o-2024-05-13 | Unknown | 0.68 | 0.31 |
| llama-3-instruct | 8 | 0.65 | 0.36 |
| openhermes-2.5 | 7 | 0.6 | 0.3 |
| chatglm3 | 6 | 0.44 | 0.26 |
| llama-2-chat | 70 | 0.42 | 0.34 |
| mistral-instruct-v0.2 | 7 | 0.4 | 0.33 |
| code-llama-instruct | 7 | 0.4 | 0.35 |
| code-llama-instruct | 34 | 0.38 | 0.35 |
| code-llama-instruct | 13 | 0.38 | 0.33 |
| llama-2-chat | 13 | 0.38 | 0.33 |
| mixtral-instruct-v0.1 | 46.7 | 0.34 | 0.28 |
| llama-2-chat | 7 | 0.31 | 0.3 |
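The per-model median and standard deviation in this table are summaries over many individual benchmark runs. A minimal pure-Python sketch of how such a summary and the descending-by-median sort can be computed (the per-run score lists here are hypothetical, not actual benchmark data):

```python
from statistics import median, pstdev

# Hypothetical per-run accuracy scores for two models (not real benchmark data)
runs = {
    "model-a": [0.9, 0.8, 0.7, 0.75],
    "model-b": [0.5, 0.6, 0.4, 0.55],
}

# One summary row per model: (name, median accuracy, SD), as in the table above
summary = [(name, median(scores), pstdev(scores)) for name, scores in runs.items()]

# Sort by median score in descending order, mirroring the table's default order
summary.sort(key=lambda row: row[1], reverse=True)

for name, med, sd in summary:
    print(f"{name}\t{med:.2f}\t{sd:.2f}")
```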

[Figure: scatter plot of accuracy by quantisation name, and boxplot of accuracy per model]

Scores per quantisation

Table sorted by median score in descending order. Click the column names to reorder.

| Model name | Size (B parameters) | Version | Quantisation | Median Accuracy | SD |
|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | n/a | n/a | 0.79 | 0.2 |
| gpt-4o-2024-08-06 | Unknown | n/a | n/a | 0.78 | 0.24 |
| claude-3-opus-20240229 | Unknown | n/a | n/a | 0.77 | 0.28 |
| claude-3-5-sonnet-20240620 | Unknown | n/a | n/a | 0.76 | 0.28 |
| gpt-3.5-turbo-0613 | 175 | n/a | n/a | 0.76 | 0.21 |
| llama-3.1-instruct | 8 | ggufv2 | Q6_K | 0.74 | 0.28 |
| llama-3.1-instruct | 70 | ggufv2 | IQ2_M | 0.74 | 0.29 |
| llama-3.1-instruct | 8 | ggufv2 | Q5_K_M | 0.74 | 0.28 |
| gpt-4-0613 | Unknown | n/a | n/a | 0.73 | 0.17 |
| llama-3.1-instruct | 70 | ggufv2 | IQ4_XS | 0.73 | 0.29 |
| llama-3.1-instruct | 8 | ggufv2 | Q8_0 | 0.72 | 0.3 |
| gpt-4-turbo-2024-04-09 | Unknown | n/a | n/a | 0.71 | 0.26 |
| llama-3.1-instruct | 8 | ggufv2 | Q3_K_L | 0.71 | 0.28 |
| llama-3.1-instruct | 8 | ggufv2 | Q4_K_M | 0.7 | 0.26 |
| llama-3.1-instruct | 8 | ggufv2 | IQ4_XS | 0.69 | 0.28 |
| gpt-4-0125-preview | Unknown | n/a | n/a | 0.69 | 0.27 |
| gpt-4o-mini-2024-07-18 | Unknown | n/a | n/a | 0.69 | 0.23 |
| gpt-4o-2024-05-13 | Unknown | n/a | n/a | 0.68 | 0.31 |
| llama-3.1-instruct | 70 | ggufv2 | Q3_K_S | 0.67 | 0.28 |
| llama-3-instruct | 8 | ggufv2 | Q8_0 | 0.65 | 0.35 |
| llama-3-instruct | 8 | ggufv2 | Q4_K_M | 0.65 | 0.38 |
| llama-3-instruct | 8 | ggufv2 | Q6_K | 0.65 | 0.36 |
| openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.62 | 0.3 |
| llama-3-instruct | 8 | ggufv2 | Q5_K_M | 0.62 | 0.36 |
| openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.6 | 0.29 |
| openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.6 | 0.3 |
| openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.6 | 0.3 |
| openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.56 | 0.3 |
| code-llama-instruct | 34 | ggufv2 | Q2_K | 0.5 | 0.33 |
| openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.5 | 0.28 |
| code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
| code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.47 | 0.39 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.46 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.45 | 0.34 |
| code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
| chatglm3 | 6 | ggmlv3 | q4_0 | 0.44 | 0.26 |
| llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.44 | 0.35 |
| llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
| code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
| llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.41 | 0.33 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.41 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.4 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q8_0 | 0.4 | 0.34 |
| code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.4 | 0.37 |
| code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
| llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.39 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.39 | 0.33 |
| code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
| code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
| code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
| code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
| llama-2-chat | 70 | ggufv2 | Q2_K | 0.38 | 0.35 |
| llama-2-chat | 13 | ggufv2 | Q6_K | 0.37 | 0.34 |
| code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.37 | 0.29 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.37 | 0.35 |
| code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
| llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.36 | 0.34 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.35 | 0.3 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.34 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.34 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.34 | 0.29 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.33 | 0.28 |
| llama-2-chat | 7 | ggufv2 | Q2_K | 0.33 | 0.32 |
| code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.33 | 0.25 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.32 | 0.27 |
| llama-2-chat | 7 | ggufv2 | Q8_0 | 0.31 | 0.28 |
| llama-2-chat | 7 | ggufv2 | Q6_K | 0.31 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.3 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.3 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q2_K | 0.28 | 0.29 |
| code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
| code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |
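Aggregating scores across models for each quantisation level (the comparison the per-quantisation boxplot visualises) can be sketched in pure Python. The rows below are a small subset copied from the table above:

```python
from collections import defaultdict
from statistics import median

# (model, quantisation, median accuracy) — a few rows from the table above
rows = [
    ("llama-3.1-instruct:8", "Q6_K", 0.74),
    ("openhermes-2.5:7", "Q6_K", 0.62),
    ("code-llama-instruct:13", "Q6_K", 0.44),
    ("openhermes-2.5:7", "Q2_K", 0.5),
    ("code-llama-instruct:34", "Q2_K", 0.5),
    ("mistral-instruct-v0.2:7", "Q2_K", 0.37),
]

# Group the per-model scores by quantisation level
by_quant = defaultdict(list)
for model, quant, score in rows:
    by_quant[quant].append(score)

# Median accuracy per quantisation level across models
quant_median = {quant: median(scores) for quant, scores in by_quant.items()}
print(quant_median)
```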

[Figure: boxplot of accuracy per quantisation]

Scores of all tasks

Wide table; you may need to scroll horizontally to see all columns. Table sorted by median score in descending order. Click the column names to reorder.

| Full model name | medical_exam | naive_query_generation_using_schema | implicit_relevance_of_multiple_fragments | property_exists | query_generation | relationship_selection | property_selection | explicit_relevance_of_single_fragments | multimodal_answer | end_to_end_query_generation | sourcedata_info_extraction | entity_selection | api_calling | Mean Accuracy | Median Accuracy | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 0.671322 | 0.514451 | 0.9 | 0.789474 | 0.953757 | 1 | 0.36747 | 1 | n/a | 0.919075 | 0.510032 | 0.978261 | 0.746479 | 0.779193 | 0.789474 | 0.201291 |
| gpt-4o-2024-08-06 | 0.850211 | 0.528302 | 0.666667 | 0.179894 | 0.874214 | 1 | 0.425439 | 1 | n/a | 0.830189 | 0.711185 | 1 | n/a | 0.733282 | 0.781735 | 0.242634 |
| claude-3-opus-20240229 | 0.805556 | 0.733333 | 1 | 1 | 0.944444 | 0 | 0.421875 | 0.833333 | n/a | 0.655556 | 0.691235 | 1 | n/a | 0.73503 | 0.770293 | 0.276411 |
| claude-3-5-sonnet-20240620 | 0.7737 | 0.633333 | 1 | 0.866667 | 0.966667 | 0 | 0.375 | 1 | n/a | 0.733333 | 0.756088 | 1 | n/a | 0.736799 | 0.764894 | 0.283502 |
| gpt-3.5-turbo-0613 | n/a | 0.5 | 1 | 0.755556 | 0.946667 | 0.5 | 0.3625 | 1 | n/a | 0.833333 | 0.575381 | 0.888889 | n/a | 0.736233 | 0.755556 | 0.211926 |
| llama-3.1-instruct:8:ggufv2:Q6_K | 0.751748 | 0.633333 | 0.833333 | 0.833333 | 0.955556 | 0 | 0.46875 | 1 | n/a | 0.733333 | 0.394469 | 1 | n/a | 0.69126 | 0.742541 | 0.278015 |
| llama-3.1-instruct:70:ggufv2:IQ2_M | 0.772881 | 0.633333 | 1 | 0.916667 | 0.955556 | 0 | 0.328125 | 1 | n/a | 0.6 | 0.626498 | 1 | n/a | 0.712096 | 0.742489 | 0.293676 |
| llama-3.1-instruct:8:ggufv2:Q5_K_M | 0.749117 | 0.7 | 0.833333 | 0.833333 | 0.933333 | 0 | 0.4375 | 1 | n/a | 0.733333 | 0.380477 | 1 | n/a | 0.690948 | 0.741225 | 0.279278 |
| gpt-4-0613 | 0.730912 | 0.682081 | 1 | 0.666667 | 0.959538 | 0.695652 | 0.38253 | 1 | n/a | 0.878613 | 0.668903 | 0.920635 | 0.619048 | 0.767048 | 0.730912 | 0.172297 |
| llama-3.1-instruct:70:ggufv2:IQ4_XS | 0.822581 | 0.566667 | 1 | 0.75 | 0.955556 | 0 | 0.375 | 1 | n/a | 0.6 | 0.699238 | 1 | n/a | 0.706276 | 0.728138 | 0.285226 |
| llama-3.1-instruct:8:ggufv2:Q8_0 | 0.765734 | 0.660377 | 1 | 0.161765 | 0.937107 | 0.142857 | 0.565789 | 1 | n/a | 0.773585 | 0.38907 | 1 | n/a | 0.67239 | 0.719062 | 0.295232 |
| gpt-4-turbo-2024-04-09 | 0.83701 | 0.508671 | 1 | 0.657143 | 0.83237 | 0.130435 | 0.325301 | 1 | 0.99 | 0.635838 | 0.650369 | 1 | n/a | 0.713928 | 0.713928 | 0.262809 |
| llama-3.1-instruct:8:ggufv2:Q3_K_L | 0.768421 | 0.622642 | 0.833333 | 0.24 | 0.943396 | 0.142857 | 0.486842 | 1 | n/a | 0.811321 | 0.360379 | 1 | n/a | 0.655381 | 0.711901 | 0.280463 |
| llama-3.1-instruct:8:ggufv2:Q4_K_M | 0.741935 | 0.660377 | 0.833333 | 0.150538 | 0.924528 | 0.285714 | 0.513158 | 1 | n/a | 0.735849 | 0.382027 | 0.92 | n/a | 0.649769 | 0.698113 | 0.257418 |
| llama-3.1-instruct:8:ggufv2:IQ4_XS | 0.756944 | 0.646018 | 0.833333 | 0.439394 | 0.946903 | 0 | 0.387255 | 1 | n/a | 0.743363 | 0.414621 | 0.893939 | n/a | 0.641979 | 0.69469 | 0.276449 |
| gpt-4-0125-preview | 0.777159 | 0.456647 | 0.5 | 0.619048 | 0.83815 | 0.782609 | 0.0301205 | 1 | n/a | 0.109827 | 0.689705 | 0.824561 | 0.793651 | 0.618456 | 0.689705 | 0.27372 |
| gpt-4o-mini-2024-07-18 | 0.840796 | 0.537572 | 0.5 | 0.52439 | 0.953757 | 0.130435 | 0.388554 | 0.833333 | 0.98 | 0.687861 | 0.684553 | 0.921053 | 0.714286 | 0.668969 | 0.686207 | 0.229321 |
| gpt-4o-2024-05-13 | 0.763501 | 0.537572 | 0.7 | 0.526316 | 0.809249 | 0.130435 | 0.0301205 | 1 | 0.96 | 0.115607 | 0.653946 | 1 | 0.809524 | 0.618175 | 0.676973 | 0.312259 |
| llama-3.1-instruct:70:ggufv2:Q3_K_S | 0.8 | 0.633333 | 1 | 0.625 | 0.966667 | 0 | 0.375 | 1 | n/a | 0.6 | 0.642336 | 1 | n/a | 0.694758 | 0.668547 | 0.28438 |
| llama-3-instruct:8:ggufv2:Q8_0 | 0.640669 | 0.666667 | 1 | 0.725 | 0.92 | 0 | 0.28125 | 1 | n/a | 0 | 0.188555 | 0.875 | n/a | 0.572467 | 0.653668 | 0.35454 |
| llama-3-instruct:8:ggufv2:Q4_K_M | 0.624884 | 0.666667 | 1 | 0.775 | 0.92 | 0 | 0.109375 | 1 | n/a | 0 | 0.116871 | 0.861111 | n/a | 0.552173 | 0.645775 | 0.376754 |
| llama-3-instruct:8:ggufv2:Q6_K | 0.623955 | 0.666667 | 1 | 0.775 | 0.926667 | 0 | 0.28125 | 1 | n/a | 0 | 0.162657 | 0.875 | n/a | 0.573745 | 0.645311 | 0.359165 |
| openhermes-2.5:7:ggufv2:Q6_K | 0.57423 | 0.557078 | 1 | 0.113379 | 0.890411 | 0.896552 | 0.126404 | 1 | n/a | 0.273973 | 0.619167 | 0.7675 | n/a | 0.619881 | 0.619524 | 0.300867 |
| llama-3-instruct:8:ggufv2:Q5_K_M | 0.635097 | 0.6 | 1 | 0.65 | 0.926667 | 0 | 0.1875 | 1 | n/a | 0 | 0.166434 | 0.875 | n/a | 0.549154 | 0.617549 | 0.360565 |
| openhermes-2.5:7:ggufv2:Q5_K_M | 0.571429 | 0.56621 | 1 | 0.120988 | 0.917808 | 0.896552 | 0.196629 | 1 | n/a | 0.26484 | 0.579916 | 0.758808 | n/a | 0.624835 | 0.602375 | 0.293641 |
| openhermes-2.5:7:ggufv2:Q8_0 | 0.577031 | 0.497717 | 1 | 0.100629 | 0.90411 | 0.896552 | 0.196629 | 1 | n/a | 0.237443 | 0.600829 | 0.628118 | n/a | 0.603551 | 0.60219 | 0.296591 |
| openhermes-2.5:7:ggufv2:Q4_K_M | 0.586368 | 0.479452 | 1 | 0.140921 | 0.894977 | 0.896552 | 0.126404 | 1 | n/a | 0.246575 | 0.597281 | 0.66967 | n/a | 0.603473 | 0.600377 | 0.299216 |
| openhermes-2.5:7:ggufv2:Q3_K_M | 0.563959 | 0.47032 | 0.5 | 0.156805 | 0.917808 | 1 | 0.171348 | 1 | n/a | 0.287671 | 0.554488 | 1 | n/a | 0.602036 | 0.559223 | 0.301462 |
| code-llama-instruct:34:ggufv2:Q2_K | n/a | 0.566667 | 1 | 0.75 | 0.686667 | 0.5 | 0 | 0.5 | n/a | 0 | n/a | 0 | n/a | 0.444815 | 0.5 | 0.328199 |
| openhermes-2.5:7:ggufv2:Q2_K | 0.539106 | 0.420091 | 0.5 | 0.18638 | 0.917808 | 0.655172 | 0.0168539 | 1 | n/a | 0.159817 | 0.444054 | 0.604444 | n/a | 0.494884 | 0.497442 | 0.27656 |
| code-llama-instruct:7:ggufv2:Q3_K_M | n/a | 0.426667 | 0.7 | 0.8 | 0.873333 | 0.25 | 0 | 0.833333 | n/a | 0 | n/a | 0.5 | n/a | 0.487037 | 0.493519 | 0.307716 |
| code-llama-instruct:7:ggufv2:Q4_K_M | n/a | 0.653333 | 1 | 0.6 | 0.966667 | 0 | 0 | 1 | n/a | 0 | 0.138732 | 0.333333 | n/a | 0.469207 | 0.469207 | 0.38731 |
| mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0.364146 | 0.466667 | 1 | 0.688889 | 0.826667 | 0 | 0 | 1 | n/a | 0 | 0.385754 | 0.444444 | n/a | 0.470597 | 0.455556 | 0.34385 |
| mistral-instruct-v0.2:7:ggufv2:Q6_K | 0.366947 | 0.433333 | 1 | 0.65 | 0.833333 | 0 | 0.046875 | 1 | n/a | 0 | 0.367412 | 0.5 | n/a | 0.472536 | 0.452935 | 0.337974 |
| code-llama-instruct:34:ggufv2:Q3_K_M | n/a | 0.6 | 0.5 | 0.875 | 0.786667 | 0.25 | 0 | 0.5 | n/a | 0 | n/a | 0 | n/a | 0.390185 | 0.445093 | 0.306514 |
| chatglm3:6:ggmlv3:q4_0 | 0.426704 | 0.48 | 1 | 0.275 | 0.553333 | 0.4 | 0.2875 | 0.733333 | n/a | 0 | 0.188284 | 0.75 | n/a | 0.463105 | 0.444905 | 0.260423 |
| llama-2-chat:70:ggufv2:Q4_K_M | n/a | 0.42 | 1 | 0.755556 | 0.92 | 0.25 | 0 | 1 | n/a | 0 | 0.240936 | 0.444444 | n/a | 0.503094 | 0.444444 | 0.354692 |
| llama-2-chat:70:ggufv2:Q5_K_M | n/a | 0.36 | 0.9 | 0.777778 | 0.906667 | 0.25 | 0 | 1 | n/a | 0 | 0.210166 | 0.444444 | n/a | 0.484905 | 0.444444 | 0.346535 |
| code-llama-instruct:13:ggufv2:Q6_K | n/a | 0.54 | 0.5 | 0.825 | 0.793333 | 0 | 0 | 0.833333 | n/a | 0 | n/a | 0 | n/a | 0.387963 | 0.443981 | 0.345581 |
| code-llama-instruct:13:ggufv2:Q8_0 | n/a | 0.566667 | 0.5 | 0.75 | 0.766667 | 0 | 0 | 0.833333 | n/a | 0 | n/a | 0 | n/a | 0.37963 | 0.439815 | 0.334971 |
| code-llama-instruct:13:ggufv2:Q5_K_M | n/a | 0.566667 | 0.5 | 0.775 | 0.78 | 0 | 0 | 0.666667 | n/a | 0 | n/a | 0 | n/a | 0.36537 | 0.432685 | 0.320506 |
| llama-2-chat:70:ggufv2:Q3_K_M | n/a | 0.413333 | 0.5 | 0.777778 | 0.906667 | 0 | 0.171875 | 1 | n/a | 0 | 0.197898 | 0.333333 | n/a | 0.430088 | 0.413333 | 0.327267 |
| mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0.360411 | 0.466667 | 1 | 0.666667 | 0.773333 | 0 | 0.046875 | 1 | n/a | 0 | 0.368974 | 0.333333 | n/a | 0.456024 | 0.412499 | 0.335885 |
| mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0.366947 | 0.433333 | 0.9 | 0.644444 | 0.846667 | 0 | 0.0375 | 1 | n/a | 0 | 0.351684 | 0.333333 | n/a | 0.446719 | 0.40014 | 0.330107 |
| llama-2-chat:13:ggufv2:Q8_0 | 0.431373 | 0.48 | 0.5 | 0.711111 | 0.786667 | 0 | 0 | 1 | n/a | 0 | 0.0762457 | 0 | n/a | 0.362309 | 0.396841 | 0.335904 |
| code-llama-instruct:7:ggufv2:Q8_0 | n/a | 0.4 | 0.5 | 0.666667 | 0.96 | 0 | 0 | 1 | n/a | 0 | n/a | 0 | n/a | 0.391852 | 0.395926 | 0.37338 |
| code-llama-instruct:7:ggufv2:Q5_K_M | n/a | 0.4 | 0.5 | 0.688889 | 0.96 | 0 | 0 | 0.833333 | n/a | 0 | n/a | 0.111111 | n/a | 0.388148 | 0.394074 | 0.340156 |
| llama-2-chat:13:ggufv2:Q3_K_M | 0.428571 | 0.48 | 0.5 | 0.733333 | 0.68 | 0 | 0 | 1 | n/a | 0 | 0.112631 | 0 | n/a | 0.357685 | 0.393128 | 0.325419 |
| llama-2-chat:13:ggufv2:Q5_K_M | 0.431373 | 0.433333 | 0.5 | 0.644444 | 0.746667 | 0 | 0 | 1 | n/a | 0 | 0.0766167 | 0 | n/a | 0.348403 | 0.389888 | 0.32518 |
| code-llama-instruct:7:ggufv2:Q2_K | n/a | 0.533333 | 0.7 | 0.8 | 0.92 | 0.25 | 0.0625 | 0.333333 | n/a | 0 | n/a | 0.25 | n/a | 0.427685 | 0.380509 | 0.292686 |
| code-llama-instruct:34:ggufv2:Q4_K_M | n/a | 0.466667 | 0.4 | 0.975 | 0.906667 | 0 | 0 | 0.5 | n/a | 0 | n/a | 0 | n/a | 0.360926 | 0.380463 | 0.350483 |
| code-llama-instruct:7:ggufv2:Q6_K | n/a | 0.333333 | 0.9 | 0.775 | 0.96 | 0 | 0 | 0.833333 | n/a | 0 | n/a | 0 | n/a | 0.422407 | 0.37787 | 0.391629 |
| code-llama-instruct:34:ggufv2:Q5_K_M | n/a | 0.466667 | 1 | 0.95 | 0.9 | 0 | 0 | 0.333333 | n/a | 0 | n/a | 0.125 | n/a | 0.419444 | 0.376389 | 0.384096 |
| llama-2-chat:70:ggufv2:Q2_K | n/a | 0.473333 | 0.5 | 0.666667 | 0.9 | 0 | 0 | 1 | n/a | 0 | 0.215047 | 0 | n/a | 0.375505 | 0.375505 | 0.352226 |
| llama-2-chat:13:ggufv2:Q6_K | 0.428571 | 0.386667 | 0.5 | 0.775 | 0.813333 | 0 | 0 | 1 | n/a | 0 | 0.0781337 | 0 | n/a | 0.361973 | 0.37432 | 0.342819 |
| code-llama-instruct:34:ggufv2:Q8_0 | n/a | 0.466667 | 0.9 | 0.925 | 0.86 | 0 | 0 | 0.333333 | n/a | 0 | n/a | 0.25 | n/a | 0.415 | 0.374167 | 0.353285 |
| mistral-instruct-v0.2:7:ggufv2:Q2_K | 0.352941 | 0.573333 | 0.5 | 0.6 | 0.693333 | 0 | 0 | 1 | n/a | 0 | 0.331261 | 0.222222 | n/a | 0.388463 | 0.370702 | 0.294881 |
| mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0.365079 | 0.366667 | 1 | 0.688889 | 0.826667 | 0 | 0 | 1 | n/a | 0 | 0.347025 | 0.333333 | n/a | 0.447969 | 0.365873 | 0.348328 |
| code-llama-instruct:34:ggufv2:Q6_K | n/a | 0.473333 | 0.9 | 0.9 | 0.853333 | 0 | 0 | 0.333333 | n/a | 0 | n/a | 0.125 | n/a | 0.398333 | 0.365833 | 0.356636 |
| llama-2-chat:13:ggufv2:Q4_K_M | 0.428571 | 0.366667 | 0.5 | 0.777778 | 0.76 | 0 | 0 | 1 | n/a | 0 | 0.0888675 | 0 | n/a | 0.356535 | 0.361601 | 0.336686 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0.368814 | 0.426667 | 1 | 0.755556 | 0.76 | 0 | 0.1625 | 0.166667 | n/a | 0 | 0.193786 | 0.333333 | n/a | 0.378848 | 0.351074 | 0.301567 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0.352941 | 0.333333 | 1 | 0.711111 | 0.84 | 0.25 | 0 | 0 | n/a | 0 | 0.235659 | 0.422222 | n/a | 0.376842 | 0.343137 | 0.313874 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0.34267 | 0.333333 | 0.7 | 0.85 | 0.826667 | 0.25 | 0 | 0 | n/a | 0 | 0.225524 | 0.475 | n/a | 0.363927 | 0.338002 | 0.289705 |
| llama-2-chat:7:ggufv2:Q5_K_M | 0.40056 | 0.3379 | 0.6 | 0.155556 | 0.547945 | 0 | 0.0337079 | 1 | n/a | 0 | 0.0697591 | 0.57037 | n/a | 0.3378 | 0.33785 | 0.293711 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | n/a | 0.38 | 0.5 | 0.777778 | 0.893333 | 0.25 | 0.065625 | 0 | n/a | 0 | 0.229622 | 0.333333 | n/a | 0.342969 | 0.333333 | 0.278279 |
| llama-2-chat:7:ggufv2:Q2_K | 0.369748 | 0.164384 | 1 | 0.324786 | 0.611872 | 0 | 0 | 0.833333 | n/a | 0 | 0.0361865 | 0.410256 | n/a | 0.340961 | 0.332873 | 0.320018 |
| code-llama-instruct:13:ggufv2:Q4_K_M | n/a | 0.533333 | 0.5 | 0.775 | 0.833333 | 0 | 0 | 0.333333 | n/a | 0 | n/a | 0 | n/a | 0.330556 | 0.331944 | 0.30939 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0.358543 | 0.386667 | 0.6 | 0.666667 | 0.846667 | 0.25 | 0 | 0.133333 | n/a | 0 | 0.189177 | 0.311111 | n/a | 0.340197 | 0.325654 | 0.248216 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0.329599 | 0.48 | 0.6 | 0.733333 | 0.726667 | 0 | 0 | 0.333333 | n/a | 0 | 0.157514 | 0 | n/a | 0.305495 | 0.317547 | 0.269925 |
| llama-2-chat:7:ggufv2:Q8_0 | 0.40056 | 0.292237 | 0.5 | 0.163399 | 0.589041 | 0.103448 | 0 | 1 | n/a | 0 | 0.0847297 | 0.481481 | n/a | 0.328627 | 0.310432 | 0.278624 |
| llama-2-chat:7:ggufv2:Q6_K | 0.406162 | 0.292237 | 0.5 | 0.177778 | 0.561644 | 0 | 0 | 1 | n/a | 0 | 0.0614608 | 0.553846 | n/a | 0.323012 | 0.307625 | 0.290181 |
| llama-2-chat:7:ggufv2:Q4_K_M | 0.40056 | 0.273973 | 0.5 | 0.251852 | 0.611872 | 0 | 0 | 1 | n/a | 0 | 0.0852494 | 0.57037 | n/a | 0.335807 | 0.30489 | 0.291028 |
| llama-2-chat:7:ggufv2:Q3_K_M | 0.394958 | 0.228311 | 1 | 0.207407 | 0.589041 | 0.103448 | 0.0898876 | 1 | n/a | 0 | 0.0650717 | 0.435897 | n/a | 0.374002 | 0.301156 | 0.326263 |
| llama-2-chat:13:ggufv2:Q2_K | 0.414566 | 0.366667 | 0.5 | 0.288889 | 0.433333 | 0 | 0 | 1 | n/a | 0 | 0.0649389 | 0 | n/a | 0.278945 | 0.283917 | 0.285171 |
| code-llama-instruct:13:ggufv2:Q2_K | n/a | 0.566667 | 0.4 | 0.875 | 0.82 | 0 | 0 | 0.0333333 | n/a | 0 | n/a | 0 | n/a | 0.299444 | 0.166389 | 0.336056 |
| code-llama-instruct:13:ggufv2:Q3_K_M | n/a | 0.533333 | 0 | 0.85 | 0.833333 | 0 | 0 | 0 | n/a | 0 | n/a | 0.45 | n/a | 0.296296 | 0.148148 | 0.336707 |
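The "Full model name" column encodes open models as colon-separated fields (model, size, version, quantisation), e.g. `llama-3.1-instruct:8:ggufv2:Q6_K`, while API models appear as a bare name such as `gpt-4-0613`; sizes like `46_7` stand for 46.7 billion parameters. A sketch of splitting these identifiers back into their parts (`parse_full_name` is a hypothetical helper, not part of the BioChatter API):

```python
def parse_full_name(full_name: str) -> dict:
    """Split a 'Full model name' benchmark identifier into its components.

    Open models follow model:size:version:quantisation; API models
    (e.g. 'gpt-4-0613') carry no size, version, or quantisation fields.
    """
    parts = full_name.split(":")
    if len(parts) == 1:  # API model: bare name only
        return {"model": parts[0], "size": None, "version": None, "quantisation": None}
    model, size, version, quant = parts
    # Sizes such as '46_7' encode a decimal point (46.7 billion parameters)
    return {
        "model": model,
        "size": size.replace("_", "."),
        "version": version,
        "quantisation": quant,
    }

print(parse_full_name("llama-3.1-instruct:8:ggufv2:Q6_K"))
```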