
# Benchmark Results - Overview

Here we collect the results of the living BioChatter benchmark. For an explanation of the benchmark, see the benchmarking documentation; the developer docs provide further reading.

## Scores per model

Table sorted by median score in descending order. Click the column names to reorder.

| Model name | Size (B) | Median Accuracy | SD |
|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | 0.79 | 0.2 |
| gpt-4o-2024-08-06 | Unknown | 0.78 | 0.24 |
| claude-3-opus-20240229 | Unknown | 0.77 | 0.28 |
| gpt-3.5-turbo-0613 | 175 | 0.76 | 0.21 |
| claude-3-5-sonnet-20240620 | Unknown | 0.76 | 0.28 |
| llama-3.1-instruct | 70 | 0.73 | 0.29 |
| gpt-4-0613 | Unknown | 0.73 | 0.17 |
| llama-3.1-instruct | 8 | 0.72 | 0.28 |
| gpt-4-turbo-2024-04-09 | Unknown | 0.71 | 0.26 |
| gpt-4-0125-preview | Unknown | 0.69 | 0.27 |
| gpt-4o-mini-2024-07-18 | Unknown | 0.69 | 0.23 |
| gpt-4o-2024-05-13 | Unknown | 0.68 | 0.31 |
| llama-3-instruct | 8 | 0.65 | 0.36 |
| openhermes-2.5 | 7 | 0.6 | 0.3 |
| chatglm3 | 6 | 0.44 | 0.26 |
| llama-2-chat | 70 | 0.42 | 0.34 |
| mistral-instruct-v0.2 | 7 | 0.4 | 0.33 |
| code-llama-instruct | 7 | 0.4 | 0.35 |
| code-llama-instruct | 34 | 0.38 | 0.35 |
| code-llama-instruct | 13 | 0.38 | 0.33 |
| llama-2-chat | 13 | 0.38 | 0.33 |
| mixtral-instruct-v0.1 | 46.7 | 0.34 | 0.28 |
| llama-2-chat | 7 | 0.31 | 0.3 |
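The Median Accuracy and SD columns aggregate each model's per-task accuracies, and the table is then sorted by median in descending order. A minimal sketch of that aggregation and sort, using made-up scores rather than the actual benchmark data:

```python
from statistics import median, stdev

# Illustrative per-task accuracy scores (NOT the real benchmark results).
scores = {
    "model-a": [0.9, 0.7, 0.5],
    "model-b": [0.5, 0.4, 0.3],
}

# Reduce each model's task scores to (median, SD), rounded as in the tables,
# then sort rows by median accuracy in descending order.
rows = sorted(
    ((name, round(median(s), 2), round(stdev(s), 2)) for name, s in scores.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, med, sd in rows:
    print(f"{name}\t{med}\t{sd}")
```

The actual benchmark pipeline may aggregate differently (e.g. per-subtask weighting); this only illustrates the median/SD summary shown above.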

*(Interactive figures on the rendered page: scatter plot by quantisation name; boxplot by model.)*

## Scores per quantisation

Table sorted by median score in descending order. Click the column names to reorder.

| Model name | Size (B) | Version | Quantisation | Median Accuracy | SD |
|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 175 | n/a | n/a | 0.79 | 0.2 |
| gpt-4o-2024-08-06 | Unknown | n/a | n/a | 0.78 | 0.24 |
| claude-3-opus-20240229 | Unknown | n/a | n/a | 0.77 | 0.28 |
| claude-3-5-sonnet-20240620 | Unknown | n/a | n/a | 0.76 | 0.28 |
| gpt-3.5-turbo-0613 | 175 | n/a | n/a | 0.76 | 0.21 |
| llama-3.1-instruct | 8 | ggufv2 | Q6_K | 0.74 | 0.28 |
| llama-3.1-instruct | 70 | ggufv2 | IQ2_M | 0.74 | 0.29 |
| llama-3.1-instruct | 8 | ggufv2 | Q5_K_M | 0.74 | 0.28 |
| gpt-4-0613 | Unknown | n/a | n/a | 0.73 | 0.17 |
| llama-3.1-instruct | 70 | ggufv2 | IQ4_XS | 0.73 | 0.29 |
| llama-3.1-instruct | 8 | ggufv2 | Q8_0 | 0.72 | 0.3 |
| gpt-4-turbo-2024-04-09 | Unknown | n/a | n/a | 0.71 | 0.26 |
| llama-3.1-instruct | 8 | ggufv2 | Q3_K_L | 0.71 | 0.28 |
| llama-3.1-instruct | 8 | ggufv2 | Q4_K_M | 0.7 | 0.26 |
| llama-3.1-instruct | 8 | ggufv2 | IQ4_XS | 0.69 | 0.28 |
| gpt-4-0125-preview | Unknown | n/a | n/a | 0.69 | 0.27 |
| gpt-4o-mini-2024-07-18 | Unknown | n/a | n/a | 0.69 | 0.23 |
| gpt-4o-2024-05-13 | Unknown | n/a | n/a | 0.68 | 0.31 |
| llama-3.1-instruct | 70 | ggufv2 | Q3_K_S | 0.67 | 0.28 |
| llama-3-instruct | 8 | ggufv2 | Q8_0 | 0.65 | 0.35 |
| llama-3-instruct | 8 | ggufv2 | Q4_K_M | 0.65 | 0.38 |
| llama-3-instruct | 8 | ggufv2 | Q6_K | 0.65 | 0.36 |
| openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.62 | 0.3 |
| llama-3-instruct | 8 | ggufv2 | Q5_K_M | 0.62 | 0.36 |
| openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.6 | 0.29 |
| openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.6 | 0.3 |
| openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.6 | 0.3 |
| openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.56 | 0.3 |
| code-llama-instruct | 34 | ggufv2 | Q2_K | 0.5 | 0.33 |
| openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.5 | 0.28 |
| code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
| code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.47 | 0.39 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.46 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.45 | 0.34 |
| code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
| chatglm3 | 6 | ggmlv3 | q4_0 | 0.44 | 0.26 |
| llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.44 | 0.35 |
| llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
| code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
| code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
| llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.41 | 0.33 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.41 | 0.34 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.4 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q8_0 | 0.4 | 0.34 |
| code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.4 | 0.37 |
| code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
| llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.39 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.39 | 0.33 |
| code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
| code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
| code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
| code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
| llama-2-chat | 70 | ggufv2 | Q2_K | 0.38 | 0.35 |
| llama-2-chat | 13 | ggufv2 | Q6_K | 0.37 | 0.34 |
| code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.37 | 0.29 |
| mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.37 | 0.35 |
| code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
| llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.36 | 0.34 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.35 | 0.3 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.34 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.34 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.34 | 0.29 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.33 | 0.28 |
| llama-2-chat | 7 | ggufv2 | Q2_K | 0.33 | 0.32 |
| code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.33 | 0.25 |
| mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.32 | 0.27 |
| llama-2-chat | 7 | ggufv2 | Q8_0 | 0.31 | 0.28 |
| llama-2-chat | 7 | ggufv2 | Q6_K | 0.31 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.3 | 0.29 |
| llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.3 | 0.33 |
| llama-2-chat | 13 | ggufv2 | Q2_K | 0.28 | 0.29 |
| code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
| code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |

*(Interactive figure on the rendered page: boxplot by quantisation.)*

## Scores of all tasks

Wide table; you may need to scroll horizontally to see all columns. Table sorted by median score in descending order. Click the column names to reorder.

| Full model name | implicit_relevance_of_multiple_fragments | medical_exam | sourcedata_info_extraction | explicit_relevance_of_single_fragments | naive_query_generation_using_schema | query_generation | property_selection | end_to_end_query_generation | relationship_selection | entity_selection | multimodal_answer | property_exists | api_calling | Mean Accuracy | Median Accuracy | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-3.5-turbo-0125 | 0.9 | 0.671322 | 0.510032 | 1 | 0.514451 | 0.953757 | 0.36747 | 0.919075 | 1 | 0.978261 | n/a | 0.789474 | 0.746479 | 0.779193 | 0.789474 | 0.201291 |
| gpt-4o-2024-08-06 | 0.666667 | 0.850211 | 0.711185 | 1 | 0.528302 | 0.874214 | 0.425439 | 0.830189 | 1 | 1 | n/a | 0.179894 | n/a | 0.733282 | 0.781735 | 0.242634 |
| claude-3-opus-20240229 | 1 | 0.805556 | 0.691235 | 0.833333 | 0.733333 | 0.944444 | 0.421875 | 0.655556 | 0 | 1 | n/a | 1 | n/a | 0.73503 | 0.770293 | 0.276411 |
| claude-3-5-sonnet-20240620 | 1 | 0.7737 | 0.756088 | 1 | 0.633333 | 0.966667 | 0.375 | 0.733333 | 0 | 1 | n/a | 0.866667 | n/a | 0.736799 | 0.764894 | 0.283502 |
| gpt-3.5-turbo-0613 | 1 | n/a | 0.575381 | 1 | 0.5 | 0.946667 | 0.3625 | 0.833333 | 0.5 | 0.888889 | n/a | 0.755556 | n/a | 0.736233 | 0.755556 | 0.211926 |
| llama-3.1-instruct:8:ggufv2:Q6_K | 0.833333 | 0.751748 | 0.394469 | 1 | 0.633333 | 0.955556 | 0.46875 | 0.733333 | 0 | 1 | n/a | 0.833333 | n/a | 0.69126 | 0.742541 | 0.278015 |
| llama-3.1-instruct:70:ggufv2:IQ2_M | 1 | 0.772881 | 0.626498 | 1 | 0.633333 | 0.955556 | 0.328125 | 0.6 | 0 | 1 | n/a | 0.916667 | n/a | 0.712096 | 0.742489 | 0.293676 |
| llama-3.1-instruct:8:ggufv2:Q5_K_M | 0.833333 | 0.749117 | 0.380477 | 1 | 0.7 | 0.933333 | 0.4375 | 0.733333 | 0 | 1 | n/a | 0.833333 | n/a | 0.690948 | 0.741225 | 0.279278 |
| gpt-4-0613 | 1 | 0.730912 | 0.668903 | 1 | 0.682081 | 0.959538 | 0.38253 | 0.878613 | 0.695652 | 0.920635 | n/a | 0.666667 | 0.619048 | 0.767048 | 0.730912 | 0.172297 |
| llama-3.1-instruct:70:ggufv2:IQ4_XS | 1 | 0.822581 | 0.699238 | 1 | 0.566667 | 0.955556 | 0.375 | 0.6 | 0 | 1 | n/a | 0.75 | n/a | 0.706276 | 0.728138 | 0.285226 |
| llama-3.1-instruct:8:ggufv2:Q8_0 | 1 | 0.765734 | 0.38907 | 1 | 0.660377 | 0.937107 | 0.565789 | 0.773585 | 0.142857 | 1 | n/a | 0.161765 | n/a | 0.67239 | 0.719062 | 0.295232 |
| gpt-4-turbo-2024-04-09 | 1 | 0.83701 | 0.650369 | 1 | 0.508671 | 0.83237 | 0.325301 | 0.635838 | 0.130435 | 1 | 0.99 | 0.657143 | n/a | 0.713928 | 0.713928 | 0.262809 |
| llama-3.1-instruct:8:ggufv2:Q3_K_L | 0.833333 | 0.768421 | 0.360379 | 1 | 0.622642 | 0.943396 | 0.486842 | 0.811321 | 0.142857 | 1 | n/a | 0.24 | n/a | 0.655381 | 0.711901 | 0.280463 |
| llama-3.1-instruct:8:ggufv2:Q4_K_M | 0.833333 | 0.741935 | 0.382027 | 1 | 0.660377 | 0.924528 | 0.513158 | 0.735849 | 0.285714 | 0.92 | n/a | 0.150538 | n/a | 0.649769 | 0.698113 | 0.257418 |
| llama-3.1-instruct:8:ggufv2:IQ4_XS | 0.833333 | 0.756944 | 0.414621 | 1 | 0.646018 | 0.946903 | 0.387255 | 0.743363 | 0 | 0.893939 | n/a | 0.439394 | n/a | 0.641979 | 0.69469 | 0.276449 |
| gpt-4-0125-preview | 0.5 | 0.777159 | 0.689705 | 1 | 0.456647 | 0.83815 | 0.0301205 | 0.109827 | 0.782609 | 0.824561 | n/a | 0.619048 | 0.793651 | 0.618456 | 0.689705 | 0.27372 |
| gpt-4o-mini-2024-07-18 | 0.5 | 0.840796 | 0.684553 | 0.833333 | 0.537572 | 0.953757 | 0.388554 | 0.687861 | 0.130435 | 0.921053 | 0.98 | 0.52439 | 0.714286 | 0.668969 | 0.686207 | 0.229321 |
| gpt-4o-2024-05-13 | 0.7 | 0.763501 | 0.653946 | 1 | 0.537572 | 0.809249 | 0.0301205 | 0.115607 | 0.130435 | 1 | 0.96 | 0.526316 | 0.809524 | 0.618175 | 0.676973 | 0.312259 |
| llama-3.1-instruct:70:ggufv2:Q3_K_S | 1 | 0.8 | 0.642336 | 1 | 0.633333 | 0.966667 | 0.375 | 0.6 | 0 | 1 | n/a | 0.625 | n/a | 0.694758 | 0.668547 | 0.28438 |
| llama-3-instruct:8:ggufv2:Q8_0 | 1 | 0.640669 | 0.188555 | 1 | 0.666667 | 0.92 | 0.28125 | 0 | 0 | 0.875 | n/a | 0.725 | n/a | 0.572467 | 0.653668 | 0.35454 |
| llama-3-instruct:8:ggufv2:Q4_K_M | 1 | 0.624884 | 0.116871 | 1 | 0.666667 | 0.92 | 0.109375 | 0 | 0 | 0.861111 | n/a | 0.775 | n/a | 0.552173 | 0.645775 | 0.376754 |
| llama-3-instruct:8:ggufv2:Q6_K | 1 | 0.623955 | 0.162657 | 1 | 0.666667 | 0.926667 | 0.28125 | 0 | 0 | 0.875 | n/a | 0.775 | n/a | 0.573745 | 0.645311 | 0.359165 |
| openhermes-2.5:7:ggufv2:Q6_K | 1 | 0.57423 | 0.619167 | 1 | 0.557078 | 0.890411 | 0.126404 | 0.273973 | 0.896552 | 0.7675 | n/a | 0.113379 | n/a | 0.619881 | 0.619524 | 0.300867 |
| llama-3-instruct:8:ggufv2:Q5_K_M | 1 | 0.635097 | 0.166434 | 1 | 0.6 | 0.926667 | 0.1875 | 0 | 0 | 0.875 | n/a | 0.65 | n/a | 0.549154 | 0.617549 | 0.360565 |
| openhermes-2.5:7:ggufv2:Q5_K_M | 1 | 0.571429 | 0.579916 | 1 | 0.56621 | 0.917808 | 0.196629 | 0.26484 | 0.896552 | 0.758808 | n/a | 0.120988 | n/a | 0.624835 | 0.602375 | 0.293641 |
| openhermes-2.5:7:ggufv2:Q8_0 | 1 | 0.577031 | 0.600829 | 1 | 0.497717 | 0.90411 | 0.196629 | 0.237443 | 0.896552 | 0.628118 | n/a | 0.100629 | n/a | 0.603551 | 0.60219 | 0.296591 |
| openhermes-2.5:7:ggufv2:Q4_K_M | 1 | 0.586368 | 0.597281 | 1 | 0.479452 | 0.894977 | 0.126404 | 0.246575 | 0.896552 | 0.66967 | n/a | 0.140921 | n/a | 0.603473 | 0.600377 | 0.299216 |
| openhermes-2.5:7:ggufv2:Q3_K_M | 0.5 | 0.563959 | 0.554488 | 1 | 0.47032 | 0.917808 | 0.171348 | 0.287671 | 1 | 1 | n/a | 0.156805 | n/a | 0.602036 | 0.559223 | 0.301462 |
| code-llama-instruct:34:ggufv2:Q2_K | 1 | n/a | n/a | 0.5 | 0.566667 | 0.686667 | 0 | 0 | 0.5 | 0 | n/a | 0.75 | n/a | 0.444815 | 0.5 | 0.328199 |
| openhermes-2.5:7:ggufv2:Q2_K | 0.5 | 0.539106 | 0.444054 | 1 | 0.420091 | 0.917808 | 0.0168539 | 0.159817 | 0.655172 | 0.604444 | n/a | 0.18638 | n/a | 0.494884 | 0.497442 | 0.27656 |
| code-llama-instruct:7:ggufv2:Q3_K_M | 0.7 | n/a | n/a | 0.833333 | 0.426667 | 0.873333 | 0 | 0 | 0.25 | 0.5 | n/a | 0.8 | n/a | 0.487037 | 0.493519 | 0.307716 |
| code-llama-instruct:7:ggufv2:Q4_K_M | 1 | n/a | 0.138732 | 1 | 0.653333 | 0.966667 | 0 | 0 | 0 | 0.333333 | n/a | 0.6 | n/a | 0.469207 | 0.469207 | 0.38731 |
| mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 1 | 0.364146 | 0.385754 | 1 | 0.466667 | 0.826667 | 0 | 0 | 0 | 0.444444 | n/a | 0.688889 | n/a | 0.470597 | 0.455556 | 0.34385 |
| mistral-instruct-v0.2:7:ggufv2:Q6_K | 1 | 0.366947 | 0.367412 | 1 | 0.433333 | 0.833333 | 0.046875 | 0 | 0 | 0.5 | n/a | 0.65 | n/a | 0.472536 | 0.452935 | 0.337974 |
| code-llama-instruct:34:ggufv2:Q3_K_M | 0.5 | n/a | n/a | 0.5 | 0.6 | 0.786667 | 0 | 0 | 0.25 | 0 | n/a | 0.875 | n/a | 0.390185 | 0.445093 | 0.306514 |
| chatglm3:6:ggmlv3:q4_0 | 1 | 0.426704 | 0.188284 | 0.733333 | 0.48 | 0.553333 | 0.2875 | 0 | 0.4 | 0.75 | n/a | 0.275 | n/a | 0.463105 | 0.444905 | 0.260423 |
| llama-2-chat:70:ggufv2:Q5_K_M | 0.9 | n/a | 0.210166 | 1 | 0.36 | 0.906667 | 0 | 0 | 0.25 | 0.444444 | n/a | 0.777778 | n/a | 0.484905 | 0.444444 | 0.346535 |
| llama-2-chat:70:ggufv2:Q4_K_M | 1 | n/a | 0.240936 | 1 | 0.42 | 0.92 | 0 | 0 | 0.25 | 0.444444 | n/a | 0.755556 | n/a | 0.503094 | 0.444444 | 0.354692 |
| code-llama-instruct:13:ggufv2:Q6_K | 0.5 | n/a | n/a | 0.833333 | 0.54 | 0.793333 | 0 | 0 | 0 | 0 | n/a | 0.825 | n/a | 0.387963 | 0.443981 | 0.345581 |
| code-llama-instruct:13:ggufv2:Q8_0 | 0.5 | n/a | n/a | 0.833333 | 0.566667 | 0.766667 | 0 | 0 | 0 | 0 | n/a | 0.75 | n/a | 0.37963 | 0.439815 | 0.334971 |
| code-llama-instruct:13:ggufv2:Q5_K_M | 0.5 | n/a | n/a | 0.666667 | 0.566667 | 0.78 | 0 | 0 | 0 | 0 | n/a | 0.775 | n/a | 0.36537 | 0.432685 | 0.320506 |
| llama-2-chat:70:ggufv2:Q3_K_M | 0.5 | n/a | 0.197898 | 1 | 0.413333 | 0.906667 | 0.171875 | 0 | 0 | 0.333333 | n/a | 0.777778 | n/a | 0.430088 | 0.413333 | 0.327267 |
| mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 1 | 0.360411 | 0.368974 | 1 | 0.466667 | 0.773333 | 0.046875 | 0 | 0 | 0.333333 | n/a | 0.666667 | n/a | 0.456024 | 0.412499 | 0.335885 |
| mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0.9 | 0.366947 | 0.351684 | 1 | 0.433333 | 0.846667 | 0.0375 | 0 | 0 | 0.333333 | n/a | 0.644444 | n/a | 0.446719 | 0.40014 | 0.330107 |
| llama-2-chat:13:ggufv2:Q8_0 | 0.5 | 0.431373 | 0.0762457 | 1 | 0.48 | 0.786667 | 0 | 0 | 0 | 0 | n/a | 0.711111 | n/a | 0.362309 | 0.396841 | 0.335904 |
| code-llama-instruct:7:ggufv2:Q8_0 | 0.5 | n/a | n/a | 1 | 0.4 | 0.96 | 0 | 0 | 0 | 0 | n/a | 0.666667 | n/a | 0.391852 | 0.395926 | 0.37338 |
| code-llama-instruct:7:ggufv2:Q5_K_M | 0.5 | n/a | n/a | 0.833333 | 0.4 | 0.96 | 0 | 0 | 0 | 0.111111 | n/a | 0.688889 | n/a | 0.388148 | 0.394074 | 0.340156 |
| llama-2-chat:13:ggufv2:Q3_K_M | 0.5 | 0.428571 | 0.112631 | 1 | 0.48 | 0.68 | 0 | 0 | 0 | 0 | n/a | 0.733333 | n/a | 0.357685 | 0.393128 | 0.325419 |
| llama-2-chat:13:ggufv2:Q5_K_M | 0.5 | 0.431373 | 0.0766167 | 1 | 0.433333 | 0.746667 | 0 | 0 | 0 | 0 | n/a | 0.644444 | n/a | 0.348403 | 0.389888 | 0.32518 |
| code-llama-instruct:7:ggufv2:Q2_K | 0.7 | n/a | n/a | 0.333333 | 0.533333 | 0.92 | 0.0625 | 0 | 0.25 | 0.25 | n/a | 0.8 | n/a | 0.427685 | 0.380509 | 0.292686 |
| code-llama-instruct:34:ggufv2:Q4_K_M | 0.4 | n/a | n/a | 0.5 | 0.466667 | 0.906667 | 0 | 0 | 0 | 0 | n/a | 0.975 | n/a | 0.360926 | 0.380463 | 0.350483 |
| code-llama-instruct:7:ggufv2:Q6_K | 0.9 | n/a | n/a | 0.833333 | 0.333333 | 0.96 | 0 | 0 | 0 | 0 | n/a | 0.775 | n/a | 0.422407 | 0.37787 | 0.391629 |
| code-llama-instruct:34:ggufv2:Q5_K_M | 1 | n/a | n/a | 0.333333 | 0.466667 | 0.9 | 0 | 0 | 0 | 0.125 | n/a | 0.95 | n/a | 0.419444 | 0.376389 | 0.384096 |
| llama-2-chat:70:ggufv2:Q2_K | 0.5 | n/a | 0.215047 | 1 | 0.473333 | 0.9 | 0 | 0 | 0 | 0 | n/a | 0.666667 | n/a | 0.375505 | 0.375505 | 0.352226 |
| llama-2-chat:13:ggufv2:Q6_K | 0.5 | 0.428571 | 0.0781337 | 1 | 0.386667 | 0.813333 | 0 | 0 | 0 | 0 | n/a | 0.775 | n/a | 0.361973 | 0.37432 | 0.342819 |
| code-llama-instruct:34:ggufv2:Q8_0 | 0.9 | n/a | n/a | 0.333333 | 0.466667 | 0.86 | 0 | 0 | 0 | 0.25 | n/a | 0.925 | n/a | 0.415 | 0.374167 | 0.353285 |
| mistral-instruct-v0.2:7:ggufv2:Q2_K | 0.5 | 0.352941 | 0.331261 | 1 | 0.573333 | 0.693333 | 0 | 0 | 0 | 0.222222 | n/a | 0.6 | n/a | 0.388463 | 0.370702 | 0.294881 |
| mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 1 | 0.365079 | 0.347025 | 1 | 0.366667 | 0.826667 | 0 | 0 | 0 | 0.333333 | n/a | 0.688889 | n/a | 0.447969 | 0.365873 | 0.348328 |
| code-llama-instruct:34:ggufv2:Q6_K | 0.9 | n/a | n/a | 0.333333 | 0.473333 | 0.853333 | 0 | 0 | 0 | 0.125 | n/a | 0.9 | n/a | 0.398333 | 0.365833 | 0.356636 |
| llama-2-chat:13:ggufv2:Q4_K_M | 0.5 | 0.428571 | 0.0888675 | 1 | 0.366667 | 0.76 | 0 | 0 | 0 | 0 | n/a | 0.777778 | n/a | 0.356535 | 0.361601 | 0.336686 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 1 | 0.368814 | 0.193786 | 0.166667 | 0.426667 | 0.76 | 0.1625 | 0 | 0 | 0.333333 | n/a | 0.755556 | n/a | 0.378848 | 0.351074 | 0.301567 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 1 | 0.352941 | 0.235659 | 0 | 0.333333 | 0.84 | 0 | 0 | 0.25 | 0.422222 | n/a | 0.711111 | n/a | 0.376842 | 0.343137 | 0.313874 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0.7 | 0.34267 | 0.225524 | 0 | 0.333333 | 0.826667 | 0 | 0 | 0.25 | 0.475 | n/a | 0.85 | n/a | 0.363927 | 0.338002 | 0.289705 |
| llama-2-chat:7:ggufv2:Q5_K_M | 0.6 | 0.40056 | 0.0697591 | 1 | 0.3379 | 0.547945 | 0.0337079 | 0 | 0 | 0.57037 | n/a | 0.155556 | n/a | 0.3378 | 0.33785 | 0.293711 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0.5 | n/a | 0.229622 | 0 | 0.38 | 0.893333 | 0.065625 | 0 | 0.25 | 0.333333 | n/a | 0.777778 | n/a | 0.342969 | 0.333333 | 0.278279 |
| llama-2-chat:7:ggufv2:Q2_K | 1 | 0.369748 | 0.0361865 | 0.833333 | 0.164384 | 0.611872 | 0 | 0 | 0 | 0.410256 | n/a | 0.324786 | n/a | 0.340961 | 0.332873 | 0.320018 |
| code-llama-instruct:13:ggufv2:Q4_K_M | 0.5 | n/a | n/a | 0.333333 | 0.533333 | 0.833333 | 0 | 0 | 0 | 0 | n/a | 0.775 | n/a | 0.330556 | 0.331944 | 0.30939 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0.6 | 0.358543 | 0.189177 | 0.133333 | 0.386667 | 0.846667 | 0 | 0 | 0.25 | 0.311111 | n/a | 0.666667 | n/a | 0.340197 | 0.325654 | 0.248216 |
| mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0.6 | 0.329599 | 0.157514 | 0.333333 | 0.48 | 0.726667 | 0 | 0 | 0 | 0 | n/a | 0.733333 | n/a | 0.305495 | 0.317547 | 0.269925 |
| llama-2-chat:7:ggufv2:Q8_0 | 0.5 | 0.40056 | 0.0847297 | 1 | 0.292237 | 0.589041 | 0 | 0 | 0.103448 | 0.481481 | n/a | 0.163399 | n/a | 0.328627 | 0.310432 | 0.278624 |
| llama-2-chat:7:ggufv2:Q6_K | 0.5 | 0.406162 | 0.0614608 | 1 | 0.292237 | 0.561644 | 0 | 0 | 0 | 0.553846 | n/a | 0.177778 | n/a | 0.323012 | 0.307625 | 0.290181 |
| llama-2-chat:7:ggufv2:Q4_K_M | 0.5 | 0.40056 | 0.0852494 | 1 | 0.273973 | 0.611872 | 0 | 0 | 0 | 0.57037 | n/a | 0.251852 | n/a | 0.335807 | 0.30489 | 0.291028 |
| llama-2-chat:7:ggufv2:Q3_K_M | 1 | 0.394958 | 0.0650717 | 1 | 0.228311 | 0.589041 | 0.0898876 | 0 | 0.103448 | 0.435897 | n/a | 0.207407 | n/a | 0.374002 | 0.301156 | 0.326263 |
| llama-2-chat:13:ggufv2:Q2_K | 0.5 | 0.414566 | 0.0649389 | 1 | 0.366667 | 0.433333 | 0 | 0 | 0 | 0 | n/a | 0.288889 | n/a | 0.278945 | 0.283917 | 0.285171 |
| code-llama-instruct:13:ggufv2:Q2_K | 0.4 | n/a | n/a | 0.0333333 | 0.566667 | 0.82 | 0 | 0 | 0 | 0 | n/a | 0.875 | n/a | 0.299444 | 0.166389 | 0.336056 |
| code-llama-instruct:13:ggufv2:Q3_K_M | 0 | n/a | n/a | 0 | 0.533333 | 0.833333 | 0 | 0 | 0 | 0.45 | n/a | 0.85 | n/a | 0.296296 | 0.148148 | 0.336707 |
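The full model names in this table encode the fields of the previous tables as colon-separated components: model name, size, version, and quantisation (e.g. llama-3.1-instruct:8:ggufv2:Q6_K), while proprietary API models appear under their plain model name. A small sketch of splitting these identifiers back into fields (the helper function is hypothetical, not part of BioChatter):

```python
def parse_full_model_name(full_name: str) -> dict:
    """Split a full model name like 'llama-3.1-instruct:8:ggufv2:Q6_K'
    into model, size, version, and quantisation fields.

    API models (e.g. 'gpt-4-0613') carry no size/version/quantisation,
    so those fields come back as None. Non-integer sizes such as the
    mixtral '46_7' (46.7B mixture-of-experts) are kept as strings.
    """
    parts = full_name.split(":")
    if len(parts) == 1:
        return {"model": parts[0], "size": None, "version": None, "quantisation": None}
    model, size, version, quantisation = parts
    return {
        "model": model,
        "size": int(size) if size.isdigit() else size,
        "version": version,
        "quantisation": quantisation,
    }

print(parse_full_model_name("llama-3.1-instruct:8:ggufv2:Q6_K"))
```

For example, the last row above parses to model `code-llama-instruct`, size 13, version `ggufv2`, quantisation `Q3_K_M`.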