Benchmark Results - Overview
This page collects the results of the living BioChatter benchmark. For an explanation of the benchmark, see the benchmarking documentation; the developer docs offer further reading.
Scores per model
The table is sorted by median accuracy in descending order.
Model name | Size (billion parameters) | Median Accuracy | SD |
---|---|---|---|
gpt-3.5-turbo-0125 | 175 | 0.87 | 0.21 |
gpt-4-turbo-2024-04-09 | Unknown | 0.83 | 0.3 |
gpt-4-0613 | Unknown | 0.78 | 0.18 |
claude-3-opus-20240229 | Unknown | 0.77 | 0.28 |
gpt-3.5-turbo-0613 | 175 | 0.76 | 0.21 |
claude-3-5-sonnet-20240620 | Unknown | 0.76 | 0.28 |
llama-3.1-instruct | 8 | 0.74 | 0.28 |
gpt-4-0125-preview | Unknown | 0.73 | 0.3 |
llama-3.1-instruct | 70 | 0.73 | 0.29 |
gpt-4o-2024-05-13 | Unknown | 0.73 | 0.35 |
openhermes-2.5 | 7 | 0.7 | 0.32 |
gpt-4o-mini-2024-07-18 | Unknown | 0.7 | 0.27 |
llama-3-instruct | 8 | 0.65 | 0.36 |
chatglm3 | 6 | 0.44 | 0.26 |
llama-2-chat | 70 | 0.42 | 0.34 |
mistral-instruct-v0.2 | 7 | 0.4 | 0.33 |
code-llama-instruct | 7 | 0.4 | 0.35 |
code-llama-instruct | 13 | 0.38 | 0.33 |
code-llama-instruct | 34 | 0.38 | 0.35 |
llama-2-chat | 13 | 0.38 | 0.33 |
llama-2-chat | 7 | 0.34 | 0.31 |
mixtral-instruct-v0.1 | 46.7 | 0.34 | 0.28 |
Scores per quantisation
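The table is sorted by median accuracy in descending order. A value of nan in the Version or Quantisation column indicates that no value is recorded for that model.
Model name | Size (billion parameters) | Version | Quantisation | Median Accuracy | SD |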
---|---|---|---|---|---|
gpt-3.5-turbo-0125 | 175 | nan | nan | 0.87 | 0.21 |
gpt-4-turbo-2024-04-09 | Unknown | nan | nan | 0.83 | 0.3 |
gpt-4-0613 | Unknown | nan | nan | 0.78 | 0.18 |
claude-3-opus-20240229 | Unknown | nan | nan | 0.77 | 0.28 |
claude-3-5-sonnet-20240620 | Unknown | nan | nan | 0.76 | 0.28 |
gpt-3.5-turbo-0613 | 175 | nan | nan | 0.76 | 0.21 |
llama-3.1-instruct | 8 | ggufv2 | Q3_K_L | 0.75 | 0.29 |
llama-3.1-instruct | 8 | ggufv2 | Q8_0 | 0.75 | 0.28 |
llama-3.1-instruct | 8 | ggufv2 | IQ4_XS | 0.75 | 0.3 |
llama-3.1-instruct | 8 | ggufv2 | Q6_K | 0.74 | 0.28 |
llama-3.1-instruct | 70 | ggufv2 | IQ2_M | 0.74 | 0.29 |
llama-3.1-instruct | 8 | ggufv2 | Q5_K_M | 0.74 | 0.28 |
llama-3.1-instruct | 8 | ggufv2 | Q4_K_M | 0.74 | 0.28 |
gpt-4-0125-preview | Unknown | nan | nan | 0.73 | 0.3 |
gpt-4o-2024-05-13 | Unknown | nan | nan | 0.73 | 0.35 |
llama-3.1-instruct | 70 | ggufv2 | IQ4_XS | 0.73 | 0.29 |
openhermes-2.5 | 7 | ggufv2 | Q5_K_M | 0.73 | 0.32 |
openhermes-2.5 | 7 | ggufv2 | Q8_0 | 0.71 | 0.32 |
openhermes-2.5 | 7 | ggufv2 | Q4_K_M | 0.71 | 0.33 |
gpt-4o-mini-2024-07-18 | Unknown | nan | nan | 0.7 | 0.27 |
openhermes-2.5 | 7 | ggufv2 | Q6_K | 0.7 | 0.33 |
llama-3.1-instruct | 70 | ggufv2 | Q3_K_S | 0.67 | 0.28 |
llama-3-instruct | 8 | ggufv2 | Q8_0 | 0.65 | 0.35 |
llama-3-instruct | 8 | ggufv2 | Q4_K_M | 0.65 | 0.38 |
llama-3-instruct | 8 | ggufv2 | Q6_K | 0.65 | 0.36 |
llama-3-instruct | 8 | ggufv2 | Q5_K_M | 0.62 | 0.36 |
openhermes-2.5 | 7 | ggufv2 | Q3_K_M | 0.59 | 0.32 |
openhermes-2.5 | 7 | ggufv2 | Q2_K | 0.51 | 0.3 |
code-llama-instruct | 34 | ggufv2 | Q2_K | 0.5 | 0.33 |
code-llama-instruct | 7 | ggufv2 | Q3_K_M | 0.49 | 0.31 |
code-llama-instruct | 7 | ggufv2 | Q4_K_M | 0.47 | 0.39 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q5_K_M | 0.46 | 0.34 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q6_K | 0.45 | 0.34 |
code-llama-instruct | 34 | ggufv2 | Q3_K_M | 0.45 | 0.31 |
chatglm3 | 6 | ggmlv3 | q4_0 | 0.44 | 0.26 |
llama-2-chat | 70 | ggufv2 | Q4_K_M | 0.44 | 0.35 |
llama-2-chat | 70 | ggufv2 | Q5_K_M | 0.44 | 0.35 |
code-llama-instruct | 13 | ggufv2 | Q6_K | 0.44 | 0.35 |
code-llama-instruct | 13 | ggufv2 | Q8_0 | 0.44 | 0.33 |
code-llama-instruct | 13 | ggufv2 | Q5_K_M | 0.43 | 0.32 |
llama-2-chat | 70 | ggufv2 | Q3_K_M | 0.41 | 0.33 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q3_K_M | 0.41 | 0.34 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q8_0 | 0.4 | 0.33 |
llama-2-chat | 13 | ggufv2 | Q8_0 | 0.4 | 0.34 |
code-llama-instruct | 7 | ggufv2 | Q8_0 | 0.4 | 0.37 |
code-llama-instruct | 7 | ggufv2 | Q5_K_M | 0.39 | 0.34 |
llama-2-chat | 13 | ggufv2 | Q3_K_M | 0.39 | 0.33 |
llama-2-chat | 13 | ggufv2 | Q5_K_M | 0.39 | 0.33 |
code-llama-instruct | 7 | ggufv2 | Q2_K | 0.38 | 0.29 |
code-llama-instruct | 34 | ggufv2 | Q4_K_M | 0.38 | 0.35 |
code-llama-instruct | 7 | ggufv2 | Q6_K | 0.38 | 0.39 |
code-llama-instruct | 34 | ggufv2 | Q5_K_M | 0.38 | 0.38 |
llama-2-chat | 70 | ggufv2 | Q2_K | 0.38 | 0.35 |
llama-2-chat | 13 | ggufv2 | Q6_K | 0.37 | 0.34 |
code-llama-instruct | 34 | ggufv2 | Q8_0 | 0.37 | 0.35 |
llama-2-chat | 7 | ggufv2 | Q4_K_M | 0.37 | 0.29 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q2_K | 0.37 | 0.29 |
mistral-instruct-v0.2 | 7 | ggufv2 | Q4_K_M | 0.37 | 0.35 |
code-llama-instruct | 34 | ggufv2 | Q6_K | 0.37 | 0.36 |
llama-2-chat | 13 | ggufv2 | Q4_K_M | 0.36 | 0.34 |
llama-2-chat | 7 | ggufv2 | Q3_K_M | 0.36 | 0.34 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q4_K_M | 0.35 | 0.3 |
llama-2-chat | 7 | ggufv2 | Q8_0 | 0.35 | 0.29 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q5_K_M | 0.34 | 0.31 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q6_K | 0.34 | 0.29 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q3_K_M | 0.33 | 0.28 |
code-llama-instruct | 13 | ggufv2 | Q4_K_M | 0.33 | 0.31 |
llama-2-chat | 7 | ggufv2 | Q6_K | 0.33 | 0.29 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q8_0 | 0.33 | 0.25 |
llama-2-chat | 7 | ggufv2 | Q5_K_M | 0.32 | 0.29 |
mixtral-instruct-v0.1 | 46.7 | ggufv2 | Q2_K | 0.32 | 0.27 |
llama-2-chat | 13 | ggufv2 | Q2_K | 0.28 | 0.29 |
llama-2-chat | 7 | ggufv2 | Q2_K | 0.22 | 0.36 |
code-llama-instruct | 13 | ggufv2 | Q2_K | 0.17 | 0.34 |
code-llama-instruct | 13 | ggufv2 | Q3_K_M | 0.15 | 0.34 |
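
To compare quantisation levels within a single model, the rows of the table above can be grouped by model name and size. The following is a minimal Python sketch, not part of BioChatter: the `rows` list copies a handful of rows from the table by hand, and the helper name `best_quantisation` is hypothetical.

```python
# (model name, size, quantisation, median accuracy), copied by hand
# from a few rows of the table above; this is not the full data set.
rows = [
    ("openhermes-2.5", "7", "Q5_K_M", 0.73),
    ("openhermes-2.5", "7", "Q8_0", 0.71),
    ("openhermes-2.5", "7", "Q2_K", 0.51),
    ("code-llama-instruct", "13", "Q6_K", 0.44),
    ("code-llama-instruct", "13", "Q2_K", 0.17),
    ("llama-2-chat", "7", "Q4_K_M", 0.37),
    ("llama-2-chat", "7", "Q2_K", 0.22),
]


def best_quantisation(rows):
    """Return {(model, size): (quantisation, median)} with the highest median per group."""
    best = {}
    for model, size, quant, median in rows:
        key = (model, size)
        # On ties the first row seen is kept, so input order breaks ties.
        if key not in best or median > best[key][1]:
            best[key] = (quant, median)
    return best


for (model, size), (quant, median) in best_quantisation(rows).items():
    print(f"{model} ({size}B): {quant} -> {median}")
```

Because several quantisations of the same model share a median in the full table, the tie-breaking rule matters when the sketch is applied to all rows.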
Scores of all tasks
Wide table; you may need to scroll horizontally to see all columns. The table is sorted by median accuracy in descending order; nan marks a task for which no score is reported.
Full model name | property_selection | property_exists | sourcedata_info_extraction | explicit_relevance_of_single_fragments | medical_exam | naive_query_generation_using_schema | query_generation | relationship_selection | api_calling | implicit_relevance_of_multiple_fragments | multimodal_answer | entity_selection | end_to_end_query_generation | Mean Accuracy | Median Accuracy | SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gpt-3.5-turbo-0125 | 0.35625 | 0.866667 | 0.510032 | 1 | 0.670401 | 0.486667 | 0.966667 | 1 | 0.746479 | 0.9 | nan | 1 | 0.926667 | 0.785819 | 0.866667 | 0.211361 |
gpt-4-turbo-2024-04-09 | 0.303125 | 1 | 0.650369 | 1 | 0.839506 | 0.5 | 0.826667 | 0 | nan | 1 | 0.99 | 1 | 0.6 | 0.725806 | 0.826667 | 0.301411 |
gpt-4-0613 | 0.359375 | 0.888889 | 0.668903 | 1 | 0.730159 | 0.68 | 0.966667 | 0.65 | 0.619048 | 1 | nan | 0.888889 | 0.88 | 0.777661 | 0.777661 | 0.177558 |
claude-3-opus-20240229 | 0.421875 | 1 | 0.691235 | 0.833333 | 0.805556 | 0.733333 | 0.944444 | 0 | nan | 1 | nan | 1 | 0.655556 | 0.73503 | 0.770293 | 0.276411 |
claude-3-5-sonnet-20240620 | 0.375 | 0.866667 | 0.756088 | 1 | 0.7737 | 0.633333 | 0.966667 | 0 | nan | 1 | nan | 1 | 0.733333 | 0.736799 | 0.764894 | 0.283502 |
gpt-3.5-turbo-0613 | 0.3625 | 0.755556 | 0.575381 | 1 | nan | 0.5 | 0.946667 | 0.5 | nan | 1 | nan | 0.888889 | 0.833333 | 0.736233 | 0.755556 | 0.211926 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 0.421875 | 0.833333 | 0.360379 | 1 | 0.768421 | 0.566667 | 0.933333 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.677334 | 0.750877 | 0.285133 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 0.515625 | 0.833333 | 0.38907 | 1 | 0.765734 | 0.633333 | 0.933333 | 0 | nan | 1 | nan | 1 | 0.733333 | 0.709433 | 0.749534 | 0.284792 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 0.359375 | 1 | 0.414621 | 1 | 0.756944 | 0.633333 | 0.944444 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.697762 | 0.745139 | 0.295214 |
llama-3.1-instruct:8:ggufv2:Q6_K | 0.46875 | 0.833333 | 0.394469 | 1 | 0.751748 | 0.633333 | 0.955556 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.69126 | 0.742541 | 0.278015 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 0.328125 | 0.916667 | 0.626498 | 1 | 0.772881 | 0.633333 | 0.955556 | 0 | nan | 1 | nan | 1 | 0.6 | 0.712096 | 0.742489 | 0.293676 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 0.4375 | 0.833333 | 0.380477 | 1 | 0.749117 | 0.7 | 0.933333 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.690948 | 0.741225 | 0.279278 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 0.453125 | 0.75 | 0.382027 | 1 | 0.741935 | 0.633333 | 0.933333 | 0 | nan | 0.833333 | nan | 1 | 0.733333 | 0.67822 | 0.737634 | 0.275701 |
gpt-4-0125-preview | 0 | 0.733333 | 0.689705 | 1 | 0.77591 | 0.44 | 0.833333 | 0.75 | 0.793651 | 0.5 | nan | 0.777778 | 0 | 0.607809 | 0.733333 | 0.295129 |
gpt-4o-2024-05-13 | 0 | 0.85 | 0.653946 | 1 | 0.762838 | 0.533333 | 0.8 | 0 | 0.809524 | 0.7 | 0.96 | 1 | 0 | 0.620742 | 0.731419 | 0.351091 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 0.375 | 0.75 | 0.699238 | 1 | 0.822581 | 0.566667 | 0.955556 | 0 | nan | 1 | nan | 1 | 0.6 | 0.706276 | 0.728138 | 0.285226 |
openhermes-2.5:7:ggufv2:Q5_K_M | 0.125 | 0.777778 | 0.579916 | 1 | 0.571429 | 0.586667 | 0.913333 | 1 | nan | 1 | nan | 0.888889 | 0 | 0.676637 | 0.727208 | 0.318593 |
openhermes-2.5:7:ggufv2:Q8_0 | 0.125 | 0.755556 | 0.600829 | 1 | 0.577031 | 0.466667 | 0.88 | 1 | nan | 1 | nan | 0.888889 | 0 | 0.663088 | 0.709322 | 0.319919 |
openhermes-2.5:7:ggufv2:Q4_K_M | 0.046875 | 0.755556 | 0.597281 | 1 | 0.586368 | 0.466667 | 0.873333 | 1 | nan | 1 | nan | 0.888889 | 0 | 0.655906 | 0.705731 | 0.330932 |
gpt-4o-mini-2024-07-18 | 0.365625 | 0.925 | 0.684553 | 0.833333 | 0.840498 | 0.533333 | 0.966667 | 0 | 0.714286 | 0.5 | 0.98 | 1 | 0.66 | 0.692561 | 0.703423 | 0.267084 |
openhermes-2.5:7:ggufv2:Q6_K | 0.046875 | 0.733333 | 0.619167 | 1 | 0.57423 | 0.533333 | 0.86 | 1 | nan | 1 | nan | 1 | 0 | 0.669722 | 0.701528 | 0.334697 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 0.375 | 0.625 | 0.642336 | 1 | 0.8 | 0.633333 | 0.966667 | 0 | nan | 1 | nan | 1 | 0.6 | 0.694758 | 0.668547 | 0.28438 |
llama-3-instruct:8:ggufv2:Q8_0 | 0.28125 | 0.725 | 0.188555 | 1 | 0.640669 | 0.666667 | 0.92 | 0 | nan | 1 | nan | 0.875 | 0 | 0.572467 | 0.653668 | 0.35454 |
llama-3-instruct:8:ggufv2:Q4_K_M | 0.109375 | 0.775 | 0.116871 | 1 | 0.624884 | 0.666667 | 0.92 | 0 | nan | 1 | nan | 0.861111 | 0 | 0.552173 | 0.645775 | 0.376754 |
llama-3-instruct:8:ggufv2:Q6_K | 0.28125 | 0.775 | 0.162657 | 1 | 0.623955 | 0.666667 | 0.926667 | 0 | nan | 1 | nan | 0.875 | 0 | 0.573745 | 0.645311 | 0.359165 |
llama-3-instruct:8:ggufv2:Q5_K_M | 0.1875 | 0.65 | 0.166434 | 1 | 0.635097 | 0.6 | 0.926667 | 0 | nan | 1 | nan | 0.875 | 0 | 0.549154 | 0.617549 | 0.360565 |
openhermes-2.5:7:ggufv2:Q3_K_M | 0.125 | 0.72 | 0.554488 | 1 | 0.563959 | 0.466667 | 0.94 | 1 | nan | 0.5 | nan | 1 | 0 | 0.624556 | 0.594257 | 0.318981 |
openhermes-2.5:7:ggufv2:Q2_K | 0 | 0.844444 | 0.444054 | 1 | 0.537815 | 0.433333 | 0.94 | 0.5 | nan | 0.5 | nan | 0.555556 | 0 | 0.5232 | 0.5116 | 0.298404 |
code-llama-instruct:34:ggufv2:Q2_K | 0 | 0.75 | nan | 0.5 | nan | 0.566667 | 0.686667 | 0.5 | nan | 1 | nan | 0 | 0 | 0.444815 | 0.5 | 0.328199 |
code-llama-instruct:7:ggufv2:Q3_K_M | 0 | 0.8 | nan | 0.833333 | nan | 0.426667 | 0.873333 | 0.25 | nan | 0.7 | nan | 0.5 | 0 | 0.487037 | 0.493519 | 0.307716 |
code-llama-instruct:7:ggufv2:Q4_K_M | 0 | 0.6 | 0.138732 | 1 | nan | 0.653333 | 0.966667 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.469207 | 0.469207 | 0.38731 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0 | 0.688889 | 0.385754 | 1 | 0.364146 | 0.466667 | 0.826667 | 0 | nan | 1 | nan | 0.444444 | 0 | 0.470597 | 0.455556 | 0.34385 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 0.046875 | 0.65 | 0.367412 | 1 | 0.366947 | 0.433333 | 0.833333 | 0 | nan | 1 | nan | 0.5 | 0 | 0.472536 | 0.452935 | 0.337974 |
code-llama-instruct:34:ggufv2:Q3_K_M | 0 | 0.875 | nan | 0.5 | nan | 0.6 | 0.786667 | 0.25 | nan | 0.5 | nan | 0 | 0 | 0.390185 | 0.445093 | 0.306514 |
chatglm3:6:ggmlv3:q4_0 | 0.2875 | 0.275 | 0.188284 | 0.733333 | 0.426704 | 0.48 | 0.553333 | 0.4 | nan | 1 | nan | 0.75 | 0 | 0.463105 | 0.444905 | 0.260423 |
llama-2-chat:70:ggufv2:Q4_K_M | 0 | 0.755556 | 0.240936 | 1 | nan | 0.42 | 0.92 | 0.25 | nan | 1 | nan | 0.444444 | 0 | 0.503094 | 0.444444 | 0.354692 |
llama-2-chat:70:ggufv2:Q5_K_M | 0 | 0.777778 | 0.210166 | 1 | nan | 0.36 | 0.906667 | 0.25 | nan | 0.9 | nan | 0.444444 | 0 | 0.484905 | 0.444444 | 0.346535 |
code-llama-instruct:13:ggufv2:Q6_K | 0 | 0.825 | nan | 0.833333 | nan | 0.54 | 0.793333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.387963 | 0.443981 | 0.345581 |
code-llama-instruct:13:ggufv2:Q8_0 | 0 | 0.75 | nan | 0.833333 | nan | 0.566667 | 0.766667 | 0 | nan | 0.5 | nan | 0 | 0 | 0.37963 | 0.439815 | 0.334971 |
code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 0.775 | nan | 0.666667 | nan | 0.566667 | 0.78 | 0 | nan | 0.5 | nan | 0 | 0 | 0.36537 | 0.432685 | 0.320506 |
llama-2-chat:70:ggufv2:Q3_K_M | 0.171875 | 0.777778 | 0.197898 | 1 | nan | 0.413333 | 0.906667 | 0 | nan | 0.5 | nan | 0.333333 | 0 | 0.430088 | 0.413333 | 0.327267 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0.046875 | 0.666667 | 0.368974 | 1 | 0.360411 | 0.466667 | 0.773333 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.456024 | 0.412499 | 0.335885 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0.0375 | 0.644444 | 0.351684 | 1 | 0.366947 | 0.433333 | 0.846667 | 0 | nan | 0.9 | nan | 0.333333 | 0 | 0.446719 | 0.40014 | 0.330107 |
llama-2-chat:13:ggufv2:Q8_0 | 0 | 0.711111 | 0.0762457 | 1 | 0.431373 | 0.48 | 0.786667 | 0 | nan | 0.5 | nan | 0 | 0 | 0.362309 | 0.396841 | 0.335904 |
code-llama-instruct:7:ggufv2:Q8_0 | 0 | 0.666667 | nan | 1 | nan | 0.4 | 0.96 | 0 | nan | 0.5 | nan | 0 | 0 | 0.391852 | 0.395926 | 0.37338 |
code-llama-instruct:7:ggufv2:Q5_K_M | 0 | 0.688889 | nan | 0.833333 | nan | 0.4 | 0.96 | 0 | nan | 0.5 | nan | 0.111111 | 0 | 0.388148 | 0.394074 | 0.340156 |
llama-2-chat:13:ggufv2:Q3_K_M | 0 | 0.733333 | 0.112631 | 1 | 0.428571 | 0.48 | 0.68 | 0 | nan | 0.5 | nan | 0 | 0 | 0.357685 | 0.393128 | 0.325419 |
llama-2-chat:13:ggufv2:Q5_K_M | 0 | 0.644444 | 0.0766167 | 1 | 0.431373 | 0.433333 | 0.746667 | 0 | nan | 0.5 | nan | 0 | 0 | 0.348403 | 0.389888 | 0.32518 |
code-llama-instruct:7:ggufv2:Q2_K | 0.0625 | 0.8 | nan | 0.333333 | nan | 0.533333 | 0.92 | 0.25 | nan | 0.7 | nan | 0.25 | 0 | 0.427685 | 0.380509 | 0.292686 |
code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 0.975 | nan | 0.5 | nan | 0.466667 | 0.906667 | 0 | nan | 0.4 | nan | 0 | 0 | 0.360926 | 0.380463 | 0.350483 |
code-llama-instruct:7:ggufv2:Q6_K | 0 | 0.775 | nan | 0.833333 | nan | 0.333333 | 0.96 | 0 | nan | 0.9 | nan | 0 | 0 | 0.422407 | 0.37787 | 0.391629 |
code-llama-instruct:34:ggufv2:Q5_K_M | 0 | 0.95 | nan | 0.333333 | nan | 0.466667 | 0.9 | 0 | nan | 1 | nan | 0.125 | 0 | 0.419444 | 0.376389 | 0.384096 |
llama-2-chat:70:ggufv2:Q2_K | 0 | 0.666667 | 0.215047 | 1 | nan | 0.473333 | 0.9 | 0 | nan | 0.5 | nan | 0 | 0 | 0.375505 | 0.375505 | 0.352226 |
llama-2-chat:13:ggufv2:Q6_K | 0 | 0.775 | 0.0781337 | 1 | 0.428571 | 0.386667 | 0.813333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.361973 | 0.37432 | 0.342819 |
code-llama-instruct:34:ggufv2:Q8_0 | 0 | 0.925 | nan | 0.333333 | nan | 0.466667 | 0.86 | 0 | nan | 0.9 | nan | 0.25 | 0 | 0.415 | 0.374167 | 0.353285 |
llama-2-chat:7:ggufv2:Q4_K_M | 0 | 0.488889 | 0.0852494 | 1 | 0.40056 | 0.24 | 0.646667 | 0 | nan | 0.5 | nan | 0.444444 | 0 | 0.345983 | 0.373271 | 0.290686 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 0 | 0.6 | 0.331261 | 1 | 0.352941 | 0.573333 | 0.693333 | 0 | nan | 0.5 | nan | 0.222222 | 0 | 0.388463 | 0.370702 | 0.294881 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0 | 0.688889 | 0.347025 | 1 | 0.365079 | 0.366667 | 0.826667 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.447969 | 0.365873 | 0.348328 |
code-llama-instruct:34:ggufv2:Q6_K | 0 | 0.9 | nan | 0.333333 | nan | 0.473333 | 0.853333 | 0 | nan | 0.9 | nan | 0.125 | 0 | 0.398333 | 0.365833 | 0.356636 |
llama-2-chat:13:ggufv2:Q4_K_M | 0 | 0.777778 | 0.0888675 | 1 | 0.428571 | 0.366667 | 0.76 | 0 | nan | 0.5 | nan | 0 | 0 | 0.356535 | 0.361601 | 0.336686 |
llama-2-chat:7:ggufv2:Q3_K_M | 0.1 | 0.466667 | 0.0650717 | 1 | 0.394958 | 0.233333 | 0.693333 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.3897 | 0.361517 | 0.337204 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0.1625 | 0.755556 | 0.193786 | 0.166667 | 0.368814 | 0.426667 | 0.76 | 0 | nan | 1 | nan | 0.333333 | 0 | 0.378848 | 0.351074 | 0.301567 |
llama-2-chat:7:ggufv2:Q8_0 | 0 | 0.355556 | 0.0847297 | 1 | 0.40056 | 0.266667 | 0.64 | 0 | nan | 0.5 | nan | 0.444444 | 0 | 0.335632 | 0.345594 | 0.286246 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 0.711111 | 0.235659 | 0 | 0.352941 | 0.333333 | 0.84 | 0.25 | nan | 1 | nan | 0.422222 | 0 | 0.376842 | 0.343137 | 0.313874 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 0.85 | 0.225524 | 0 | 0.34267 | 0.333333 | 0.826667 | 0.25 | nan | 0.7 | nan | 0.475 | 0 | 0.363927 | 0.338002 | 0.289705 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0.065625 | 0.777778 | 0.229622 | 0 | nan | 0.38 | 0.893333 | 0.25 | nan | 0.5 | nan | 0.333333 | 0 | 0.342969 | 0.333333 | 0.278279 |
code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 0.775 | nan | 0.333333 | nan | 0.533333 | 0.833333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.330556 | 0.331944 | 0.30939 |
llama-2-chat:7:ggufv2:Q6_K | 0 | 0.333333 | 0.0614608 | 1 | 0.406162 | 0.266667 | 0.66 | 0 | nan | 0.5 | nan | 0.375 | 0 | 0.327511 | 0.330422 | 0.288285 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0 | 0.666667 | 0.189177 | 0.133333 | 0.358543 | 0.386667 | 0.846667 | 0.25 | nan | 0.6 | nan | 0.311111 | 0 | 0.340197 | 0.325654 | 0.248216 |
llama-2-chat:7:ggufv2:Q5_K_M | 0.0375 | 0.288889 | 0.0697591 | 1 | 0.40056 | 0.293333 | 0.633333 | 0 | nan | 0.6 | nan | 0.444444 | 0 | 0.342529 | 0.317931 | 0.289372 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 0.733333 | 0.157514 | 0.333333 | 0.329599 | 0.48 | 0.726667 | 0 | nan | 0.6 | nan | 0 | 0 | 0.305495 | 0.317547 | 0.269925 |
llama-2-chat:13:ggufv2:Q2_K | 0 | 0.288889 | 0.0649389 | 1 | 0.414566 | 0.366667 | 0.433333 | 0 | nan | 0.5 | nan | 0 | 0 | 0.278945 | 0.283917 | 0.285171 |
llama-2-chat:7:ggufv2:Q2_K | 0 | 0.688889 | 0.0361865 | 0.833333 | 0.369748 | 0.1 | 0.686667 | 0 | nan | 1 | nan | 0 | 0 | 0.337711 | 0.218856 | 0.359055 |
code-llama-instruct:13:ggufv2:Q2_K | 0 | 0.875 | nan | 0.0333333 | nan | 0.566667 | 0.82 | 0 | nan | 0.4 | nan | 0 | 0 | 0.299444 | 0.166389 | 0.336056 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 0.85 | nan | 0 | nan | 0.533333 | 0.833333 | 0 | nan | 0 | nan | 0.45 | 0 | 0.296296 | 0.148148 | 0.336707 |
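
The full model names in the table above appear to follow a `name:size:version:quantisation` layout for open models, while hosted models are listed by bare name. The sketch below assumes that layout, and assumes that an underscore in the size (as in `46_7`) stands for a decimal point. The helpers `parse_full_name` and `nan_mean` are hypothetical and not taken from BioChatter. The second half recomputes the Mean Accuracy of one row by skipping nan entries; the Median Accuracy and SD columns are not recomputed, because this page does not state how they are derived.

```python
import math


def parse_full_name(full_name):
    """Split a full model name, assuming the layout name:size:version:quantisation."""
    parts = full_name.split(":")
    if len(parts) != 4:
        # Hosted models such as gpt-4-0613 carry no size/version/quantisation suffix.
        return {"model": full_name, "size": None, "version": None, "quantisation": None}
    model, size, version, quantisation = parts
    return {
        "model": model,
        # Assumption: "46_7" encodes 46.7 (billion parameters).
        "size": float(size.replace("_", ".")),
        "version": version,
        "quantisation": quantisation,
    }


def nan_mean(scores):
    """Arithmetic mean over the scores that are not nan."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid)


nan = float("nan")
# Task scores of gpt-3.5-turbo-0125, in the column order of the table above.
gpt35 = [0.35625, 0.866667, 0.510032, 1, 0.670401, 0.486667, 0.966667,
         1, 0.746479, 0.9, nan, 1, 0.926667]

print(parse_full_name("mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M"))
print(round(nan_mean(gpt35), 6))  # agrees with the Mean Accuracy column for this row
```

A row whose scores are all nan would raise a `ZeroDivisionError` in `nan_mean`; no such row occurs in the table, so the sketch leaves that case unhandled.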