Benchmark - All Results
BioCypher query generation
In this set of tasks, we test the ability of LLMs to generate queries for a BioCypher Knowledge Graph using BioChatter.
The `schema_config.yaml` of the BioCypher Knowledge Graph and a natural language query are passed to BioChatter.
Individual steps of the query generation process are tested separately, as is the end-to-end performance of the process.
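As a reading aid for the tables below, the Accuracy column appears to be the ratio of Score achieved to Score possible. The following minimal sketch reproduces that ratio for three rows of the first table; the function name `accuracy` is hypothetical and not part of BioChatter, and the formula is an assumption inferred from the reported numbers rather than a documented definition.

```python
def accuracy(score_achieved: float, score_possible: float) -> float:
    """Ratio of achieved to possible score (assumed definition of the Accuracy column)."""
    if score_possible <= 0:
        raise ValueError("score_possible must be positive")
    return score_achieved / score_possible


# (model, score achieved, score possible) copied from rows of the first table
rows = [
    ("gpt-3.5-turbo-0125", 45, 46),
    ("gpt-4o-mini-2024-07-18", 70, 76),
    ("llama-3.1-instruct:8:ggufv2:Q4_K_M", 138, 150),
]

for model, achieved, possible in rows:
    # Six significant digits, matching the precision used in the tables
    print(f"{model}: {accuracy(achieved, possible):.6g}")
# gpt-3.5-turbo-0125: 0.978261
# gpt-4o-mini-2024-07-18: 0.921053
# llama-3.1-instruct:8:ggufv2:Q4_K_M: 0.92
```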
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | 22 | 22 | 0 | 1 | 3 |
claude-3-opus-20240229 | 24 | 24 | 0 | 1 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 24 | 24 | 0 | 1 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | 279 | 279 | 0 | 1 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 24 | 24 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 24 | 24 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 24 | 24 | 0 | 1 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 132 | 132 | 0 | 1 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 24 | 24 | 0 | 1 | 3 |
gpt-4o-2024-05-13 | 70 | 70 | 0 | 1 | 5 |
gpt-4-turbo-2024-04-09 | 64 | 64 | 0 | 1 | 5 |
gpt-4o-2024-08-06 | 42 | 42 | 0 | 1 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 186 | 186 | 0 | 1 | 3 |
gpt-3.5-turbo-0125 | 45 | 46 | 0 | 0.978261 | 5 |
gpt-4o-mini-2024-07-18 | 70 | 76 | 0 | 0.921053 | 5 |
gpt-4-0613 | 58 | 63 | 0 | 0.920635 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 138 | 150 | 0 | 0.92 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 59 | 66 | 0 | 0.893939 | 3 |
gpt-3.5-turbo-0613 | 40 | 45 | 0 | 0.888889 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 35 | 40 | 0 | 0.875 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 35 | 40 | 0 | 0.875 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 35 | 40 | 0 | 0.875 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 31 | 36 | 0 | 0.861111 | 5 |
gpt-4-0125-preview | 47 | 57 | 0 | 0.824561 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 307 | 400 | 0 | 0.7675 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 280 | 369 | 0 | 0.758808 | 5 |
chatglm3:6:ggmlv3:q4_0 | 30 | 40 | 0 | 0.75 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 223 | 333 | 0 | 0.66967 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 277 | 441 | 0 | 0.628118 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 136 | 225 | 0 | 0.604444 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 77 | 135 | 0 | 0.57037 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 77 | 135 | 0 | 0.57037 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 72 | 130 | 0 | 0.553846 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 20 | 40 | 0 | 0.5 | 5 |
code-llama-instruct:7:ggufv2:Q3_K_M | 20 | 40 | 0 | 0.5 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 65 | 135 | 0 | 0.481481 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 19 | 40 | 0 | 0.475 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 18 | 40 | 0 | 0.45 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 20 | 45 | 0 | 0.444444 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 20 | 45 | 0 | 0.444444 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 20 | 45 | 0 | 0.444444 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 51 | 117 | 0 | 0.435897 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 19 | 45 | 0 | 0.422222 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 48 | 117 | 0 | 0.410256 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 15 | 45 | 0 | 0.333333 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 15 | 45 | 0 | 0.333333 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 15 | 45 | 0 | 0.333333 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 15 | 45 | 0 | 0.333333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 15 | 45 | 0 | 0.333333 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 15 | 45 | 0 | 0.333333 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 15 | 45 | 0 | 0.333333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 14 | 45 | 0 | 0.311111 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 10 | 40 | 0 | 0.25 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 10 | 40 | 0 | 0.25 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 10 | 45 | 0 | 0.222222 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 5 | 40 | 0 | 0.125 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 5 | 40 | 0 | 0.125 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 5 | 45 | 0 | 0.111111 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 0 | 40 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 40 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 0 | 45 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 0 | 45 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 0 | 40 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 0 | 45 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 0 | 45 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 0 | 45 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 0 | 45 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 0 | 40 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 0 | 45 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 45 | 0 | 0 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
gpt-3.5-turbo-0125 | 69 | 69 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 87 | 87 | 0 | 1 | 5 |
gpt-4o-2024-08-06 | 63 | 63 | 0 | 1 | 3 |
openhermes-2.5:7:ggufv2:Q5_K_M | 78 | 87 | 0 | 0.896552 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 78 | 87 | 0 | 0.896552 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 78 | 87 | 0 | 0.896552 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 78 | 87 | 0 | 0.896552 | 5 |
gpt-4-0125-preview | 54 | 69 | 0 | 0.782609 | 5 |
gpt-4-0613 | 48 | 69 | 0 | 0.695652 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 57 | 87 | 0 | 0.655172 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 30 | 60 | 0 | 0.5 | 5 |
gpt-3.5-turbo-0613 | 30 | 60 | 0 | 0.5 | 5 |
chatglm3:6:ggmlv3:q4_0 | 24 | 60 | 0 | 0.4 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 18 | 63 | 0 | 0.285714 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 15 | 60 | 0 | 0.25 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 15 | 60 | 0 | 0.25 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 15 | 60 | 0 | 0.25 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 15 | 60 | 0 | 0.25 | 5 |
code-llama-instruct:7:ggufv2:Q3_K_M | 15 | 60 | 0 | 0.25 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 15 | 60 | 0 | 0.25 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 15 | 60 | 0 | 0.25 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 15 | 60 | 0 | 0.25 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 15 | 60 | 0 | 0.25 | 5 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 9 | 63 | 0 | 0.142857 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 9 | 63 | 0 | 0.142857 | 3 |
gpt-4-turbo-2024-04-09 | 9 | 69 | 0 | 0.130435 | 5 |
gpt-4o-2024-05-13 | 9 | 69 | 0 | 0.130435 | 5 |
gpt-4o-mini-2024-07-18 | 9 | 69 | 0 | 0.130435 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 9 | 87 | 0 | 0.103448 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 9 | 87 | 0 | 0.103448 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 60 | 0 | 0 | 5 |
claude-3-opus-20240229 | 0 | 36 | 0 | 0 | 3 |
code-llama-instruct:13:ggufv2:Q2_K | 0 | 60 | 0 | 0 | 5 |
claude-3-5-sonnet-20240620 | 0 | 36 | 0 | 0 | 3 |
code-llama-instruct:13:ggufv2:Q8_0 | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 0 | 87 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 0 | 87 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 0 | 87 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 0 | 87 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 0 | 60 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 0 | 60 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 0 | 60 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 0 | 60 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0 | 60 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 0 | 60 | 0 | 0 | 5 |
llama-3.1-instruct:8:ggufv2:Q6_K | 0 | 36 | 0 | 0 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 0 | 36 | 0 | 0 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 0 | 45 | 0 | 0 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 0 | 36 | 0 | 0 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 0 | 36 | 0 | 0 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 0 | 36 | 0 | 0 | 3 |
llama-3-instruct:8:ggufv2:Q6_K | 0 | 60 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 0 | 60 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 60 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0 | 60 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0 | 60 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 0 | 60 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0 | 60 | 0 | 0 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
llama-3.1-instruct:8:ggufv2:Q8_0 | 129 | 228 | 0 | 0.565789 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 117 | 228 | 0 | 0.513158 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 111 | 228 | 0 | 0.486842 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 90 | 192 | 0 | 0.46875 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 84 | 192 | 0 | 0.4375 | 3 |
gpt-4o-2024-08-06 | 97 | 228 | 1.1547 | 0.425439 | 3 |
claude-3-opus-20240229 | 81 | 192 | 0 | 0.421875 | 3 |
gpt-4o-mini-2024-07-18 | 129 | 332 | 0.547723 | 0.388554 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 79 | 204 | 0 | 0.387255 | 3 |
gpt-4-0613 | 127 | 332 | 0 | 0.38253 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 72 | 192 | 0 | 0.375 | 3 |
claude-3-5-sonnet-20240620 | 72 | 192 | 0 | 0.375 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 72 | 192 | 0 | 0.375 | 3 |
gpt-3.5-turbo-0125 | 122 | 332 | 0 | 0.36747 | 5 |
gpt-3.5-turbo-0613 | 116 | 320 | 0 | 0.3625 | 5 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 63 | 192 | 0 | 0.328125 | 3 |
gpt-4-turbo-2024-04-09 | 108 | 332 | 0.894427 | 0.325301 | 5 |
chatglm3:6:ggmlv3:q4_0 | 92 | 320 | 0 | 0.2875 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 90 | 320 | 0 | 0.28125 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 90 | 320 | 0 | 0.28125 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 70 | 356 | 0 | 0.196629 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 70 | 356 | 0 | 0.196629 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 60 | 320 | 0 | 0.1875 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 55 | 320 | 0 | 0.171875 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 61 | 356 | 0 | 0.171348 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 52 | 320 | 0 | 0.1625 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 45 | 356 | 0 | 0.126404 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 45 | 356 | 0 | 0.126404 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 35 | 320 | 0 | 0.109375 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 32 | 356 | 0 | 0.0898876 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 21 | 320 | 0 | 0.065625 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 20 | 320 | 0 | 0.0625 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 15 | 320 | 0 | 0.046875 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 15 | 320 | 0 | 0.046875 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 12 | 320 | 0 | 0.0375 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 12 | 356 | 0 | 0.0337079 | 5 |
gpt-4o-2024-05-13 | 10 | 332 | 0 | 0.0301205 | 5 |
gpt-4-0125-preview | 10 | 332 | 0 | 0.0301205 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 6 | 356 | 0 | 0.0168539 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 0 | 356 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 0 | 356 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 0 | 356 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q3_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 0 | 320 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 0 | 320 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 0 | 356 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0 | 320 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 320 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 320 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 320 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0 | 320 | 0 | 0 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
claude-3-opus-20240229 | 24 | 24 | 0 | 1 | 3 |
code-llama-instruct:34:ggufv2:Q4_K_M | 39 | 40 | 0 | 0.975 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 38 | 40 | 0 | 0.95 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 37 | 40 | 0 | 0.925 | 5 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 22 | 24 | 0.57735 | 0.916667 | 3 |
code-llama-instruct:34:ggufv2:Q6_K | 36 | 40 | 0 | 0.9 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 35 | 40 | 0 | 0.875 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 35 | 40 | 0 | 0.875 | 5 |
claude-3-5-sonnet-20240620 | 26 | 30 | 0.57735 | 0.866667 | 3 |
code-llama-instruct:13:ggufv2:Q3_K_M | 34 | 40 | 0 | 0.85 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 34 | 40 | 0 | 0.85 | 5 |
llama-3.1-instruct:8:ggufv2:Q6_K | 20 | 24 | 1.1547 | 0.833333 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 20 | 24 | 1.1547 | 0.833333 | 3 |
code-llama-instruct:13:ggufv2:Q6_K | 33 | 40 | 0 | 0.825 | 5 |
code-llama-instruct:7:ggufv2:Q3_K_M | 36 | 45 | 0 | 0.8 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 32 | 40 | 0 | 0.8 | 5 |
gpt-3.5-turbo-0125 | 45 | 57 | 0 | 0.789474 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 35 | 45 | 0 | 0.777778 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 35 | 45 | 0 | 0.777778 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 35 | 45 | 0 | 0.777778 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 35 | 45 | 0 | 0.777778 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 31 | 40 | 0 | 0.775 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 31 | 40 | 0 | 0.775 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 31 | 40 | 0 | 0.775 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 31 | 40 | 0 | 0.775 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 31 | 40 | 0 | 0.775 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 31 | 40 | 0 | 0.775 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 34 | 45 | 0 | 0.755556 | 5 |
gpt-3.5-turbo-0613 | 34 | 45 | 0 | 0.755556 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 34 | 45 | 0 | 0.755556 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 30 | 40 | 0 | 0.75 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 30 | 40 | 0 | 0.75 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 18 | 24 | 0 | 0.75 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 33 | 45 | 0 | 0.733333 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 33 | 45 | 0 | 0.733333 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 29 | 40 | 0 | 0.725 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 32 | 45 | 0 | 0.711111 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 32 | 45 | 0 | 0.711111 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 31 | 45 | 0 | 0.688889 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 31 | 45 | 0 | 0.688889 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 31 | 45 | 0 | 0.688889 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 30 | 45 | 0 | 0.666667 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 30 | 45 | 0 | 0.666667 | 5 |
gpt-4-0613 | 46 | 69 | 0 | 0.666667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 30 | 45 | 0 | 0.666667 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 30 | 45 | 0 | 0.666667 | 5 |
gpt-4-turbo-2024-04-09 | 46 | 70 | 0 | 0.657143 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 26 | 40 | 0 | 0.65 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 26 | 40 | 0 | 0.65 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 29 | 45 | 0 | 0.644444 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 29 | 45 | 0 | 0.644444 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 15 | 24 | 0 | 0.625 | 3 |
gpt-4-0125-preview | 39 | 63 | 0 | 0.619048 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 27 | 45 | 0 | 0.6 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 27 | 45 | 0 | 0.6 | 5 |
gpt-4o-2024-05-13 | 40 | 76 | 0 | 0.526316 | 5 |
gpt-4o-mini-2024-07-18 | 43 | 82 | 0.547723 | 0.52439 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 29 | 66 | 0 | 0.439394 | 3 |
llama-2-chat:7:ggufv2:Q2_K | 38 | 117 | 0.57735 | 0.324786 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 13 | 45 | 0 | 0.288889 | 5 |
chatglm3:6:ggmlv3:q4_0 | 11 | 40 | 0 | 0.275 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 34 | 135 | 1.1547 | 0.251852 | 5 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 36 | 150 | 2.88675 | 0.24 | 3 |
llama-2-chat:7:ggufv2:Q3_K_M | 28 | 135 | 1.1547 | 0.207407 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 52 | 279 | 0.57735 | 0.18638 | 5 |
gpt-4o-2024-08-06 | 34 | 189 | 0.57735 | 0.179894 | 3 |
llama-2-chat:7:ggufv2:Q6_K | 24 | 135 | 0 | 0.177778 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 25 | 153 | 1 | 0.163399 | 5 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 33 | 204 | 1.73205 | 0.161765 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | 53 | 338 | 0.57735 | 0.156805 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 21 | 135 | 0.57735 | 0.155556 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 28 | 186 | 0.57735 | 0.150538 | 3 |
openhermes-2.5:7:ggufv2:Q4_K_M | 52 | 369 | 0 | 0.140921 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 49 | 405 | 0.57735 | 0.120988 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 50 | 441 | 0.57735 | 0.113379 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 48 | 477 | 1.1547 | 0.100629 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
claude-3-opus-20240229 | 66 | 90 | 0 | 0.733333 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 63 | 90 | 0 | 0.7 | 3 |
gpt-4-0613 | 118 | 173 | 0 | 0.682081 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 100 | 150 | 0 | 0.666667 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 100 | 150 | 0 | 0.666667 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 100 | 150 | 0 | 0.666667 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 105 | 159 | 0 | 0.660377 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 105 | 159 | 0 | 0.660377 | 3 |
code-llama-instruct:7:ggufv2:Q4_K_M | 98 | 150 | 0 | 0.653333 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 73 | 113 | 0 | 0.646018 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 57 | 90 | 0 | 0.633333 | 3 |
claude-3-5-sonnet-20240620 | 57 | 90 | 0 | 0.633333 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 57 | 90 | 0 | 0.633333 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 57 | 90 | 0 | 0.633333 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 99 | 159 | 0 | 0.622642 | 3 |
code-llama-instruct:34:ggufv2:Q3_K_M | 90 | 150 | 0 | 0.6 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 90 | 150 | 0 | 0.6 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 86 | 150 | 0 | 0.573333 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 85 | 150 | 0 | 0.566667 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 85 | 150 | 0 | 0.566667 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 85 | 150 | 0 | 0.566667 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 85 | 150 | 0 | 0.566667 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 51 | 90 | 0 | 0.566667 | 3 |
openhermes-2.5:7:ggufv2:Q5_K_M | 124 | 219 | 0 | 0.56621 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 122 | 219 | 0 | 0.557078 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 81 | 150 | 0 | 0.54 | 5 |
gpt-4o-2024-05-13 | 93 | 173 | 0 | 0.537572 | 5 |
gpt-4o-mini-2024-07-18 | 93 | 173 | 0 | 0.537572 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 80 | 150 | 0 | 0.533333 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 80 | 150 | 0 | 0.533333 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 80 | 150 | 0 | 0.533333 | 5 |
gpt-4o-2024-08-06 | 84 | 159 | 0 | 0.528302 | 3 |
gpt-3.5-turbo-0125 | 89 | 173 | 0 | 0.514451 | 5 |
gpt-4-turbo-2024-04-09 | 88 | 173 | 0 | 0.508671 | 5 |
gpt-3.5-turbo-0613 | 75 | 150 | 0 | 0.5 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 109 | 219 | 0 | 0.497717 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 72 | 150 | 0 | 0.48 | 5 |
chatglm3:6:ggmlv3:q4_0 | 72 | 150 | 0 | 0.48 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 72 | 150 | 0 | 0.48 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 72 | 150 | 0 | 0.48 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 105 | 219 | 0.57735 | 0.479452 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 71 | 150 | 0 | 0.473333 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 71 | 150 | 0 | 0.473333 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 103 | 219 | 0 | 0.47032 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 70 | 150 | 0 | 0.466667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 70 | 150 | 0 | 0.466667 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 70 | 150 | 0 | 0.466667 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 70 | 150 | 0 | 0.466667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 70 | 150 | 0 | 0.466667 | 5 |
gpt-4-0125-preview | 79 | 173 | 0 | 0.456647 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 65 | 150 | 0 | 0.433333 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 65 | 150 | 0 | 0.433333 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 65 | 150 | 0 | 0.433333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 64 | 150 | 0 | 0.426667 | 5 |
code-llama-instruct:7:ggufv2:Q3_K_M | 64 | 150 | 0 | 0.426667 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 92 | 219 | 0 | 0.420091 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 63 | 150 | 0 | 0.42 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 62 | 150 | 0 | 0.413333 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 60 | 150 | 0 | 0.4 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 60 | 150 | 0 | 0.4 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 58 | 150 | 0 | 0.386667 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 58 | 150 | 0 | 0.386667 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 57 | 150 | 0 | 0.38 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 55 | 150 | 0 | 0.366667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 55 | 150 | 0 | 0.366667 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 55 | 150 | 0 | 0.366667 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 54 | 150 | 0 | 0.36 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 74 | 219 | 0 | 0.3379 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 50 | 150 | 0 | 0.333333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 50 | 150 | 0 | 0.333333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 50 | 150 | 0 | 0.333333 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 64 | 219 | 0 | 0.292237 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 64 | 219 | 0 | 0.292237 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 60 | 219 | 0 | 0.273973 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 50 | 219 | 0 | 0.228311 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 36 | 219 | 0 | 0.164384 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | 87 | 90 | 0 | 0.966667 | 3 |
code-llama-instruct:7:ggufv2:Q4_K_M | 145 | 150 | 0 | 0.966667 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 87 | 90 | 0 | 0.966667 | 3 |
code-llama-instruct:7:ggufv2:Q6_K | 144 | 150 | 0 | 0.96 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 144 | 150 | 0 | 0.96 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 144 | 150 | 0 | 0.96 | 5 |
gpt-4-0613 | 166 | 173 | 0 | 0.959538 | 5 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 86 | 90 | 1.1547 | 0.955556 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 86 | 90 | 1.1547 | 0.955556 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 86 | 90 | 0.57735 | 0.955556 | 3 |
gpt-3.5-turbo-0125 | 165 | 173 | 0 | 0.953757 | 5 |
gpt-4o-mini-2024-07-18 | 165 | 173 | 0 | 0.953757 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 107 | 113 | 0.57735 | 0.946903 | 3 |
gpt-3.5-turbo-0613 | 142 | 150 | 0 | 0.946667 | 5 |
claude-3-opus-20240229 | 85 | 90 | 0.57735 | 0.944444 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 150 | 159 | 0 | 0.943396 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 149 | 159 | 0.57735 | 0.937107 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 84 | 90 | 0 | 0.933333 | 3 |
llama-3-instruct:8:ggufv2:Q5_K_M | 139 | 150 | 0 | 0.926667 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 139 | 150 | 0 | 0.926667 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 147 | 159 | 0 | 0.924528 | 3 |
code-llama-instruct:7:ggufv2:Q2_K | 138 | 150 | 0 | 0.92 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 138 | 150 | 0 | 0.92 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 138 | 150 | 0 | 0.92 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 138 | 150 | 0 | 0.92 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 201 | 219 | 0.57735 | 0.917808 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 201 | 219 | 0 | 0.917808 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 201 | 219 | 3 | 0.917808 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 136 | 150 | 0 | 0.906667 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 136 | 150 | 0 | 0.906667 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 136 | 150 | 0 | 0.906667 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 198 | 219 | 0 | 0.90411 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 135 | 150 | 0 | 0.9 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 135 | 150 | 0 | 0.9 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 196 | 219 | 0.57735 | 0.894977 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 134 | 150 | 0 | 0.893333 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 195 | 219 | 0 | 0.890411 | 5 |
gpt-4o-2024-08-06 | 139 | 159 | 2.3094 | 0.874214 | 3 |
code-llama-instruct:7:ggufv2:Q3_K_M | 131 | 150 | 0 | 0.873333 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 129 | 150 | 0 | 0.86 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 128 | 150 | 0 | 0.853333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 127 | 150 | 0 | 0.846667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 127 | 150 | 0 | 0.846667 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 126 | 150 | 0 | 0.84 | 5 |
gpt-4-0125-preview | 145 | 173 | 0 | 0.83815 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 125 | 150 | 0 | 0.833333 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 125 | 150 | 0 | 0.833333 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 125 | 150 | 0 | 0.833333 | 5 |
gpt-4-turbo-2024-04-09 | 144 | 173 | 0.447214 | 0.83237 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 124 | 150 | 0 | 0.826667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 124 | 150 | 0 | 0.826667 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 124 | 150 | 0 | 0.826667 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 123 | 150 | 0 | 0.82 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 122 | 150 | 0 | 0.813333 | 5 |
gpt-4o-2024-05-13 | 140 | 173 | 0 | 0.809249 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 119 | 150 | 0 | 0.793333 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 118 | 150 | 0 | 0.786667 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 118 | 150 | 0 | 0.786667 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 117 | 150 | 0 | 0.78 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 116 | 150 | 0 | 0.773333 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 115 | 150 | 0 | 0.766667 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 114 | 150 | 0 | 0.76 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 114 | 150 | 0 | 0.76 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 112 | 150 | 0 | 0.746667 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 109 | 150 | 0 | 0.726667 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 104 | 150 | 0 | 0.693333 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 103 | 150 | 0 | 0.686667 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 102 | 150 | 0 | 0.68 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 134 | 219 | 1.1547 | 0.611872 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 134 | 219 | 2.68223 | 0.611872 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 129 | 219 | 0 | 0.589041 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 129 | 219 | 1.1547 | 0.589041 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 123 | 219 | 0 | 0.561644 | 5 |
chatglm3:6:ggmlv3:q4_0 | 83 | 150 | 0 | 0.553333 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 120 | 219 | 0.57735 | 0.547945 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 65 | 150 | 0 | 0.433333 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
gpt-3.5-turbo-0125 | 159 | 173 | 0 | 0.919075 | 5 |
gpt-4-0613 | 152 | 173 | 0 | 0.878613 | 5 |
gpt-3.5-turbo-0613 | 125 | 150 | 0 | 0.833333 | 5 |
gpt-4o-2024-08-06 | 132 | 159 | 1.1547 | 0.830189 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 129 | 159 | 0 | 0.811321 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 123 | 159 | 0 | 0.773585 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 84 | 113 | 0 | 0.743363 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 117 | 159 | 0 | 0.735849 | 3 |
claude-3-5-sonnet-20240620 | 66 | 90 | 0 | 0.733333 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 66 | 90 | 0 | 0.733333 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 66 | 90 | 0 | 0.733333 | 3 |
gpt-4o-mini-2024-07-18 | 119 | 173 | 1.54266 | 0.687861 | 5 |
claude-3-opus-20240229 | 59 | 90 | 3.21455 | 0.655556 | 3 |
gpt-4-turbo-2024-04-09 | 110 | 173 | 0 | 0.635838 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 54 | 90 | 3.4641 | 0.6 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 54 | 90 | 0 | 0.6 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 54 | 90 | 0 | 0.6 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | 63 | 219 | 1 | 0.287671 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 60 | 219 | 1.73205 | 0.273973 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 58 | 219 | 0.57735 | 0.26484 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 54 | 219 | 1.1547 | 0.246575 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 52 | 219 | 1.1547 | 0.237443 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 35 | 219 | 2.3094 | 0.159817 | 5 |
gpt-4o-2024-05-13 | 20 | 173 | 0 | 0.115607 | 5 |
gpt-4-0125-preview | 19 | 173 | 0 | 0.109827 | 5 |
chatglm3:6:ggmlv3:q4_0 | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 0 | 219 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 0 | 219 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 0 | 219 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 0 | 219 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 0 | 219 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 0 | 219 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 150 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 0 | 150 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0 | 150 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 0 | 150 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 150 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 0 | 150 | 0 | 0 | 5 |
Retrieval-Augmented Generation (RAG)
In this set of tasks, we test LLM abilities to generate answers to a given question using a RAG agent, or to judge the relevance of a RAG fragment to a given question. Instructions can be explicit ("is this fragment relevant to the question?") or implicit (the question is asked without instructions, and we evaluate whether the model responds with 'not enough information given').
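As an illustration of the implicit case, the check below is a minimal sketch of how a response could be scored by looking for the 'not enough information' phrase. The function names `flags_insufficient_context` and `score_implicit` are hypothetical and are not taken from the BioChatter code base; the actual benchmark may match responses differently.

```python
def flags_insufficient_context(response: str) -> bool:
    """Return True if the response says the context is insufficient.

    Hypothetical helper: matches the phrase case-insensitively and
    ignores differences in whitespace.
    """
    normalized = " ".join(response.lower().split())
    return "not enough information" in normalized


def score_implicit(responses: list[str]) -> tuple[int, int]:
    """Return (score achieved, score possible) for a list of responses.

    Assumes every test case supplies irrelevant fragments, so the
    expected behaviour is always to flag insufficient context.
    """
    achieved = sum(flags_insufficient_context(r) for r in responses)
    return achieved, len(responses)


if __name__ == "__main__":
    example_responses = [
        "Not enough information given.",
        "The protein is a kinase.",
    ]
    print(score_implicit(example_responses))
```

A substring match of this kind is deliberately simple; it would also count a response that merely quotes the phrase, which is one reason a real evaluation might use stricter matching.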
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | 18 | 18 | 0 | 1 | 3 |
llama-2-chat:13:ggufv2:Q6_K | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 30 | 30 | 0 | 1 | 5 |
gpt-4o-2024-08-06 | 18 | 18 | 0 | 1 | 3 |
code-llama-instruct:7:ggufv2:Q8_0 | 30 | 30 | 0 | 1 | 5 |
gpt-3.5-turbo-0125 | 30 | 30 | 0 | 1 | 5 |
gpt-3.5-turbo-0613 | 30 | 30 | 0 | 1 | 5 |
gpt-4-0125-preview | 30 | 30 | 0 | 1 | 5 |
gpt-4-0613 | 30 | 30 | 0 | 1 | 5 |
gpt-4-turbo-2024-04-09 | 30 | 30 | 0 | 1 | 5 |
gpt-4o-2024-05-13 | 30 | 30 | 0 | 1 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 30 | 30 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 30 | 30 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 30 | 30 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 30 | 30 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 30 | 30 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 30 | 30 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 30 | 30 | 0 | 1 | 5 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 18 | 18 | 0 | 1 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 18 | 18 | 0 | 1 | 3 |
llama-2-chat:13:ggufv2:Q8_0 | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 30 | 30 | 0 | 1 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 18 | 18 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 18 | 18 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 18 | 18 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 18 | 18 | 0 | 1 | 3 |
llama-3-instruct:8:ggufv2:Q8_0 | 30 | 30 | 0 | 1 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 30 | 30 | 0 | 1 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 30 | 30 | 0 | 1 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 30 | 30 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 30 | 30 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 30 | 30 | 0 | 1 | 5 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 18 | 18 | 0 | 1 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 30 | 30 | 0 | 1 | 5 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 18 | 18 | 0 | 1 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 18 | 18 | 0 | 1 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 30 | 30 | 0 | 1 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 25 | 30 | 0 | 0.833333 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 25 | 30 | 0 | 0.833333 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 25 | 30 | 0 | 0.833333 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 25 | 30 | 0 | 0.833333 | 5 |
claude-3-opus-20240229 | 15 | 18 | 0 | 0.833333 | 3 |
code-llama-instruct:7:ggufv2:Q3_K_M | 25 | 30 | 0 | 0.833333 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 25 | 30 | 0 | 0.833333 | 5 |
gpt-4o-mini-2024-07-18 | 25 | 30 | 0 | 0.833333 | 5 |
chatglm3:6:ggmlv3:q4_0 | 22 | 30 | 0 | 0.733333 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 20 | 30 | 0 | 0.666667 | 5 |
code-llama-instruct:34:ggufv2:Q2_K | 15 | 30 | 0 | 0.5 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 15 | 30 | 0 | 0.5 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 15 | 30 | 0 | 0.5 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 10 | 30 | 0 | 0.333333 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 10 | 30 | 0 | 0.333333 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 10 | 30 | 0 | 0.333333 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 10 | 30 | 0 | 0.333333 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 10 | 30 | 0 | 0.333333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 10 | 30 | 0 | 0.333333 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 5 | 30 | 0 | 0.166667 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 4 | 30 | 0 | 0.133333 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 1 | 30 | 0 | 0.0333333 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 30 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 0 | 30 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 0 | 30 | 0 | 0 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 0 | 30 | 0 | 0 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
chatglm3:6:ggmlv3:q4_0 | 10 | 10 | 0 | 1 | 5 |
claude-3-5-sonnet-20240620 | 6 | 6 | 0 | 1 | 3 |
claude-3-opus-20240229 | 6 | 6 | 0 | 1 | 3 |
code-llama-instruct:34:ggufv2:Q2_K | 10 | 10 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q2_K | 10 | 10 | 0 | 1 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 10 | 10 | 0 | 1 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 10 | 10 | 0 | 1 | 5 |
gpt-4-turbo-2024-04-09 | 10 | 10 | 0 | 1 | 5 |
gpt-3.5-turbo-0613 | 10 | 10 | 0 | 1 | 5 |
gpt-4-0613 | 10 | 10 | 0 | 1 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 10 | 10 | 0 | 1 | 5 |
code-llama-instruct:34:ggufv2:Q5_K_M | 10 | 10 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 10 | 10 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 10 | 10 | 0 | 1 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 10 | 10 | 0 | 1 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 10 | 10 | 0 | 1 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 10 | 10 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 10 | 10 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | 10 | 10 | 0 | 1 | 5 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 6 | 6 | 0 | 1 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 10 | 10 | 0 | 1 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 6 | 6 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 6 | 6 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 6 | 6 | 0 | 1 | 3 |
llama-3-instruct:8:ggufv2:Q8_0 | 10 | 10 | 0 | 1 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 10 | 10 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 10 | 10 | 0 | 1 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 10 | 10 | 0 | 1 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 10 | 10 | 0 | 1 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 10 | 10 | 0 | 1 | 5 |
gpt-3.5-turbo-0125 | 9 | 10 | 0 | 0.9 | 5 |
code-llama-instruct:7:ggufv2:Q6_K | 9 | 10 | 0 | 0.9 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 9 | 10 | 0 | 0.9 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 9 | 10 | 0 | 0.9 | 5 |
code-llama-instruct:34:ggufv2:Q8_0 | 9 | 10 | 0 | 0.9 | 5 |
code-llama-instruct:34:ggufv2:Q6_K | 9 | 10 | 0 | 0.9 | 5 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 5 | 6 | 0.57735 | 0.833333 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 5 | 6 | 0.57735 | 0.833333 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 5 | 6 | 0.57735 | 0.833333 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 5 | 6 | 0.57735 | 0.833333 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 5 | 6 | 0.57735 | 0.833333 | 3 |
code-llama-instruct:7:ggufv2:Q3_K_M | 7 | 10 | 0 | 0.7 | 5 |
gpt-4o-2024-05-13 | 7 | 10 | 0 | 0.7 | 5 |
code-llama-instruct:7:ggufv2:Q2_K | 7 | 10 | 0 | 0.7 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 7 | 10 | 0 | 0.7 | 5 |
gpt-4o-2024-08-06 | 4 | 6 | 0.57735 | 0.666667 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 6 | 10 | 0 | 0.6 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 6 | 10 | 0 | 0.6 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 6 | 10 | 0 | 0.6 | 5 |
code-llama-instruct:13:ggufv2:Q6_K | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:13:ggufv2:Q4_K_M | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:34:ggufv2:Q3_K_M | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:13:ggufv2:Q8_0 | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:13:ggufv2:Q5_K_M | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:7:ggufv2:Q5_K_M | 5 | 10 | 0 | 0.5 | 5 |
gpt-4-0125-preview | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:7:ggufv2:Q8_0 | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 5 | 10 | 0 | 0.5 | 5 |
gpt-4o-mini-2024-07-18 | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:13:ggufv2:Q6_K | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 5 | 10 | 0 | 0.5 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 5 | 10 | 0 | 0.5 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 5 | 10 | 0 | 0.5 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 5 | 10 | 0 | 0.5 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 5 | 10 | 0 | 0.5 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 5 | 10 | 0 | 0.5 | 5 |
code-llama-instruct:13:ggufv2:Q2_K | 4 | 10 | 0 | 0.4 | 5 |
code-llama-instruct:34:ggufv2:Q4_K_M | 4 | 10 | 0 | 0.4 | 5 |
code-llama-instruct:13:ggufv2:Q3_K_M | 0 | 10 | 0 | 0 | 5 |
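The Accuracy column in these tables is consistent with the ratio of Score achieved to Score possible, reported to about six significant digits. The following sketch parses one pipe-delimited row of a six-column table and recomputes that ratio; `parse_row` and `accuracy_matches` are hypothetical helper names introduced only for this example.

```python
import math


def parse_row(row: str) -> dict:
    """Split one six-column table row into typed fields."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    name, achieved, possible, sd, accuracy, iterations = cells
    return {
        "model": name,
        "achieved": float(achieved),
        "possible": float(possible),
        "sd": float(sd),
        "accuracy": float(accuracy),
        "iterations": int(iterations),
    }


def accuracy_matches(row: str, rel_tol: float = 1e-5) -> bool:
    """Check that the reported accuracy equals achieved / possible.

    The tolerance allows for the rounding of the reported value.
    """
    parsed = parse_row(row)
    if parsed["possible"] == 0:
        return False
    recomputed = parsed["achieved"] / parsed["possible"]
    return math.isclose(recomputed, parsed["accuracy"], rel_tol=rel_tol)


if __name__ == "__main__":
    rows = [
        "gpt-3.5-turbo-0125 | 9 | 10 | 0 | 0.9 | 5 |",
        "gpt-4o-2024-08-06 | 4 | 6 | 0.57735 | 0.666667 | 3 |",
    ]
    for row in rows:
        print(parse_row(row)["model"], accuracy_matches(row))
```

The tables that carry an additional Subtask column have seven cells per row and would need a correspondingly adjusted unpacking.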
Text Extraction
In this set of tasks, we test LLM abilities to extract text from a given document.
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | 224.558 | 297 | 0.0766981 | 0.756088 | 3 |
gpt-4o-2024-08-06 | 211.222 | 297 | 1.31903 | 0.711185 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 207.674 | 297 | 1.29175e-15 | 0.699238 | 3 |
claude-3-opus-20240229 | 205.297 | 297 | 0.441227 | 0.691235 | 3 |
gpt-4-0125-preview | 341.404 | 495 | 0 | 0.689705 | 5 |
gpt-4o-mini-2024-07-18 | 338.854 | 495 | 1.74807 | 0.684553 | 5 |
gpt-4-0613 | 331.107 | 495 | 0 | 0.668903 | 5 |
gpt-4o-2024-05-13 | 323.703 | 495 | 0 | 0.653946 | 5 |
gpt-4-turbo-2024-04-09 | 321.933 | 495 | 4.39682 | 0.650369 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 190.774 | 297 | 9.26323e-16 | 0.642336 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 186.07 | 297 | 8.83831e-16 | 0.626498 | 3 |
openhermes-2.5:7:ggufv2:Q6_K | 306.488 | 495 | 0 | 0.619167 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | 297.41 | 495 | 0 | 0.600829 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | 295.654 | 495 | 0 | 0.597281 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | 287.059 | 495 | 0 | 0.579916 | 5 |
gpt-3.5-turbo-0613 | 284.814 | 495 | 0 | 0.575381 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | 274.471 | 495 | 0 | 0.554488 | 5 |
gpt-3.5-turbo-0125 | 252.466 | 495 | 0 | 0.510032 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | 219.807 | 495 | 0 | 0.444054 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 123.142 | 297 | 5.69391e-16 | 0.414621 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 117.157 | 297 | 3.73928e-16 | 0.394469 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 115.554 | 297 | 4.18545e-16 | 0.38907 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 190.948 | 495 | 0 | 0.385754 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 113.462 | 297 | 0.300229 | 0.382027 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 113.002 | 297 | 0.00031952 | 0.380477 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 182.642 | 495 | 0 | 0.368974 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 181.869 | 495 | 0 | 0.367412 | 5 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 107.033 | 297 | 5.50801e-16 | 0.360379 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 174.084 | 495 | 0 | 0.351684 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 171.777 | 495 | 0 | 0.347025 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 163.974 | 495 | 0 | 0.331261 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | 119.263 | 495 | 0 | 0.240936 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 116.651 | 495 | 0 | 0.235659 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | 113.663 | 495 | 0 | 0.229622 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 111.634 | 495 | 0 | 0.225524 | 5 |
llama-2-chat:70:ggufv2:Q2_K | 106.448 | 495 | 0 | 0.215047 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | 104.032 | 495 | 0 | 0.210166 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | 97.9593 | 495 | 0 | 0.197898 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 95.9243 | 495 | 0 | 0.193786 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 93.6428 | 495 | 0 | 0.189177 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | 93.3345 | 495 | 0 | 0.188555 | 5 |
chatglm3:6:ggmlv3:q4_0 | 93.2008 | 495 | 0 | 0.188284 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | 82.3847 | 495 | 0 | 0.166434 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | 80.5152 | 495 | 0 | 0.162657 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 77.9693 | 495 | 0 | 0.157514 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | 68.6724 | 495 | 0 | 0.138732 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | 57.8514 | 495 | 0 | 0.116871 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | 55.7521 | 495 | 0 | 0.112631 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | 43.9894 | 495 | 0 | 0.0888675 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | 42.1985 | 495 | 0 | 0.0852494 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | 25.1647 | 297 | 1.46597e-16 | 0.0847297 | 3 |
llama-2-chat:13:ggufv2:Q6_K | 23.2057 | 297 | 0.00246731 | 0.0781337 | 3 |
llama-2-chat:13:ggufv2:Q5_K_M | 37.9252 | 495 | 0 | 0.0766167 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | 37.7416 | 495 | 0 | 0.0762457 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | 34.5308 | 495 | 0 | 0.0697591 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | 32.2105 | 495 | 0 | 0.0650717 | 5 |
llama-2-chat:13:ggufv2:Q2_K | 32.1447 | 495 | 0 | 0.0649389 | 5 |
llama-2-chat:7:ggufv2:Q6_K | 18.2539 | 297 | 2.57076e-16 | 0.0614608 | 3 |
llama-2-chat:7:ggufv2:Q2_K | 17.9123 | 495 | 0 | 0.0361865 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | assay | 23.4242 | 27 | 0 | 0.867565 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | assay | 22.1667 | 27 | 0 | 0.820988 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | assay | 22.1667 | 27 | 0 | 0.820988 | 3 |
claude-3-opus-20240229 | assay | 21.4909 | 27 | 1.51082e-17 | 0.79596 | 3 |
gpt-4o-2024-08-06 | assay | 20.3897 | 27 | 3.02164e-17 | 0.755176 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | assay | 20.3238 | 27 | 3.02164e-17 | 0.752734 | 3 |
gpt-4o-mini-2024-07-18 | assay | 33.0217 | 45 | 0.00559644 | 0.733816 | 5 |
gpt-4-turbo-2024-04-09 | assay | 26.4233 | 45 | 0.0114637 | 0.587184 | 5 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | assay | 12.9241 | 27 | 1.60525e-17 | 0.478671 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | assay | 12.4373 | 27 | 8.49837e-18 | 0.460639 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | assay | 12.4264 | 27 | 7.55411e-18 | 0.460235 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | assay | 12.1618 | 27 | 8.49837e-18 | 0.450438 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | assay | 11.2618 | 27 | 0 | 0.417103 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | assay | 10.4603 | 27 | 1.51082e-17 | 0.387417 | 3 |
gpt-4o-2024-05-13 | assay | 6.67307 | 45 | 0 | 0.148291 | 5 |
gpt-4-0125-preview | assay | 6.60264 | 45 | 0 | 0.146725 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | assay | 6.45354 | 45 | 0 | 0.143412 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | assay | 6.42156 | 45 | 0 | 0.142701 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | assay | 6.24141 | 45 | 0 | 0.138698 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | assay | 5.8662 | 45 | 0 | 0.13036 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | assay | 5.84165 | 45 | 0 | 0.129814 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | assay | 5.83272 | 45 | 0 | 0.129616 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | assay | 5.77475 | 45 | 0 | 0.128328 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | assay | 5.72421 | 45 | 0 | 0.127205 | 5 |
gpt-3.5-turbo-0613 | assay | 5.71717 | 45 | 0 | 0.127048 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | assay | 5.66084 | 45 | 0 | 0.125797 | 5 |
gpt-3.5-turbo-0125 | assay | 5.48324 | 45 | 0 | 0.12185 | 5 |
gpt-4-0613 | assay | 5.47238 | 45 | 0 | 0.121608 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | assay | 5.40473 | 45 | 0 | 0.120105 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | assay | 4.99329 | 45 | 0 | 0.110962 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | assay | 4.35689 | 45 | 0 | 0.0968198 | 5 |
llama-2-chat:7:ggufv2:Q6_K | assay | 2.34166 | 27 | 7.55411e-18 | 0.0867281 | 3 |
llama-2-chat:13:ggufv2:Q6_K | assay | 2.19772 | 27 | 3.77706e-18 | 0.081397 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | assay | 3.17543 | 45 | 0 | 0.070565 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | assay | 1.62311 | 27 | 0 | 0.0601152 | 3 |
llama-2-chat:70:ggufv2:Q4_K_M | assay | 1.8509 | 45 | 0 | 0.041131 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | assay | 1.81844 | 45 | 0 | 0.0404097 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | assay | 1.68419 | 45 | 0 | 0.0374265 | 5 |
chatglm3:6:ggmlv3:q4_0 | assay | 1.61672 | 45 | 0 | 0.0359271 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | assay | 1.53778 | 45 | 0 | 0.0341728 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | assay | 1.48103 | 45 | 0 | 0.0329118 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | assay | 1.37088 | 45 | 0 | 0.0304641 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | assay | 1.16327 | 45 | 0 | 0.0258505 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | assay | 1.15926 | 45 | 0 | 0.0257612 | 5 |
llama-2-chat:70:ggufv2:Q2_K | assay | 1.15095 | 45 | 0 | 0.0255768 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | assay | 1.07788 | 45 | 0 | 0.023953 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | assay | 1.05347 | 45 | 0 | 0.0234104 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | assay | 1.02909 | 45 | 0 | 0.0228686 | 5 |
llama-2-chat:13:ggufv2:Q2_K | assay | 0.974441 | 45 | 0 | 0.0216542 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | assay | 0.922706 | 45 | 0 | 0.0205046 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | assay | 0.919259 | 45 | 0 | 0.020428 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | assay | 0.836349 | 45 | 0 | 0.0185855 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | assay | 0.756302 | 45 | 0 | 0.0168067 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | assay | 0.750557 | 45 | 0 | 0.016679 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | assay | 0.647223 | 45 | 0 | 0.0143827 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | assay | 0.604799 | 45 | 0 | 0.01344 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | assay | 0.522273 | 45 | 0 | 0.0116061 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | assay | 0.455699 | 45 | 0 | 0.0101266 | 5 |
llama-2-chat:7:ggufv2:Q2_K | assay | 0.233824 | 45 | 0 | 0.00519608 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-opus-20240229 | chemical | 24 | 27 | 0 | 0.888889 | 3 |
claude-3-5-sonnet-20240620 | chemical | 21.6667 | 27 | 0 | 0.802469 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | chemical | 20 | 27 | 0 | 0.740741 | 3 |
gpt-4o-2024-08-06 | chemical | 18.6667 | 27 | 0 | 0.691358 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | chemical | 18 | 27 | 0 | 0.666667 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | chemical | 18 | 27 | 0 | 0.666667 | 3 |
gpt-4-turbo-2024-04-09 | chemical | 29.188 | 45 | 0 | 0.648623 | 5 |
gpt-4o-mini-2024-07-18 | chemical | 27.7778 | 45 | 0 | 0.617284 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | chemical | 12.3451 | 27 | 0 | 0.457227 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | chemical | 12.0531 | 27 | 0 | 0.446411 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | chemical | 11.0168 | 27 | 0 | 0.408029 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | chemical | 10.8547 | 27 | 2.36066e-19 | 0.402026 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | chemical | 9.13698 | 27 | 0 | 0.338407 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | chemical | 8.35802 | 27 | 0 | 0.309556 | 3 |
gpt-4-0613 | chemical | 6.38889 | 45 | 0 | 0.141975 | 5 |
gpt-4-0125-preview | chemical | 6.22222 | 45 | 0 | 0.138272 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | chemical | 6.16667 | 45 | 0 | 0.137037 | 5 |
gpt-4o-2024-05-13 | chemical | 5.55556 | 45 | 0 | 0.123457 | 5 |
gpt-3.5-turbo-0613 | chemical | 5.44444 | 45 | 0 | 0.120988 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | chemical | 5.23309 | 45 | 0 | 0.116291 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | chemical | 5.16667 | 45 | 0 | 0.114815 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | chemical | 5.06667 | 45 | 0 | 0.112593 | 5 |
gpt-3.5-turbo-0125 | chemical | 5.06444 | 45 | 0 | 0.112543 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | chemical | 4.95556 | 45 | 0 | 0.110123 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | chemical | 4.66667 | 45 | 0 | 0.103704 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | chemical | 4.02332 | 45 | 0 | 0.0894072 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | chemical | 3.69824 | 45 | 0 | 0.0821832 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | chemical | 3.5588 | 45 | 0 | 0.0790845 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | chemical | 3.23175 | 45 | 0 | 0.0718166 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | chemical | 2.9648 | 45 | 0 | 0.0658845 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | chemical | 2.85926 | 45 | 0 | 0.0635392 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | chemical | 2.80214 | 45 | 0 | 0.0622698 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | chemical | 2.28839 | 45 | 0 | 0.050853 | 5 |
llama-2-chat:13:ggufv2:Q6_K | chemical | 1.33748 | 27 | 0 | 0.0495362 | 3 |
llama-3-instruct:8:ggufv2:Q6_K | chemical | 1.99259 | 45 | 0 | 0.0442798 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | chemical | 1.98451 | 45 | 0 | 0.0441003 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | chemical | 1.98451 | 45 | 0 | 0.0441003 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | chemical | 1.92687 | 45 | 0 | 0.0428194 | 5 |
llama-2-chat:70:ggufv2:Q2_K | chemical | 1.92403 | 45 | 0 | 0.0427562 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | chemical | 1.86594 | 45 | 0 | 0.0414653 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | chemical | 1.11429 | 27 | 3.77706e-18 | 0.0412698 | 3 |
llama-2-chat:70:ggufv2:Q5_K_M | chemical | 1.7972 | 45 | 0 | 0.0399378 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | chemical | 1.65417 | 45 | 0 | 0.0367593 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | chemical | 1.60885 | 45 | 0 | 0.0357522 | 5 |
llama-2-chat:7:ggufv2:Q6_K | chemical | 0.85 | 27 | 1.88853e-18 | 0.0314815 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | chemical | 1.37178 | 45 | 0 | 0.030484 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | chemical | 1.02473 | 45 | 0 | 0.0227718 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | chemical | 0.993896 | 45 | 0 | 0.0220866 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | chemical | 0.920791 | 45 | 0 | 0.020462 | 5 |
chatglm3:6:ggmlv3:q4_0 | chemical | 0.839293 | 45 | 0 | 0.018651 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | chemical | 0.580952 | 45 | 0 | 0.0129101 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | chemical | 0.473978 | 45 | 0 | 0.0105328 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | chemical | 0.473978 | 45 | 0 | 0.0105328 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | chemical | 0.447004 | 45 | 0 | 0.00993343 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | chemical | 0.44189 | 45 | 0 | 0.00981978 | 5 |
llama-2-chat:13:ggufv2:Q2_K | chemical | 0.429118 | 45 | 0 | 0.00953595 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | chemical | 0.416702 | 45 | 0 | 0.00926004 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | chemical | 0.270151 | 45 | 0 | 0.00600336 | 5 |
llama-2-chat:7:ggufv2:Q2_K | chemical | 0.264943 | 45 | 0 | 0.00588762 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
llama-3.1-instruct:70:ggufv2:IQ2_M | context | 25.2426 | 27 | 1.51082e-17 | 0.93491 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | context | 25.2195 | 27 | 3.02164e-17 | 0.934057 | 3 |
claude-3-5-sonnet-20240620 | context | 24.5401 | 27 | 6.04329e-17 | 0.908892 | 3 |
gpt-4o-2024-08-06 | context | 23.6991 | 27 | 0.00396564 | 0.877746 | 3 |
gpt-4-turbo-2024-04-09 | context | 38.9656 | 45 | 0.0255706 | 0.865903 | 5 |
claude-3-opus-20240229 | context | 23.2287 | 27 | 3.02164e-17 | 0.860323 | 3 |
gpt-4o-mini-2024-07-18 | context | 38.476 | 45 | 0.00134967 | 0.855023 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | context | 22.2637 | 27 | 1.51082e-17 | 0.82458 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | context | 16.8902 | 27 | 9.44264e-19 | 0.625563 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | context | 16.8016 | 27 | 1.51082e-17 | 0.622281 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | context | 16.4994 | 27 | 2.83279e-18 | 0.611089 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | context | 16.4093 | 27 | 1.88853e-18 | 0.607753 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | context | 16.1817 | 27 | 1.88853e-18 | 0.599324 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | context | 15.6028 | 27 | 1.88853e-18 | 0.57788 | 3 |
llama-2-chat:7:ggufv2:Q8_0 | context | 5.70797 | 27 | 0 | 0.211406 | 3 |
llama-2-chat:13:ggufv2:Q6_K | context | 4.88293 | 27 | 1.69967e-17 | 0.180849 | 3 |
gpt-4-0613 | context | 7.90663 | 45 | 0 | 0.175703 | 5 |
gpt-4-0125-preview | context | 7.85253 | 45 | 0 | 0.174501 | 5 |
gpt-4o-2024-05-13 | context | 7.82965 | 45 | 0 | 0.173992 | 5 |
gpt-3.5-turbo-0125 | context | 6.89247 | 45 | 0 | 0.153166 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | context | 6.89055 | 45 | 0 | 0.153123 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | context | 6.79989 | 45 | 0 | 0.151109 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | context | 6.77271 | 45 | 0 | 0.150505 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | context | 6.67749 | 45 | 0 | 0.148389 | 5 |
gpt-3.5-turbo-0613 | context | 6.50472 | 45 | 0 | 0.144549 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | context | 6.44769 | 45 | 0 | 0.143282 | 5 |
llama-2-chat:7:ggufv2:Q6_K | context | 3.73057 | 27 | 0 | 0.138169 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | context | 5.16754 | 45 | 0 | 0.114834 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | context | 5.12599 | 45 | 0 | 0.113911 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | context | 5.02844 | 45 | 0 | 0.111743 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | context | 5.0158 | 45 | 0 | 0.111462 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | context | 4.99362 | 45 | 0 | 0.110969 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | context | 4.51314 | 45 | 0 | 0.100292 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | context | 4.22332 | 45 | 0 | 0.0938516 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | context | 4.10284 | 45 | 0 | 0.0911743 | 5 |
llama-2-chat:70:ggufv2:Q2_K | context | 4.08979 | 45 | 0 | 0.0908843 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | context | 4.06318 | 45 | 0 | 0.090293 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | context | 4.01117 | 45 | 0 | 0.0891372 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | context | 3.90982 | 45 | 0 | 0.0868849 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | context | 3.86897 | 45 | 0 | 0.0859772 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | context | 3.79416 | 45 | 0 | 0.0843146 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | context | 3.74591 | 45 | 0 | 0.0832424 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | context | 3.70126 | 45 | 0 | 0.0822502 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | context | 3.32657 | 45 | 0 | 0.0739237 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | context | 3.1452 | 45 | 0 | 0.0698933 | 5 |
chatglm3:6:ggmlv3:q4_0 | context | 2.85636 | 45 | 0 | 0.0634747 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | context | 2.10857 | 45 | 0 | 0.046857 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | context | 1.89605 | 45 | 0 | 0.0421345 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | context | 1.78868 | 45 | 0 | 0.0397484 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | context | 1.78618 | 45 | 0 | 0.0396929 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | context | 1.77351 | 45 | 0 | 0.0394113 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | context | 1.67334 | 45 | 0 | 0.0371853 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | context | 1.64821 | 45 | 0 | 0.0366268 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | context | 1.58821 | 45 | 0 | 0.0352936 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | context | 1.57169 | 45 | 0 | 0.0349264 | 5 |
llama-2-chat:13:ggufv2:Q2_K | context | 1.34289 | 45 | 0 | 0.0298419 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | context | 1.23881 | 45 | 0 | 0.0275291 | 5 |
llama-2-chat:7:ggufv2:Q2_K | context | 1.12335 | 45 | 0 | 0.0249632 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | context | 1.10292 | 45 | 0 | 0.0245094 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | disease | 20.4 | 27 | 1.51082e-17 | 0.755556 | 3 |
gpt-4o-mini-2024-07-18 | disease | 32.3333 | 45 | 0 | 0.718519 | 5 |
gpt-4o-2024-08-06 | disease | 19.4 | 27 | 1.51082e-17 | 0.718519 | 3 |
gpt-4-turbo-2024-04-09 | disease | 32.2667 | 45 | 0.00331269 | 0.717037 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | disease | 17.2 | 27 | 7.55411e-18 | 0.637037 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | disease | 13.2 | 27 | 7.55411e-18 | 0.488889 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | disease | 13.2 | 27 | 7.55411e-18 | 0.488889 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | disease | 11.7851 | 27 | 7.55411e-18 | 0.436484 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | disease | 11.5883 | 27 | 2.36066e-19 | 0.429196 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | disease | 11.5414 | 27 | 1.51082e-17 | 0.42746 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | disease | 11.5414 | 27 | 1.51082e-17 | 0.42746 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | disease | 10.2531 | 27 | 2.36066e-19 | 0.379746 | 3 |
claude-3-opus-20240229 | disease | 10 | 27 | 0 | 0.37037 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | disease | 9.31558 | 27 | 2.36066e-19 | 0.345022 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | disease | 6.46667 | 45 | 0 | 0.143704 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | disease | 6.46667 | 45 | 0 | 0.143704 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | disease | 6.46667 | 45 | 0 | 0.143704 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | disease | 6.46667 | 45 | 0 | 0.143704 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | disease | 6.46667 | 45 | 0 | 0.143704 | 5 |
gpt-4-0125-preview | disease | 6.21333 | 45 | 0 | 0.138074 | 5 |
gpt-4o-2024-05-13 | disease | 6.2 | 45 | 0 | 0.137778 | 5 |
gpt-4-0613 | disease | 6.13333 | 45 | 0 | 0.136296 | 5 |
gpt-3.5-turbo-0613 | disease | 6.06667 | 45 | 0 | 0.134815 | 5 |
gpt-3.5-turbo-0125 | disease | 4.75238 | 45 | 0 | 0.105608 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | disease | 4.32493 | 45 | 0 | 0.0961096 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | disease | 4.20708 | 45 | 0 | 0.0934906 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | disease | 4.14674 | 45 | 0 | 0.0921497 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | disease | 4.02927 | 45 | 0 | 0.0895392 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | disease | 4.01581 | 45 | 0 | 0.0892402 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | disease | 3.47244 | 45 | 0 | 0.0771654 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | disease | 3.04532 | 45 | 0 | 0.0676737 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | disease | 2.92854 | 45 | 0 | 0.0650787 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | disease | 2.65437 | 45 | 0 | 0.0589859 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | disease | 2.57657 | 45 | 0 | 0.057257 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | disease | 2.44785 | 45 | 0 | 0.0543966 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | disease | 2.29171 | 45 | 0 | 0.0509269 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | disease | 2.29094 | 45 | 0 | 0.0509099 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | disease | 1.73452 | 45 | 0 | 0.0385449 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | disease | 1.73452 | 45 | 0 | 0.0385449 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | disease | 1.73452 | 45 | 0 | 0.0385449 | 5 |
llama-2-chat:13:ggufv2:Q6_K | disease | 0.827524 | 27 | 0 | 0.030649 | 3 |
code-llama-instruct:7:ggufv2:Q4_K_M | disease | 1.33093 | 45 | 0 | 0.0295762 | 5 |
chatglm3:6:ggmlv3:q4_0 | disease | 1.21669 | 45 | 0 | 0.0270376 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | disease | 0.995894 | 45 | 0 | 0.022131 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | disease | 0.444887 | 27 | 2.36066e-19 | 0.0164773 | 3 |
llama-2-chat:7:ggufv2:Q6_K | disease | 0.439254 | 27 | 0 | 0.0162687 | 3 |
llama-2-chat:13:ggufv2:Q5_K_M | disease | 0.306386 | 45 | 0 | 0.00680858 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | disease | 0.26663 | 45 | 0 | 0.00592511 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | disease | 0.250053 | 45 | 0 | 0.00555673 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | disease | 0.235648 | 45 | 0 | 0.00523663 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | disease | 0.185035 | 45 | 0 | 0.0041119 | 5 |
llama-2-chat:70:ggufv2:Q2_K | disease | 0.182046 | 45 | 0 | 0.00404548 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | disease | 0.179398 | 45 | 0 | 0.00398663 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | disease | 0.150208 | 45 | 0 | 0.00333795 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | disease | 0.142957 | 45 | 0 | 0.00317683 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | disease | 0.103277 | 45 | 0 | 0.00229505 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | disease | 0.0898052 | 45 | 0 | 0.00199567 | 5 |
llama-2-chat:13:ggufv2:Q2_K | disease | 0.0874203 | 45 | 0 | 0.00194267 | 5 |
llama-2-chat:7:ggufv2:Q2_K | disease | 0.0587138 | 45 | 0 | 0.00130475 | 5 |
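
A note on reading these per-subtask tables: the Accuracy column appears to be Score achieved divided by Score possible, and Score possible appears to scale with the number of iterations (27 for 3 iterations, 45 for 5). The sketch below recomputes accuracy for three rows copied from the disease table above. It assumes the pipe-delimited row layout used on this page; the helper names `parse_row` and `accuracy` are hypothetical and are not part of BioChatter.

```python
import math

# Three rows copied verbatim from the "disease" subtask table.
ROWS = """\
llama-3.1-instruct:70:ggufv2:Q3_K_S | disease | 13.2 | 27 | 7.55411e-18 | 0.488889 | 3 |
claude-3-opus-20240229 | disease | 10 | 27 | 0 | 0.37037 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | disease | 6.46667 | 45 | 0 | 0.143704 | 5 |
"""


def parse_row(line):
    """Split one pipe-delimited table row into typed fields (hypothetical helper)."""
    cells = [cell.strip() for cell in line.strip().rstrip("|").split("|")]
    model, subtask, achieved, possible, sd, reported, iterations = cells
    return {
        "model": model,
        "subtask": subtask,
        "achieved": float(achieved),
        "possible": float(possible),
        "sd": float(sd),
        "reported_accuracy": float(reported),
        "iterations": int(iterations),
    }


def accuracy(row):
    """Assumed definition: fraction of the possible score that was achieved."""
    return row["achieved"] / row["possible"]


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for row in rows:
    # The table prints about six significant digits, so compare with a tolerance.
    assert math.isclose(accuracy(row), row["reported_accuracy"], abs_tol=1e-5)
    print(row["model"], round(accuracy(row), 6), row["possible"] / row["iterations"])
```

Because Score possible differs between models that ran 3 and 5 iterations, Accuracy is the column to compare across models, not the raw Score achieved.
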
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
gpt-4o-2024-08-06 | entity | 17.9286 | 27 | 0 | 0.664021 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | entity | 17.8096 | 27 | 1.51082e-17 | 0.659615 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | entity | 17.6096 | 27 | 7.55411e-18 | 0.652208 | 3 |
claude-3-opus-20240229 | entity | 16.325 | 27 | 4.53247e-17 | 0.60463 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | entity | 16.225 | 27 | 1.51082e-17 | 0.600926 | 3 |
claude-3-5-sonnet-20240620 | entity | 15.625 | 27 | 1.51082e-17 | 0.578704 | 3 |
gpt-4o-mini-2024-07-18 | entity | 24.5545 | 45 | 0.0661682 | 0.545656 | 5 |
gpt-4-turbo-2024-04-09 | entity | 19.2455 | 45 | 0.0562119 | 0.427677 | 5 |
llama-3.1-instruct:8:ggufv2:Q8_0 | entity | 8.02772 | 27 | 0 | 0.297323 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | entity | 7.84022 | 27 | 0 | 0.290379 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | entity | 6.80681 | 27 | 1.51082e-17 | 0.252104 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | entity | 6.16922 | 27 | 1.69967e-17 | 0.22849 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | entity | 5.96677 | 27 | 7.55411e-18 | 0.220991 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | entity | 5.58394 | 27 | 0 | 0.206813 | 3 |
gpt-4o-2024-05-13 | entity | 5.9909 | 45 | 0 | 0.133131 | 5 |
gpt-4-0125-preview | entity | 4.59502 | 45 | 0 | 0.102112 | 5 |
gpt-3.5-turbo-0613 | entity | 4.57972 | 45 | 0 | 0.101772 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | entity | 4.22461 | 45 | 0 | 0.0938803 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | entity | 4.1344 | 45 | 0 | 0.0918755 | 5 |
gpt-4-0613 | entity | 4.12852 | 45 | 0 | 0.0917448 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | entity | 4.09333 | 45 | 0 | 0.0909629 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | entity | 4.02016 | 45 | 0 | 0.0893369 | 5 |
gpt-3.5-turbo-0125 | entity | 3.71195 | 45 | 0 | 0.0824877 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | entity | 3.65819 | 45 | 0 | 0.0812932 | 5 |
llama-2-chat:13:ggufv2:Q6_K | entity | 2.14189 | 27 | 0 | 0.0793293 | 3 |
llama-2-chat:7:ggufv2:Q6_K | entity | 2.07106 | 27 | 9.44264e-19 | 0.0767059 | 3 |
llama-2-chat:7:ggufv2:Q8_0 | entity | 1.79733 | 27 | 0 | 0.0665678 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | entity | 2.42313 | 45 | 0 | 0.0538473 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | entity | 2.33413 | 45 | 0 | 0.0518696 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | entity | 2.30597 | 45 | 0 | 0.0512437 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | entity | 2.20283 | 45 | 0 | 0.0489518 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | entity | 2.10077 | 45 | 0 | 0.0466838 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | entity | 2.0607 | 45 | 0 | 0.0457934 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | entity | 2.00802 | 45 | 0 | 0.0446226 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | entity | 1.99809 | 45 | 0 | 0.044402 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | entity | 1.99214 | 45 | 0 | 0.0442699 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | entity | 1.79999 | 45 | 0 | 0.0399998 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | entity | 1.77563 | 45 | 0 | 0.0394584 | 5 |
chatglm3:6:ggmlv3:q4_0 | entity | 1.22227 | 45 | 0 | 0.0271617 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | entity | 1.20851 | 45 | 0 | 0.0268558 | 5 |
llama-2-chat:70:ggufv2:Q2_K | entity | 1.16189 | 45 | 0 | 0.0258197 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | entity | 1.10007 | 45 | 0 | 0.0244461 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | entity | 1.01555 | 45 | 0 | 0.0225677 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | entity | 0.948961 | 45 | 0 | 0.021088 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | entity | 0.903324 | 45 | 0 | 0.0200739 | 5 |
llama-2-chat:13:ggufv2:Q2_K | entity | 0.807379 | 45 | 0 | 0.0179418 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | entity | 0.785233 | 45 | 0 | 0.0174496 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | entity | 0.75253 | 45 | 0 | 0.0167229 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | entity | 0.749495 | 45 | 0 | 0.0166554 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | entity | 0.699988 | 45 | 0 | 0.0155553 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | entity | 0.695524 | 45 | 0 | 0.0154561 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | entity | 0.694377 | 45 | 0 | 0.0154306 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | entity | 0.685368 | 45 | 0 | 0.0152304 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | entity | 0.685027 | 45 | 0 | 0.0152228 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | entity | 0.629764 | 45 | 0 | 0.0139947 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | entity | 0.623851 | 45 | 0 | 0.0138634 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | entity | 0.623813 | 45 | 0 | 0.0138625 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | entity | 0.56502 | 45 | 0 | 0.012556 | 5 |
llama-2-chat:7:ggufv2:Q2_K | entity | 0.318196 | 45 | 0 | 0.00707101 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-opus-20240229 | experiment_yes_or_no | 27 | 27 | 0 | 1 | 3 |
gpt-4-turbo-2024-04-09 | experiment_yes_or_no | 45 | 45 | 0 | 1 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | experiment_yes_or_no | 27 | 27 | 0 | 1 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | experiment_yes_or_no | 27 | 27 | 0 | 1 | 3 |
claude-3-5-sonnet-20240620 | experiment_yes_or_no | 27 | 27 | 0 | 1 | 3 |
gpt-4o-2024-08-06 | experiment_yes_or_no | 25 | 27 | 0 | 0.925926 | 3 |
gpt-4o-mini-2024-07-18 | experiment_yes_or_no | 40 | 45 | 0 | 0.888889 | 5 |
llama-3.1-instruct:70:ggufv2:IQ2_M | experiment_yes_or_no | 22 | 27 | 0 | 0.814815 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | experiment_yes_or_no | 18.0146 | 27 | 0 | 0.667206 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | experiment_yes_or_no | 18.006 | 27 | 0 | 0.666888 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | experiment_yes_or_no | 18.0059 | 27 | 0 | 0.666887 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | experiment_yes_or_no | 18.0059 | 27 | 0 | 0.666886 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | experiment_yes_or_no | 18 | 27 | 0 | 0.666667 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | experiment_yes_or_no | 18 | 27 | 0 | 0.666667 | 3 |
openhermes-2.5:7:ggufv2:Q2_K | experiment_yes_or_no | 9 | 45 | 0 | 0.2 | 5 |
gpt-4-0125-preview | experiment_yes_or_no | 9 | 45 | 0 | 0.2 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | experiment_yes_or_no | 9 | 45 | 0 | 0.2 | 5 |
chatglm3:6:ggmlv3:q4_0 | experiment_yes_or_no | 8.6 | 45 | 0 | 0.191111 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | experiment_yes_or_no | 8.33333 | 45 | 0 | 0.185185 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | experiment_yes_or_no | 8.33333 | 45 | 0 | 0.185185 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | experiment_yes_or_no | 8.33333 | 45 | 0 | 0.185185 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | experiment_yes_or_no | 8.025 | 45 | 0 | 0.178333 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | experiment_yes_or_no | 8 | 45 | 0 | 0.177778 | 5 |
gpt-3.5-turbo-0613 | experiment_yes_or_no | 8 | 45 | 0 | 0.177778 | 5 |
gpt-4-0613 | experiment_yes_or_no | 8 | 45 | 0 | 0.177778 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | experiment_yes_or_no | 8 | 45 | 0 | 0.177778 | 5 |
gpt-4o-2024-05-13 | experiment_yes_or_no | 8 | 45 | 0 | 0.177778 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | experiment_yes_or_no | 4.67535 | 27 | 0 | 0.173161 | 3 |
llama-2-chat:70:ggufv2:Q2_K | experiment_yes_or_no | 7.05061 | 45 | 0 | 0.15668 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | experiment_yes_or_no | 6.07336 | 45 | 0 | 0.134964 | 5 |
gpt-3.5-turbo-0125 | experiment_yes_or_no | 6.03333 | 45 | 0 | 0.134074 | 5 |
llama-2-chat:13:ggufv2:Q6_K | experiment_yes_or_no | 3.25916 | 27 | 9.44264e-19 | 0.12071 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | experiment_yes_or_no | 5.23564 | 45 | 0 | 0.116348 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | experiment_yes_or_no | 5.16593 | 45 | 0 | 0.114799 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | experiment_yes_or_no | 3.7 | 45 | 0 | 0.0822222 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | experiment_yes_or_no | 3.68182 | 45 | 0 | 0.0818182 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | experiment_yes_or_no | 3.32028 | 45 | 0 | 0.073784 | 5 |
llama-2-chat:7:ggufv2:Q6_K | experiment_yes_or_no | 1.97565 | 27 | 7.08198e-19 | 0.0731722 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | experiment_yes_or_no | 3.26963 | 45 | 0 | 0.0726584 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | experiment_yes_or_no | 3.0913 | 45 | 0 | 0.0686956 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | experiment_yes_or_no | 2.36364 | 45 | 0 | 0.0525253 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | experiment_yes_or_no | 2.36015 | 45 | 0 | 0.0524479 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | experiment_yes_or_no | 2.2851 | 45 | 0 | 0.05078 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | experiment_yes_or_no | 2.2802 | 45 | 0 | 0.0506711 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | experiment_yes_or_no | 2.06817 | 45 | 0 | 0.0459593 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | experiment_yes_or_no | 1.89935 | 45 | 0 | 0.0422078 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | experiment_yes_or_no | 1.45686 | 45 | 0 | 0.0323746 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | experiment_yes_or_no | 1.29991 | 45 | 0 | 0.0288868 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | experiment_yes_or_no | 1.1661 | 45 | 0 | 0.0259134 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | experiment_yes_or_no | 1.15184 | 45 | 0 | 0.0255965 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | experiment_yes_or_no | 1.06643 | 45 | 0 | 0.0236984 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | experiment_yes_or_no | 1.03147 | 45 | 0 | 0.0229215 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | experiment_yes_or_no | 0.785587 | 45 | 0 | 0.0174575 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | experiment_yes_or_no | 0.726745 | 45 | 0 | 0.0161499 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | experiment_yes_or_no | 0.618798 | 45 | 0 | 0.0137511 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | experiment_yes_or_no | 0.468722 | 45 | 0 | 0.010416 | 5 |
llama-2-chat:13:ggufv2:Q2_K | experiment_yes_or_no | 0.267272 | 45 | 0 | 0.00593938 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | experiment_yes_or_no | 0.201489 | 45 | 0 | 0.00447753 | 5 |
llama-2-chat:7:ggufv2:Q2_K | experiment_yes_or_no | 0.130285 | 45 | 0 | 0.00289522 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
gpt-4o-mini-2024-07-18 | hypothesis | 17.2242 | 45 | 0.0552888 | 0.382759 | 5 |
gpt-4-turbo-2024-04-09 | hypothesis | 16.3215 | 45 | 0.154249 | 0.3627 | 5 |
gpt-4o-2024-08-06 | hypothesis | 8.73977 | 27 | 0.0621005 | 0.323695 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | hypothesis | 6.99218 | 27 | 0 | 0.25897 | 3 |
claude-3-opus-20240229 | hypothesis | 6.74202 | 27 | 0.0355843 | 0.249704 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | hypothesis | 5.51964 | 27 | 4.72132e-18 | 0.204431 | 3 |
claude-3-5-sonnet-20240620 | hypothesis | 4.76919 | 27 | 0.000375971 | 0.176637 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | hypothesis | 4.4134 | 27 | 0 | 0.163459 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | hypothesis | 4.27846 | 27 | 0 | 0.158462 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | hypothesis | 4.14868 | 27 | 0 | 0.153655 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | hypothesis | 3.48866 | 27 | 3.55023e-05 | 0.129209 | 3 |
llama-2-chat:7:ggufv2:Q8_0 | hypothesis | 2.854 | 27 | 3.77706e-18 | 0.105704 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | hypothesis | 2.74519 | 27 | 1.88853e-18 | 0.101674 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | hypothesis | 2.70116 | 27 | 0 | 0.100043 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | hypothesis | 2.64133 | 27 | 9.44264e-19 | 0.097827 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | hypothesis | 3.67339 | 45 | 0 | 0.0816309 | 5 |
llama-2-chat:7:ggufv2:Q6_K | hypothesis | 2.01944 | 27 | 5.19345e-18 | 0.074794 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | hypothesis | 3.33681 | 45 | 0 | 0.0741512 | 5 |
gpt-4-0613 | hypothesis | 3.29696 | 45 | 0 | 0.0732657 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | hypothesis | 2.9272 | 45 | 0 | 0.0650489 | 5 |
gpt-4o-2024-05-13 | hypothesis | 2.89512 | 45 | 0 | 0.064336 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | hypothesis | 2.75585 | 45 | 0 | 0.0612411 | 5 |
gpt-3.5-turbo-0125 | hypothesis | 2.72775 | 45 | 0 | 0.0606168 | 5 |
llama-2-chat:13:ggufv2:Q6_K | hypothesis | 1.61253 | 27 | 1.88853e-18 | 0.0597233 | 3 |
gpt-3.5-turbo-0613 | hypothesis | 2.64497 | 45 | 0 | 0.0587771 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | hypothesis | 2.57382 | 45 | 0 | 0.0571961 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | hypothesis | 2.47292 | 45 | 0 | 0.0549539 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | hypothesis | 2.37196 | 45 | 0 | 0.0527103 | 5 |
gpt-4-0125-preview | hypothesis | 2.33518 | 45 | 0 | 0.051893 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | hypothesis | 2.29085 | 45 | 0 | 0.0509077 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | hypothesis | 2.23255 | 45 | 0 | 0.0496122 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | hypothesis | 2.09626 | 45 | 0 | 0.0465835 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | hypothesis | 2.05375 | 45 | 0 | 0.045639 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | hypothesis | 1.87442 | 45 | 0 | 0.0416537 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | hypothesis | 1.83735 | 45 | 0 | 0.04083 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | hypothesis | 1.71557 | 45 | 0 | 0.0381237 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | hypothesis | 1.52181 | 45 | 0 | 0.033818 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | hypothesis | 1.4915 | 45 | 0 | 0.0331444 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | hypothesis | 1.44143 | 45 | 0 | 0.0320317 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | hypothesis | 1.44009 | 45 | 0 | 0.032002 | 5 |
llama-2-chat:70:ggufv2:Q2_K | hypothesis | 1.4389 | 45 | 0 | 0.0319755 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | hypothesis | 1.41421 | 45 | 0 | 0.0314268 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | hypothesis | 1.39565 | 45 | 0 | 0.0310144 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | hypothesis | 1.13596 | 45 | 0 | 0.0252436 | 5 |
chatglm3:6:ggmlv3:q4_0 | hypothesis | 0.98676 | 45 | 0 | 0.021928 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | hypothesis | 0.878406 | 45 | 0 | 0.0195201 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | hypothesis | 0.876219 | 45 | 0 | 0.0194715 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | hypothesis | 0.68638 | 45 | 0 | 0.0152529 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | hypothesis | 0.623758 | 45 | 0 | 0.0138613 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | hypothesis | 0.62053 | 45 | 0 | 0.0137896 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | hypothesis | 0.604423 | 45 | 0 | 0.0134316 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | hypothesis | 0.572369 | 45 | 0 | 0.0127193 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | hypothesis | 0.55524 | 45 | 0 | 0.0123387 | 5 |
llama-2-chat:7:ggufv2:Q2_K | hypothesis | 0.520453 | 45 | 0 | 0.0115656 | 5 |
llama-2-chat:13:ggufv2:Q2_K | hypothesis | 0.49279 | 45 | 0 | 0.0109509 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | hypothesis | 0.424638 | 45 | 0 | 0.00943639 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | hypothesis | 0.408017 | 45 | 0 | 0.00906704 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | hypothesis | 0.402337 | 45 | 0 | 0.00894082 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | hypothesis | 0.366299 | 45 | 0 | 0.00813997 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
llama-3.1-instruct:70:ggufv2:Q3_K_S | intervention | 21.7143 | 27 | 0 | 0.804233 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | intervention | 20.4286 | 27 | 0 | 0.756614 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | intervention | 20.4286 | 27 | 0 | 0.756614 | 3 |
gpt-4o-mini-2024-07-18 | intervention | 30.0404 | 45 | 0.0608581 | 0.667565 | 5 |
gpt-4o-2024-08-06 | intervention | 17.3762 | 27 | 0.0687322 | 0.643563 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | intervention | 14.9465 | 27 | 0 | 0.553575 | 3 |
claude-3-opus-20240229 | intervention | 14.8366 | 27 | 0 | 0.549502 | 3 |
claude-3-5-sonnet-20240620 | intervention | 14.533 | 27 | 0.00814604 | 0.538261 | 3 |
gpt-4-turbo-2024-04-09 | intervention | 23.5746 | 45 | 0.0579382 | 0.52388 | 5 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | intervention | 13.2322 | 27 | 0 | 0.490083 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | intervention | 12.6108 | 27 | 0 | 0.467065 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | intervention | 11.4946 | 27 | 0 | 0.425726 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | intervention | 11.4467 | 27 | 0 | 0.423953 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | intervention | 7.23948 | 27 | 1.88853e-18 | 0.268129 | 3 |
gpt-4o-2024-05-13 | intervention | 5.34631 | 45 | 0 | 0.118807 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | intervention | 4.9841 | 45 | 0 | 0.110758 | 5 |
gpt-4-0125-preview | intervention | 4.92171 | 45 | 0 | 0.109371 | 5 |
gpt-4-0613 | intervention | 4.72253 | 45 | 0 | 0.104945 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | intervention | 4.71449 | 45 | 0 | 0.104767 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | intervention | 4.44465 | 45 | 0 | 0.09877 | 5 |
gpt-3.5-turbo-0613 | intervention | 4.27143 | 45 | 0 | 0.0949206 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | intervention | 4.00021 | 45 | 0 | 0.0888935 | 5 |
gpt-3.5-turbo-0125 | intervention | 3.75141 | 45 | 0 | 0.0833647 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | intervention | 3.55238 | 45 | 0 | 0.0789418 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | intervention | 2.92766 | 45 | 0 | 0.0650591 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | intervention | 1.56417 | 27 | 0 | 0.0579323 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | intervention | 2.23683 | 45 | 0 | 0.0497073 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | intervention | 2.23319 | 45 | 0 | 0.0496264 | 5 |
llama-2-chat:13:ggufv2:Q6_K | intervention | 1.13241 | 27 | 0.000274145 | 0.0419412 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | intervention | 1.66677 | 45 | 0 | 0.0370393 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | intervention | 1.23412 | 45 | 0 | 0.0274249 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | intervention | 1.17173 | 45 | 0 | 0.0260384 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | intervention | 1.15754 | 45 | 0 | 0.025723 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | intervention | 1.02157 | 45 | 0 | 0.0227015 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | intervention | 0.987919 | 45 | 0 | 0.0219538 | 5 |
chatglm3:6:ggmlv3:q4_0 | intervention | 0.881806 | 45 | 0 | 0.0195957 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | intervention | 0.879646 | 45 | 0 | 0.0195477 | 5 |
llama-2-chat:7:ggufv2:Q6_K | intervention | 0.514286 | 27 | 3.77706e-18 | 0.0190476 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | intervention | 0.723791 | 45 | 0 | 0.0160842 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | intervention | 0.680182 | 45 | 0 | 0.0151152 | 5 |
llama-2-chat:70:ggufv2:Q2_K | intervention | 0.668995 | 45 | 0 | 0.0148666 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | intervention | 0.640258 | 45 | 0 | 0.0142279 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | intervention | 0.550643 | 45 | 0 | 0.0122365 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | intervention | 0.542302 | 45 | 0 | 0.0120512 | 5 |
llama-2-chat:13:ggufv2:Q2_K | intervention | 0.502722 | 45 | 0 | 0.0111716 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | intervention | 0.417501 | 45 | 0 | 0.00927779 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | intervention | 0.416756 | 45 | 0 | 0.00926124 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | intervention | 0.410888 | 45 | 0 | 0.00913085 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | intervention | 0.402319 | 45 | 0 | 0.00894042 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | intervention | 0.37923 | 45 | 0 | 0.00842733 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | intervention | 0.339683 | 45 | 0 | 0.0075485 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | intervention | 0.327257 | 45 | 0 | 0.00727237 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | intervention | 0.319187 | 45 | 0 | 0.00709304 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | intervention | 0.265476 | 45 | 0 | 0.00589947 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | intervention | 0.24986 | 45 | 0 | 0.00555244 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | intervention | 0.244444 | 45 | 0 | 0.0054321 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | intervention | 0.2273 | 45 | 0 | 0.0050511 | 5 |
llama-2-chat:7:ggufv2:Q2_K | intervention | 0.118691 | 45 | 0 | 0.00263758 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | intervention | 0.0769231 | 45 | 0 | 0.0017094 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | ncbi_link | 24 | 27 | 0 | 0.888889 | 3 |
claude-3-opus-20240229 | ncbi_link | 20.88 | 27 | 1.51082e-17 | 0.773333 | 3 |
gpt-4o-2024-08-06 | ncbi_link | 15.6389 | 27 | 0.00890973 | 0.579218 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | ncbi_link | 15.3 | 27 | 0 | 0.566667 | 3 |
gpt-4-turbo-2024-04-09 | ncbi_link | 23.3333 | 45 | 0.132508 | 0.518519 | 5 |
llama-3.1-instruct:70:ggufv2:IQ2_M | ncbi_link | 13.5 | 27 | 0 | 0.5 | 3 |
gpt-4o-mini-2024-07-18 | ncbi_link | 16.8619 | 45 | 0.00496904 | 0.374709 | 5 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | ncbi_link | 9.5 | 27 | 0 | 0.351852 | 3 |
gpt-4-0125-preview | ncbi_link | 6.48768 | 45 | 0 | 0.144171 | 5 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | ncbi_link | 3.86536 | 27 | 0 | 0.143161 | 3 |
gpt-4-0613 | ncbi_link | 6.05933 | 45 | 0 | 0.134652 | 5 |
llama-3.1-instruct:8:ggufv2:Q8_0 | ncbi_link | 2.32755 | 27 | 7.55411e-18 | 0.0862055 | 3 |
openhermes-2.5:7:ggufv2:Q6_K | ncbi_link | 3.5303 | 45 | 0 | 0.0784512 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | ncbi_link | 3.5303 | 45 | 0 | 0.0784512 | 5 |
gpt-4o-2024-05-13 | ncbi_link | 3.51302 | 45 | 0 | 0.078067 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | ncbi_link | 3.47436 | 45 | 0 | 0.077208 | 5 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | ncbi_link | 1.90181 | 27 | 7.91155e-07 | 0.0704374 | 3 |
openhermes-2.5:7:ggufv2:Q4_K_M | ncbi_link | 3.11111 | 45 | 0 | 0.0691358 | 5 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | ncbi_link | 1.59127 | 27 | 0 | 0.0589359 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | ncbi_link | 2.37436 | 45 | 0 | 0.0527635 | 5 |
llama-3.1-instruct:8:ggufv2:Q6_K | ncbi_link | 1.31 | 27 | 0 | 0.0485184 | 3 |
gpt-3.5-turbo-0613 | ncbi_link | 2.16667 | 45 | 0 | 0.0481481 | 5 |
gpt-3.5-turbo-0125 | ncbi_link | 1.42925 | 45 | 0 | 0.031761 | 5 |
llama-2-chat:13:ggufv2:Q6_K | ncbi_link | 0.690904 | 27 | 0 | 0.0255891 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | ncbi_link | 1.03429 | 45 | 0 | 0.0229841 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | ncbi_link | 0.884957 | 45 | 0 | 0.0196657 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | ncbi_link | 0.881705 | 45 | 0 | 0.0195934 | 5 |
llama-2-chat:7:ggufv2:Q6_K | ncbi_link | 0.507313 | 27 | 9.44264e-19 | 0.0187894 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | ncbi_link | 0.486291 | 27 | 2.95082e-19 | 0.0180108 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | ncbi_link | 0.710989 | 45 | 0 | 0.0157998 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | ncbi_link | 0.410766 | 27 | 9.44264e-19 | 0.0152135 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | ncbi_link | 0.656812 | 45 | 0 | 0.0145958 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | ncbi_link | 0.615714 | 45 | 0 | 0.0136825 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | ncbi_link | 0.596131 | 45 | 0 | 0.0132474 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | ncbi_link | 0.574422 | 45 | 0 | 0.0127649 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | ncbi_link | 0.558824 | 45 | 0 | 0.0124183 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | ncbi_link | 0.505458 | 45 | 0 | 0.0112324 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | ncbi_link | 0.429927 | 45 | 0 | 0.00955394 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | ncbi_link | 0.328564 | 45 | 0 | 0.00730142 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | ncbi_link | 0.271548 | 45 | 0 | 0.0060344 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | ncbi_link | 0.255217 | 45 | 0 | 0.00567148 | 5 |
llama-2-chat:70:ggufv2:Q2_K | ncbi_link | 0.253735 | 45 | 0 | 0.00563856 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | ncbi_link | 0.246231 | 45 | 0 | 0.00547179 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | ncbi_link | 0.241357 | 45 | 0 | 0.00536348 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | ncbi_link | 0.236802 | 45 | 0 | 0.00526226 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | ncbi_link | 0.233815 | 45 | 0 | 0.00519589 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | ncbi_link | 0.230909 | 45 | 0 | 0.00513131 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | ncbi_link | 0.216341 | 45 | 0 | 0.00480757 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | ncbi_link | 0.196981 | 45 | 0 | 0.00437735 | 5 |
llama-2-chat:13:ggufv2:Q2_K | ncbi_link | 0.192574 | 45 | 0 | 0.00427942 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | ncbi_link | 0.179211 | 45 | 0 | 0.00398247 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | ncbi_link | 0.177339 | 45 | 0 | 0.00394087 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | ncbi_link | 0.173014 | 45 | 0 | 0.00384476 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | ncbi_link | 0.170952 | 45 | 0 | 0.00379894 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | ncbi_link | 0.166777 | 45 | 0 | 0.00370615 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | ncbi_link | 0.166614 | 45 | 0 | 0.00370254 | 5 |
llama-2-chat:7:ggufv2:Q2_K | ncbi_link | 0.15271 | 45 | 0 | 0.00339354 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | ncbi_link | 0.150011 | 45 | 0 | 0.00333359 | 5 |
chatglm3:6:ggmlv3:q4_0 | ncbi_link | 0.122857 | 45 | 0 | 0.00273017 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | significance | 21.6 | 27 | 6.04329e-17 | 0.8 | 3 |
gpt-4o-mini-2024-07-18 | significance | 36 | 45 | 0 | 0.8 | 5 |
gpt-4o-2024-08-06 | significance | 17.8444 | 27 | 0.00285111 | 0.660905 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | significance | 15.6 | 27 | 6.04329e-17 | 0.577778 | 3 |
claude-3-opus-20240229 | significance | 13.7935 | 27 | 0.013441 | 0.51087 | 3 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | significance | 13.6 | 27 | 6.7987e-17 | 0.503704 | 3 |
gpt-4-turbo-2024-04-09 | significance | 22.6141 | 45 | 0.0472812 | 0.502536 | 5 |
llama-3.1-instruct:70:ggufv2:IQ2_M | significance | 13.0286 | 27 | 6.04329e-17 | 0.48254 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | significance | 6.06508 | 27 | 1.51082e-17 | 0.224633 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | significance | 6.00361 | 27 | 7.55411e-18 | 0.222356 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | significance | 5.21313 | 27 | 1.51082e-17 | 0.193079 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | significance | 5.0798 | 27 | 2.26623e-17 | 0.188141 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | significance | 4.94646 | 27 | 3.02164e-17 | 0.183202 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | significance | 4.31237 | 27 | 2.36066e-17 | 0.159717 | 3 |
gpt-4-0613 | significance | 5.6 | 45 | 0 | 0.124444 | 5 |
gpt-4-0125-preview | significance | 5.18384 | 45 | 0 | 0.115196 | 5 |
gpt-4o-2024-05-13 | significance | 4.22424 | 45 | 0 | 0.0938721 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | significance | 3.92996 | 45 | 0 | 0.0873325 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | significance | 3.78182 | 45 | 0 | 0.0840404 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | significance | 3.78182 | 45 | 0 | 0.0840404 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | significance | 3.77787 | 45 | 0 | 0.0839526 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | significance | 3.69091 | 45 | 0 | 0.0820202 | 5 |
gpt-3.5-turbo-0613 | significance | 3.58562 | 45 | 0 | 0.0796804 | 5 |
gpt-3.5-turbo-0125 | significance | 3.51717 | 45 | 0 | 0.0781594 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | significance | 2.93833 | 45 | 0 | 0.0652963 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | significance | 2.87928 | 45 | 0 | 0.063984 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | significance | 2.79423 | 45 | 0 | 0.062094 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | significance | 2.62296 | 45 | 0 | 0.0582881 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | significance | 2.56724 | 45 | 0 | 0.0570498 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | significance | 2.48514 | 45 | 0 | 0.0552254 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | significance | 2.4813 | 45 | 0 | 0.05514 | 5 |
llama-2-chat:7:ggufv2:Q8_0 | significance | 1.10159 | 27 | 0 | 0.0407996 | 3 |
llama-2-chat:13:ggufv2:Q6_K | significance | 1.07015 | 27 | 1.0623e-18 | 0.0396352 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | significance | 1.50696 | 45 | 0 | 0.033488 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | significance | 1.34869 | 45 | 0 | 0.0299709 | 5 |
llama-2-chat:7:ggufv2:Q6_K | significance | 0.806474 | 27 | 0 | 0.0298694 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | significance | 1.31454 | 45 | 0 | 0.0292119 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | significance | 1.2312 | 45 | 0 | 0.0273599 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | significance | 1.01129 | 45 | 0 | 0.0224731 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | significance | 0.994971 | 45 | 0 | 0.0221105 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | significance | 0.957259 | 45 | 0 | 0.0212724 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | significance | 0.758379 | 45 | 0 | 0.0168529 | 5 |
llama-2-chat:70:ggufv2:Q2_K | significance | 0.716547 | 45 | 0 | 0.0159233 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | significance | 0.68386 | 45 | 0 | 0.0151969 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | significance | 0.636128 | 45 | 0 | 0.0141362 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | significance | 0.518572 | 45 | 0 | 0.0115238 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | significance | 0.329457 | 45 | 0 | 0.00732127 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | significance | 0.326026 | 45 | 0 | 0.00724502 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | significance | 0.281188 | 45 | 0 | 0.00624862 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | significance | 0.228461 | 45 | 0 | 0.00507691 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | significance | 0.213246 | 45 | 0 | 0.0047388 | 5 |
llama-2-chat:13:ggufv2:Q2_K | significance | 0.207957 | 45 | 0 | 0.00462127 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | significance | 0.205271 | 45 | 0 | 0.00456158 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | significance | 0.194946 | 45 | 0 | 0.00433214 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | significance | 0.178078 | 45 | 0 | 0.00395728 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | significance | 0.131484 | 45 | 0 | 0.00292186 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | significance | 0.123914 | 45 | 0 | 0.00275365 | 5 |
chatglm3:6:ggmlv3:q4_0 | significance | 0.118153 | 45 | 0 | 0.00262562 | 5 |
llama-2-chat:7:ggufv2:Q2_K | significance | 0.103278 | 45 | 0 | 0.00229507 | 5 |
Full model name | Subtask | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|---|
claude-3-5-sonnet-20240620 | stats | 27 | 27 | 0 | 1 | 3 |
claude-3-opus-20240229 | stats | 27 | 27 | 0 | 1 | 3 |
gpt-4-turbo-2024-04-09 | stats | 45 | 45 | 0 | 1 | 5 |
gpt-4o-2024-08-06 | stats | 26.5385 | 27 | 0 | 0.982906 | 3 |
gpt-4o-mini-2024-07-18 | stats | 42.5641 | 45 | 0 | 0.945869 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | stats | 24 | 27 | 0 | 0.888889 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | stats | 18.6622 | 27 | 0 | 0.691194 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | stats | 18.6622 | 27 | 0 | 0.691194 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | stats | 18.3112 | 27 | 1.88853e-18 | 0.678192 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | stats | 18.3058 | 27 | 0 | 0.677993 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | stats | 18 | 27 | 0 | 0.666667 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | stats | 18 | 27 | 0 | 0.666667 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | stats | 17.2171 | 27 | 0.033358 | 0.637672 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | stats | 15.7067 | 27 | 0 | 0.581731 | 3 |
gpt-4-0125-preview | stats | 8.86667 | 45 | 0 | 0.197037 | 5 |
openhermes-2.5:7:ggufv2:Q8_0 | stats | 8.66667 | 45 | 0 | 0.192593 | 5 |
openhermes-2.5:7:ggufv2:Q6_K | stats | 8.66667 | 45 | 0 | 0.192593 | 5 |
openhermes-2.5:7:ggufv2:Q5_K_M | stats | 8.52821 | 45 | 0 | 0.189516 | 5 |
gpt-4-0613 | stats | 8.51282 | 45 | 0 | 0.189174 | 5 |
gpt-4o-2024-05-13 | stats | 8.51282 | 45 | 0 | 0.189174 | 5 |
openhermes-2.5:7:ggufv2:Q4_K_M | stats | 8.25641 | 45 | 0 | 0.183476 | 5 |
openhermes-2.5:7:ggufv2:Q3_K_M | stats | 8.05641 | 45 | 0 | 0.179031 | 5 |
openhermes-2.5:7:ggufv2:Q2_K | stats | 8 | 45 | 0 | 0.177778 | 5 |
gpt-3.5-turbo-0613 | stats | 7.98135 | 45 | 0 | 0.177363 | 5 |
gpt-3.5-turbo-0125 | stats | 7.12976 | 45 | 0 | 0.158439 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | stats | 6.89091 | 45 | 0 | 0.153131 | 5 |
llama-2-chat:13:ggufv2:Q6_K | stats | 4.05299 | 27 | 2.26623e-17 | 0.150111 | 3 |
llama-2-chat:7:ggufv2:Q8_0 | stats | 3.87128 | 27 | 7.55411e-18 | 0.143381 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | stats | 6.29908 | 45 | 0 | 0.13998 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | stats | 6.18322 | 45 | 0 | 0.137405 | 5 |
llama-3-instruct:8:ggufv2:Q8_0 | stats | 5.17406 | 45 | 0 | 0.114979 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | stats | 5.1041 | 45 | 0 | 0.113424 | 5 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | stats | 5.04591 | 45 | 0 | 0.112131 | 5 |
llama-2-chat:7:ggufv2:Q6_K | stats | 2.99816 | 27 | 7.55411e-18 | 0.111043 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | stats | 4.63496 | 45 | 0 | 0.102999 | 5 |
llama-3-instruct:8:ggufv2:Q6_K | stats | 4.30739 | 45 | 0 | 0.0957198 | 5 |
llama-3-instruct:8:ggufv2:Q5_K_M | stats | 3.9346 | 45 | 0 | 0.0874356 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | stats | 3.60737 | 45 | 0 | 0.0801638 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | stats | 3.58841 | 45 | 0 | 0.0797425 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | stats | 3.21213 | 45 | 0 | 0.0713807 | 5 |
llama-2-chat:70:ggufv2:Q4_K_M | stats | 3.08109 | 45 | 0 | 0.0684688 | 5 |
llama-3-instruct:8:ggufv2:Q4_K_M | stats | 2.98843 | 45 | 0 | 0.0664096 | 5 |
llama-2-chat:70:ggufv2:Q2_K | stats | 2.65216 | 45 | 0 | 0.0589368 | 5 |
llama-2-chat:70:ggufv2:Q3_K_M | stats | 2.44276 | 45 | 0 | 0.0542835 | 5 |
llama-2-chat:70:ggufv2:Q5_K_M | stats | 2.3993 | 45 | 0 | 0.0533177 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | stats | 2.21549 | 45 | 0 | 0.049233 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M | stats | 1.96241 | 45 | 0 | 0.0436091 | 5 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | stats | 1.76057 | 45 | 0 | 0.0391237 | 5 |
llama-2-chat:7:ggufv2:Q4_K_M | stats | 1.43589 | 45 | 0 | 0.0319086 | 5 |
llama-2-chat:13:ggufv2:Q4_K_M | stats | 1.41695 | 45 | 0 | 0.0314878 | 5 |
llama-2-chat:13:ggufv2:Q8_0 | stats | 1.38608 | 45 | 0 | 0.0308019 | 5 |
llama-2-chat:7:ggufv2:Q5_K_M | stats | 1.3859 | 45 | 0 | 0.0307977 | 5 |
llama-2-chat:13:ggufv2:Q3_K_M | stats | 1.35834 | 45 | 0 | 0.0301854 | 5 |
llama-2-chat:13:ggufv2:Q5_K_M | stats | 1.3371 | 45 | 0 | 0.0297134 | 5 |
llama-2-chat:13:ggufv2:Q2_K | stats | 1.12439 | 45 | 0 | 0.0249865 | 5 |
code-llama-instruct:7:ggufv2:Q4_K_M | stats | 0.860471 | 45 | 0 | 0.0191216 | 5 |
llama-2-chat:7:ggufv2:Q3_K_M | stats | 0.804538 | 45 | 0 | 0.0178786 | 5 |
llama-2-chat:7:ggufv2:Q2_K | stats | 0.558031 | 45 | 0 | 0.0124007 | 5 |
chatglm3:6:ggmlv3:q4_0 | stats | 0.17925 | 45 | 0 | 0.00398332 | 5 |
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
gpt-4-turbo-2024-04-09 | 99 | 100 | 0.447214 | 0.99 | 5 |
gpt-4o-mini-2024-07-18 | 98 | 100 | 0.547723 | 0.98 | 5 |
gpt-4o-2024-05-13 | 96 | 100 | 0.83666 | 0.96 | 5 |
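
The Accuracy column in the tables on this page is consistent with Score achieved divided by Score possible. The page does not state the formula, so treat that as an assumption. As a minimal sketch, the three rows above can be rechecked as follows; the helper name `accuracy` is hypothetical and not part of BioChatter.

```python
def accuracy(score_achieved: float, score_possible: float) -> float:
    """Fraction of the possible score that was achieved (assumed definition)."""
    if score_possible <= 0:
        raise ValueError("score_possible must be positive")
    return score_achieved / score_possible


# Rows copied from the table above: (model, score achieved, score possible)
rows = [
    ("gpt-4-turbo-2024-04-09", 99, 100),
    ("gpt-4o-mini-2024-07-18", 98, 100),
    ("gpt-4o-2024-05-13", 96, 100),
]

# Round to six decimals, matching the precision shown in the tables.
accuracies = {model: round(accuracy(a, p), 6) for model, a, p in rows}

for model, acc in accuracies.items():
    print(f"{model} | {acc}")
```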
Medical Exam Question Answering
In this set of tasks, we test LLM abilities to answer medical exam questions.
Full model name | Score achieved | Score possible | Score SD | Accuracy | Iterations |
---|---|---|---|---|---|
gpt-4o-2024-08-06 | 806 | 948 | 2.73205 | 0.850211 | 3 |
gpt-4o-mini-2024-07-18 | 1352 | 1608 | 1.44215 | 0.840796 | 5 |
gpt-4-turbo-2024-04-09 | 1366 | 1632 | 3.231 | 0.83701 | 5 |
llama-3.1-instruct:70:ggufv2:IQ4_XS | 765 | 930 | 0 | 0.822581 | 3 |
claude-3-opus-20240229 | 754 | 936 | 2.88675 | 0.805556 | 3 |
llama-3.1-instruct:70:ggufv2:Q3_K_S | 732 | 915 | 0 | 0.8 | 3 |
gpt-4-0125-preview | 837 | 1077 | 4.04145 | 0.777159 | 3 |
claude-3-5-sonnet-20240620 | 759 | 981 | 0 | 0.7737 | 3 |
llama-3.1-instruct:70:ggufv2:IQ2_M | 684 | 885 | 0 | 0.772881 | 3 |
llama-3.1-instruct:8:ggufv2:Q3_K_L | 657 | 855 | 0 | 0.768421 | 3 |
llama-3.1-instruct:8:ggufv2:Q8_0 | 657 | 858 | 0 | 0.765734 | 3 |
gpt-4o-2024-05-13 | 820 | 1074 | 4.04145 | 0.763501 | 3 |
llama-3.1-instruct:8:ggufv2:IQ4_XS | 654 | 864 | 0 | 0.756944 | 3 |
llama-3.1-instruct:8:ggufv2:Q6_K | 645 | 858 | 0 | 0.751748 | 3 |
llama-3.1-instruct:8:ggufv2:Q5_K_M | 636 | 849 | 0 | 0.749117 | 3 |
llama-3.1-instruct:8:ggufv2:Q4_K_M | 621 | 837 | 0 | 0.741935 | 3 |
gpt-4-0613 | 785 | 1074 | 3.4641 | 0.730912 | 3 |
gpt-3.5-turbo-0125 | 721 | 1074 | 7.50555 | 0.671322 | 3 |
llama-3-instruct:8:ggufv2:Q8_0 | 690 | 1077 | 0 | 0.640669 | 3 |
llama-3-instruct:8:ggufv2:Q5_K_M | 684 | 1077 | 0 | 0.635097 | 3 |
llama-3-instruct:8:ggufv2:Q4_K_M | 673 | 1077 | 0.57735 | 0.624884 | 3 |
llama-3-instruct:8:ggufv2:Q6_K | 672 | 1077 | 0 | 0.623955 | 3 |
openhermes-2.5:7:ggufv2:Q4_K_M | 628 | 1071 | 1.73205 | 0.586368 | 3 |
openhermes-2.5:7:ggufv2:Q8_0 | 618 | 1071 | 0 | 0.577031 | 3 |
openhermes-2.5:7:ggufv2:Q6_K | 615 | 1071 | 0 | 0.57423 | 3 |
openhermes-2.5:7:ggufv2:Q5_K_M | 612 | 1071 | 0 | 0.571429 | 3 |
openhermes-2.5:7:ggufv2:Q3_K_M | 604 | 1071 | 1.73205 | 0.563959 | 3 |
openhermes-2.5:7:ggufv2:Q2_K | 579 | 1074 | 0 | 0.539106 | 3 |
llama-2-chat:13:ggufv2:Q8_0 | 462 | 1071 | 0 | 0.431373 | 3 |
llama-2-chat:13:ggufv2:Q5_K_M | 462 | 1071 | 0 | 0.431373 | 3 |
llama-2-chat:13:ggufv2:Q4_K_M | 459 | 1071 | 0 | 0.428571 | 3 |
llama-2-chat:13:ggufv2:Q6_K | 459 | 1071 | 0 | 0.428571 | 3 |
llama-2-chat:13:ggufv2:Q3_K_M | 459 | 1071 | 0 | 0.428571 | 3 |
chatglm3:6:ggmlv3:q4_0 | 457 | 1071 | 21.6162 | 0.426704 | 3 |
llama-2-chat:13:ggufv2:Q2_K | 444 | 1071 | 0 | 0.414566 | 3 |
llama-2-chat:7:ggufv2:Q6_K | 435 | 1071 | 0 | 0.406162 | 3 |
llama-2-chat:7:ggufv2:Q8_0 | 429 | 1071 | 0 | 0.40056 | 3 |
llama-2-chat:7:ggufv2:Q5_K_M | 429 | 1071 | 0 | 0.40056 | 3 |
llama-2-chat:7:ggufv2:Q4_K_M | 429 | 1071 | 0 | 0.40056 | 3 |
llama-2-chat:7:ggufv2:Q3_K_M | 423 | 1071 | 0 | 0.394958 | 3 |
llama-2-chat:7:ggufv2:Q2_K | 396 | 1071 | 0 | 0.369748 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M | 395 | 1071 | 0.57735 | 0.368814 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q8_0 | 393 | 1071 | 0 | 0.366947 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q6_K | 393 | 1071 | 0 | 0.366947 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q4_K_M | 391 | 1071 | 0.57735 | 0.365079 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q5_K_M | 390 | 1071 | 0 | 0.364146 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q3_K_M | 386 | 1071 | 0.57735 | 0.360411 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 | 384 | 1071 | 0 | 0.358543 | 3 |
mistral-instruct-v0.2:7:ggufv2:Q2_K | 378 | 1071 | 0 | 0.352941 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M | 378 | 1071 | 0 | 0.352941 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K | 367 | 1071 | 0.57735 | 0.34267 | 3 |
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K | 353 | 1071 | 0.57735 | 0.329599 | 3 |