Benchmark - All Results

BioCypher query generation

In this set of tasks, we test the ability of LLMs to generate queries for a BioCypher knowledge graph using BioChatter. BioChatter receives two inputs: the schema_config.yaml of the BioCypher knowledge graph and a natural-language query.

We test the individual steps of the query generation process separately, as well as the end-to-end performance of the whole process.
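
As a reading aid for the result tables on this page: in the rows we checked, the Accuracy value equals Score achieved divided by Score possible (for example, 40 / 45 = 0.888889). The short sketch below reproduces that arithmetic for three rows copied from the first table. The function name `accuracy` and the zero-division guard are our own illustration, not part of BioChatter.

```python
def accuracy(score_achieved: float, score_possible: float) -> float:
    """Return score_achieved / score_possible.

    Returning 0.0 when score_possible is zero is an assumption made for
    this sketch; the benchmark tables contain no such row.
    """
    if score_possible == 0:
        return 0.0
    return score_achieved / score_possible


# (model name, score achieved, score possible), copied from the first table
rows = [
    ("gpt-4o-2024-05-13", 40, 40),
    ("gpt-3.5-turbo-0613", 40, 45),
    ("llama-3-instruct:8:ggufv2:Q4_K_M", 31, 36),
]

for name, achieved, possible in rows:
    # Six decimal places, roughly the precision the tables use
    print(f"{name}: {accuracy(achieved, possible):.6f}")
```

Because Score possible differs between rows of the same table (36, 40, or 45 above), Accuracy is the more comparable column when ranking models.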

Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-4o-2024-05-13 40 40 0 1 5
gpt-3.5-turbo-0125 40 40 0 1 5
openhermes-2.5:7:ggufv2:Q6_K 40 40 0 1 5
openhermes-2.5:7:ggufv2:Q3_K_M 45 45 0 1 5
gpt-3.5-turbo-0613 40 45 0 0.888889 5
gpt-4-0613 40 45 0 0.888889 5
openhermes-2.5:7:ggufv2:Q5_K_M 40 45 0 0.888889 5
openhermes-2.5:7:ggufv2:Q4_K_M 40 45 0 0.888889 5
openhermes-2.5:7:ggufv2:Q8_0 40 45 0 0.888889 5
llama-3-instruct:8:ggufv2:Q6_K 35 40 0 0.875 5
llama-3-instruct:8:ggufv2:Q8_0 35 40 0 0.875 5
llama-3-instruct:8:ggufv2:Q5_K_M 35 40 0 0.875 5
llama-3-instruct:8:ggufv2:Q4_K_M 31 36 0 0.861111 5
gpt-4-0125-preview 35 45 0 0.777778 5
chatglm3:6:ggmlv3:q4_0 30 40 0 0.75 5
openhermes-2.5:7:ggufv2:Q2_K 25 45 0 0.555556 5
code-llama-instruct:7:ggufv2:Q3_K_M 20 40 0 0.5 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 20 40 0 0.5 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 19 40 0 0.475 5
code-llama-instruct:13:ggufv2:Q3_K_M 18 40 0 0.45 5
llama-2-chat:70:ggufv2:Q4_K_M 20 45 0 0.444444 5
llama-2-chat:70:ggufv2:Q5_K_M 20 45 0 0.444444 5
llama-2-chat:7:ggufv2:Q5_K_M 20 45 0 0.444444 5
llama-2-chat:7:ggufv2:Q4_K_M 20 45 0 0.444444 5
llama-2-chat:7:ggufv2:Q8_0 20 45 0 0.444444 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 20 45 0 0.444444 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 19 45 0 0.422222 5
llama-2-chat:7:ggufv2:Q6_K 15 40 0 0.375 5
code-llama-instruct:7:ggufv2:Q4_K_M 15 45 0 0.333333 5
llama-2-chat:7:ggufv2:Q3_K_M 15 45 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 15 45 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 15 45 0 0.333333 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 15 45 0 0.333333 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 15 45 0 0.333333 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 15 45 0 0.333333 5
llama-2-chat:70:ggufv2:Q3_K_M 15 45 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 14 45 0 0.311111 5
code-llama-instruct:7:ggufv2:Q2_K 10 40 0 0.25 5
code-llama-instruct:34:ggufv2:Q8_0 10 40 0 0.25 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 10 45 0 0.222222 5
code-llama-instruct:34:ggufv2:Q6_K 5 40 0 0.125 5
code-llama-instruct:34:ggufv2:Q5_K_M 5 40 0 0.125 5
code-llama-instruct:7:ggufv2:Q5_K_M 5 45 0 0.111111 5
code-llama-instruct:34:ggufv2:Q3_K_M 0 40 0 0 5
code-llama-instruct:34:ggufv2:Q2_K 0 40 0 0 5
code-llama-instruct:13:ggufv2:Q8_0 0 40 0 0 5
llama-2-chat:13:ggufv2:Q8_0 0 45 0 0 5
llama-2-chat:13:ggufv2:Q6_K 0 40 0 0 5
llama-2-chat:13:ggufv2:Q5_K_M 0 45 0 0 5
llama-2-chat:13:ggufv2:Q4_K_M 0 45 0 0 5
llama-2-chat:13:ggufv2:Q3_K_M 0 45 0 0 5
llama-2-chat:13:ggufv2:Q2_K 0 45 0 0 5
code-llama-instruct:7:ggufv2:Q6_K 0 40 0 0 5
code-llama-instruct:7:ggufv2:Q8_0 0 45 0 0 5
code-llama-instruct:13:ggufv2:Q6_K 0 40 0 0 5
code-llama-instruct:13:ggufv2:Q5_K_M 0 40 0 0 5
code-llama-instruct:13:ggufv2:Q4_K_M 0 40 0 0 5
code-llama-instruct:13:ggufv2:Q2_K 0 40 0 0 5
code-llama-instruct:34:ggufv2:Q4_K_M 0 40 0 0 5
llama-2-chat:7:ggufv2:Q2_K 0 45 0 0 5
llama-2-chat:70:ggufv2:Q2_K 0 45 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 0 45 0 0 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-3.5-turbo-0125 60 60 0 1 5
openhermes-2.5:7:ggufv2:Q4_K_M 60 60 0 1 5
openhermes-2.5:7:ggufv2:Q3_K_M 60 60 0 1 5
openhermes-2.5:7:ggufv2:Q5_K_M 60 60 0 1 5
openhermes-2.5:7:ggufv2:Q6_K 60 60 0 1 5
openhermes-2.5:7:ggufv2:Q8_0 60 60 0 1 5
gpt-4-0125-preview 45 60 0 0.75 5
gpt-4-0613 39 60 0 0.65 5
openhermes-2.5:7:ggufv2:Q2_K 30 60 0 0.5 5
gpt-3.5-turbo-0613 30 60 0 0.5 5
code-llama-instruct:34:ggufv2:Q2_K 30 60 0 0.5 5
chatglm3:6:ggmlv3:q4_0 24 60 0 0.4 5
code-llama-instruct:34:ggufv2:Q3_K_M 15 60 0 0.25 5
code-llama-instruct:7:ggufv2:Q2_K 15 60 0 0.25 5
llama-2-chat:70:ggufv2:Q4_K_M 15 60 0 0.25 5
llama-2-chat:70:ggufv2:Q5_K_M 15 60 0 0.25 5
code-llama-instruct:7:ggufv2:Q3_K_M 15 60 0 0.25 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 15 60 0 0.25 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 15 60 0 0.25 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 15 60 0 0.25 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 15 60 0 0.25 5
code-llama-instruct:34:ggufv2:Q4_K_M 0 60 0 0 5
code-llama-instruct:34:ggufv2:Q8_0 0 60 0 0 5
code-llama-instruct:34:ggufv2:Q6_K 0 60 0 0 5
code-llama-instruct:34:ggufv2:Q5_K_M 0 60 0 0 5
code-llama-instruct:13:ggufv2:Q4_K_M 0 60 0 0 5
code-llama-instruct:13:ggufv2:Q6_K 0 60 0 0 5
code-llama-instruct:13:ggufv2:Q5_K_M 0 60 0 0 5
code-llama-instruct:13:ggufv2:Q3_K_M 0 60 0 0 5
code-llama-instruct:13:ggufv2:Q2_K 0 60 0 0 5
llama-2-chat:13:ggufv2:Q8_0 0 60 0 0 5
llama-2-chat:13:ggufv2:Q6_K 0 60 0 0 5
llama-2-chat:13:ggufv2:Q5_K_M 0 60 0 0 5
llama-2-chat:13:ggufv2:Q4_K_M 0 60 0 0 5
llama-2-chat:13:ggufv2:Q3_K_M 0 60 0 0 5
llama-2-chat:13:ggufv2:Q2_K 0 60 0 0 5
gpt-4o-2024-05-13 0 60 0 0 5
code-llama-instruct:7:ggufv2:Q8_0 0 60 0 0 5
code-llama-instruct:7:ggufv2:Q5_K_M 0 60 0 0 5
code-llama-instruct:7:ggufv2:Q4_K_M 0 60 0 0 5
code-llama-instruct:7:ggufv2:Q6_K 0 60 0 0 5
code-llama-instruct:13:ggufv2:Q8_0 0 60 0 0 5
llama-2-chat:7:ggufv2:Q2_K 0 60 0 0 5
llama-2-chat:7:ggufv2:Q3_K_M 0 60 0 0 5
llama-2-chat:70:ggufv2:Q3_K_M 0 60 0 0 5
llama-2-chat:70:ggufv2:Q2_K 0 60 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 0 60 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 0 60 0 0 5
llama-3-instruct:8:ggufv2:Q8_0 0 60 0 0 5
llama-3-instruct:8:ggufv2:Q6_K 0 60 0 0 5
llama-3-instruct:8:ggufv2:Q5_K_M 0 60 0 0 5
llama-3-instruct:8:ggufv2:Q4_K_M 0 60 0 0 5
llama-2-chat:7:ggufv2:Q8_0 0 60 0 0 5
llama-2-chat:7:ggufv2:Q6_K 0 60 0 0 5
llama-2-chat:7:ggufv2:Q5_K_M 0 60 0 0 5
llama-2-chat:7:ggufv2:Q4_K_M 0 60 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 0 60 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 0 60 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 0 60 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 0 60 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 0 60 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 0 60 0 0 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-3.5-turbo-0613 116 320 0 0.3625 5
gpt-4-0613 115 320 0 0.359375 5
gpt-3.5-turbo-0125 114 320 0 0.35625 5
chatglm3:6:ggmlv3:q4_0 92 320 0 0.2875 5
llama-3-instruct:8:ggufv2:Q8_0 90 320 0 0.28125 5
llama-3-instruct:8:ggufv2:Q6_K 90 320 0 0.28125 5
llama-3-instruct:8:ggufv2:Q5_K_M 60 320 0 0.1875 5
llama-2-chat:70:ggufv2:Q3_K_M 55 320 0 0.171875 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 52 320 0 0.1625 5
openhermes-2.5:7:ggufv2:Q3_K_M 40 320 0 0.125 5
openhermes-2.5:7:ggufv2:Q5_K_M 40 320 0 0.125 5
openhermes-2.5:7:ggufv2:Q8_0 40 320 0 0.125 5
llama-3-instruct:8:ggufv2:Q4_K_M 35 320 0 0.109375 5
llama-2-chat:7:ggufv2:Q3_K_M 32 320 0 0.1 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 21 320 0 0.065625 5
code-llama-instruct:7:ggufv2:Q2_K 20 320 0 0.0625 5
openhermes-2.5:7:ggufv2:Q4_K_M 15 320 0 0.046875 5
openhermes-2.5:7:ggufv2:Q6_K 15 320 0 0.046875 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 15 320 0 0.046875 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 15 320 0 0.046875 5
llama-2-chat:7:ggufv2:Q5_K_M 12 320 0 0.0375 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 12 320 0 0.0375 5
code-llama-instruct:13:ggufv2:Q6_K 0 320 0 0 5
code-llama-instruct:13:ggufv2:Q8_0 0 320 0 0 5
code-llama-instruct:34:ggufv2:Q3_K_M 0 320 0 0 5
code-llama-instruct:34:ggufv2:Q2_K 0 320 0 0 5
code-llama-instruct:34:ggufv2:Q4_K_M 0 320 0 0 5
code-llama-instruct:13:ggufv2:Q4_K_M 0 320 0 0 5
code-llama-instruct:13:ggufv2:Q5_K_M 0 320 0 0 5
code-llama-instruct:13:ggufv2:Q3_K_M 0 320 0 0 5
llama-2-chat:13:ggufv2:Q8_0 0 320 0 0 5
llama-2-chat:13:ggufv2:Q6_K 0 320 0 0 5
llama-2-chat:13:ggufv2:Q5_K_M 0 320 0 0 5
llama-2-chat:13:ggufv2:Q4_K_M 0 320 0 0 5
llama-2-chat:13:ggufv2:Q3_K_M 0 320 0 0 5
llama-2-chat:13:ggufv2:Q2_K 0 320 0 0 5
gpt-4o-2024-05-13 0 320 0 0 5
gpt-4-0125-preview 0 320 0 0 5
code-llama-instruct:7:ggufv2:Q4_K_M 0 320 0 0 5
code-llama-instruct:7:ggufv2:Q8_0 0 320 0 0 5
code-llama-instruct:7:ggufv2:Q5_K_M 0 320 0 0 5
code-llama-instruct:7:ggufv2:Q6_K 0 320 0 0 5
code-llama-instruct:7:ggufv2:Q3_K_M 0 320 0 0 5
code-llama-instruct:34:ggufv2:Q5_K_M 0 320 0 0 5
code-llama-instruct:34:ggufv2:Q6_K 0 320 0 0 5
code-llama-instruct:34:ggufv2:Q8_0 0 320 0 0 5
code-llama-instruct:13:ggufv2:Q2_K 0 320 0 0 5
llama-2-chat:7:ggufv2:Q6_K 0 320 0 0 5
llama-2-chat:7:ggufv2:Q8_0 0 320 0 0 5
llama-2-chat:70:ggufv2:Q2_K 0 320 0 0 5
llama-2-chat:70:ggufv2:Q5_K_M 0 320 0 0 5
llama-2-chat:70:ggufv2:Q4_K_M 0 320 0 0 5
llama-2-chat:7:ggufv2:Q2_K 0 320 0 0 5
llama-2-chat:7:ggufv2:Q4_K_M 0 320 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 0 320 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 0 320 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 0 320 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 0 320 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 0 320 0 0 5
openhermes-2.5:7:ggufv2:Q2_K 0 320 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 0 320 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 0 320 0 0 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
code-llama-instruct:34:ggufv2:Q4_K_M 39 40 0 0.975 5
code-llama-instruct:34:ggufv2:Q5_K_M 38 40 0 0.95 5
code-llama-instruct:34:ggufv2:Q8_0 37 40 0 0.925 5
code-llama-instruct:34:ggufv2:Q6_K 36 40 0 0.9 5
gpt-4-0613 40 45 0 0.888889 5
code-llama-instruct:13:ggufv2:Q2_K 35 40 0 0.875 5
code-llama-instruct:34:ggufv2:Q3_K_M 35 40 0 0.875 5
gpt-3.5-turbo-0125 39 45 0 0.866667 5
code-llama-instruct:13:ggufv2:Q3_K_M 34 40 0 0.85 5
gpt-4o-2024-05-13 34 40 0 0.85 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 34 40 0 0.85 5
openhermes-2.5:7:ggufv2:Q2_K 38 45 0 0.844444 5
code-llama-instruct:13:ggufv2:Q6_K 33 40 0 0.825 5
code-llama-instruct:7:ggufv2:Q3_K_M 36 45 0 0.8 5
code-llama-instruct:7:ggufv2:Q2_K 32 40 0 0.8 5
llama-2-chat:70:ggufv2:Q3_K_M 35 45 0 0.777778 5
llama-2-chat:70:ggufv2:Q5_K_M 35 45 0 0.777778 5
openhermes-2.5:7:ggufv2:Q5_K_M 35 45 0 0.777778 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 35 45 0 0.777778 5
llama-2-chat:13:ggufv2:Q4_K_M 35 45 0 0.777778 5
code-llama-instruct:13:ggufv2:Q4_K_M 31 40 0 0.775 5
llama-2-chat:13:ggufv2:Q6_K 31 40 0 0.775 5
llama-3-instruct:8:ggufv2:Q4_K_M 31 40 0 0.775 5
llama-3-instruct:8:ggufv2:Q6_K 31 40 0 0.775 5
code-llama-instruct:7:ggufv2:Q6_K 31 40 0 0.775 5
code-llama-instruct:13:ggufv2:Q5_K_M 31 40 0 0.775 5
gpt-3.5-turbo-0613 34 45 0 0.755556 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 34 45 0 0.755556 5
openhermes-2.5:7:ggufv2:Q8_0 34 45 0 0.755556 5
llama-2-chat:70:ggufv2:Q4_K_M 34 45 0 0.755556 5
openhermes-2.5:7:ggufv2:Q4_K_M 34 45 0 0.755556 5
code-llama-instruct:13:ggufv2:Q8_0 30 40 0 0.75 5
code-llama-instruct:34:ggufv2:Q2_K 30 40 0 0.75 5
llama-2-chat:13:ggufv2:Q3_K_M 33 45 0 0.733333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 33 45 0 0.733333 5
openhermes-2.5:7:ggufv2:Q6_K 33 45 0 0.733333 5
gpt-4-0125-preview 33 45 0 0.733333 5
llama-3-instruct:8:ggufv2:Q8_0 29 40 0 0.725 5
openhermes-2.5:7:ggufv2:Q3_K_M 36 50 0 0.72 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 32 45 0 0.711111 5
llama-2-chat:13:ggufv2:Q8_0 32 45 0 0.711111 5
llama-2-chat:7:ggufv2:Q2_K 31 45 0 0.688889 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 31 45 0 0.688889 5
code-llama-instruct:7:ggufv2:Q5_K_M 31 45 0 0.688889 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 31 45 0 0.688889 5
code-llama-instruct:7:ggufv2:Q8_0 30 45 0 0.666667 5
llama-2-chat:70:ggufv2:Q2_K 30 45 0 0.666667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 30 45 0 0.666667 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 30 45 0 0.666667 5
llama-3-instruct:8:ggufv2:Q5_K_M 26 40 0 0.65 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 26 40 0 0.65 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 29 45 0 0.644444 5
llama-2-chat:13:ggufv2:Q5_K_M 29 45 0 0.644444 5
code-llama-instruct:7:ggufv2:Q4_K_M 27 45 0 0.6 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 27 45 0 0.6 5
llama-2-chat:7:ggufv2:Q4_K_M 22 45 0 0.488889 5
llama-2-chat:7:ggufv2:Q3_K_M 21 45 0 0.466667 5
llama-2-chat:7:ggufv2:Q8_0 16 45 0 0.355556 5
llama-2-chat:7:ggufv2:Q6_K 15 45 0 0.333333 5
llama-2-chat:13:ggufv2:Q2_K 13 45 0 0.288889 5
llama-2-chat:7:ggufv2:Q5_K_M 13 45 0 0.288889 5
chatglm3:6:ggmlv3:q4_0 11 40 0 0.275 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-4-0613 102 150 0 0.68 5
llama-3-instruct:8:ggufv2:Q6_K 100 150 0 0.666667 5
llama-3-instruct:8:ggufv2:Q8_0 100 150 0 0.666667 5
llama-3-instruct:8:ggufv2:Q4_K_M 100 150 0 0.666667 5
code-llama-instruct:7:ggufv2:Q4_K_M 98 150 0 0.653333 5
code-llama-instruct:34:ggufv2:Q3_K_M 90 150 0 0.6 5
llama-3-instruct:8:ggufv2:Q5_K_M 90 150 0 0.6 5
openhermes-2.5:7:ggufv2:Q5_K_M 88 150 0 0.586667 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 86 150 0 0.573333 5
code-llama-instruct:34:ggufv2:Q2_K 85 150 0 0.566667 5
code-llama-instruct:13:ggufv2:Q5_K_M 85 150 0 0.566667 5
code-llama-instruct:13:ggufv2:Q2_K 85 150 0 0.566667 5
code-llama-instruct:13:ggufv2:Q8_0 85 150 0 0.566667 5
code-llama-instruct:13:ggufv2:Q6_K 81 150 0 0.54 5
code-llama-instruct:7:ggufv2:Q2_K 80 150 0 0.533333 5
openhermes-2.5:7:ggufv2:Q6_K 80 150 0 0.533333 5
code-llama-instruct:13:ggufv2:Q4_K_M 80 150 0 0.533333 5
code-llama-instruct:13:ggufv2:Q3_K_M 80 150 0 0.533333 5
gpt-4o-2024-05-13 80 150 0 0.533333 5
gpt-3.5-turbo-0613 75 150 0 0.5 5
gpt-3.5-turbo-0125 73 150 0 0.486667 5
chatglm3:6:ggmlv3:q4_0 72 150 0 0.48 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 72 150 0 0.48 5
llama-2-chat:13:ggufv2:Q3_K_M 72 150 0 0.48 5
llama-2-chat:13:ggufv2:Q8_0 72 150 0 0.48 5
code-llama-instruct:34:ggufv2:Q6_K 71 150 0 0.473333 5
llama-2-chat:70:ggufv2:Q2_K 71 150 0 0.473333 5
code-llama-instruct:34:ggufv2:Q8_0 70 150 0 0.466667 5
code-llama-instruct:34:ggufv2:Q4_K_M 70 150 0 0.466667 5
code-llama-instruct:34:ggufv2:Q5_K_M 70 150 0 0.466667 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 70 150 0 0.466667 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 70 150 0 0.466667 5
openhermes-2.5:7:ggufv2:Q3_K_M 70 150 0 0.466667 5
openhermes-2.5:7:ggufv2:Q4_K_M 70 150 0 0.466667 5
openhermes-2.5:7:ggufv2:Q8_0 70 150 0 0.466667 5
gpt-4-0125-preview 66 150 0 0.44 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 65 150 0 0.433333 5
llama-2-chat:13:ggufv2:Q5_K_M 65 150 0 0.433333 5
openhermes-2.5:7:ggufv2:Q2_K 65 150 0 0.433333 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 65 150 0 0.433333 5
code-llama-instruct:7:ggufv2:Q3_K_M 64 150 0 0.426667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 64 150 0 0.426667 5
llama-2-chat:70:ggufv2:Q4_K_M 63 150 0 0.42 5
llama-2-chat:70:ggufv2:Q3_K_M 62 150 0 0.413333 5
code-llama-instruct:7:ggufv2:Q8_0 60 150 0 0.4 5
code-llama-instruct:7:ggufv2:Q5_K_M 60 150 0 0.4 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 58 150 0 0.386667 5
llama-2-chat:13:ggufv2:Q6_K 58 150 0 0.386667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 57 150 0 0.38 5
llama-2-chat:13:ggufv2:Q2_K 55 150 0 0.366667 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 55 150 0 0.366667 5
llama-2-chat:13:ggufv2:Q4_K_M 55 150 0 0.366667 5
llama-2-chat:70:ggufv2:Q5_K_M 54 150 0 0.36 5
code-llama-instruct:7:ggufv2:Q6_K 50 150 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 50 150 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 50 150 0 0.333333 5
llama-2-chat:7:ggufv2:Q5_K_M 44 150 0 0.293333 5
llama-2-chat:7:ggufv2:Q6_K 40 150 0 0.266667 5
llama-2-chat:7:ggufv2:Q8_0 40 150 0 0.266667 5
llama-2-chat:7:ggufv2:Q4_K_M 36 150 0 0.24 5
llama-2-chat:7:ggufv2:Q3_K_M 35 150 0 0.233333 5
llama-2-chat:7:ggufv2:Q2_K 15 150 0 0.1 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-3.5-turbo-0125 145 150 0 0.966667 5
code-llama-instruct:7:ggufv2:Q4_K_M 145 150 0 0.966667 5
gpt-4-0613 145 150 0 0.966667 5
code-llama-instruct:7:ggufv2:Q6_K 144 150 0 0.96 5
code-llama-instruct:7:ggufv2:Q5_K_M 144 150 0 0.96 5
code-llama-instruct:7:ggufv2:Q8_0 144 150 0 0.96 5
gpt-3.5-turbo-0613 142 150 0 0.946667 5
openhermes-2.5:7:ggufv2:Q3_K_M 141 150 0 0.94 5
openhermes-2.5:7:ggufv2:Q2_K 141 150 0 0.94 5
llama-3-instruct:8:ggufv2:Q6_K 139 150 0 0.926667 5
llama-3-instruct:8:ggufv2:Q5_K_M 139 150 0 0.926667 5
code-llama-instruct:7:ggufv2:Q2_K 138 150 0 0.92 5
llama-3-instruct:8:ggufv2:Q8_0 138 150 0 0.92 5
llama-3-instruct:8:ggufv2:Q4_K_M 138 150 0 0.92 5
llama-2-chat:70:ggufv2:Q4_K_M 138 150 0 0.92 5
openhermes-2.5:7:ggufv2:Q5_K_M 137 150 0 0.913333 5
code-llama-instruct:34:ggufv2:Q4_K_M 136 150 0 0.906667 5
llama-2-chat:70:ggufv2:Q5_K_M 136 150 0 0.906667 5
llama-2-chat:70:ggufv2:Q3_K_M 136 150 0 0.906667 5
llama-2-chat:70:ggufv2:Q2_K 135 150 0 0.9 5
code-llama-instruct:34:ggufv2:Q5_K_M 135 150 0 0.9 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 134 150 0 0.893333 5
openhermes-2.5:7:ggufv2:Q8_0 132 150 0 0.88 5
code-llama-instruct:7:ggufv2:Q3_K_M 131 150 0 0.873333 5
openhermes-2.5:7:ggufv2:Q4_K_M 131 150 0 0.873333 5
openhermes-2.5:7:ggufv2:Q6_K 129 150 0 0.86 5
code-llama-instruct:34:ggufv2:Q8_0 129 150 0 0.86 5
code-llama-instruct:34:ggufv2:Q6_K 128 150 0 0.853333 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 127 150 0 0.846667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 127 150 0 0.846667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 126 150 0 0.84 5
code-llama-instruct:13:ggufv2:Q4_K_M 125 150 0 0.833333 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 125 150 0 0.833333 5
gpt-4-0125-preview 125 150 0 0.833333 5
code-llama-instruct:13:ggufv2:Q3_K_M 125 150 0 0.833333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 124 150 0 0.826667 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 124 150 0 0.826667 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 124 150 0 0.826667 5
code-llama-instruct:13:ggufv2:Q2_K 123 150 0 0.82 5
llama-2-chat:13:ggufv2:Q6_K 122 150 0 0.813333 5
gpt-4o-2024-05-13 120 150 0 0.8 5
code-llama-instruct:13:ggufv2:Q6_K 119 150 0 0.793333 5
llama-2-chat:13:ggufv2:Q8_0 118 150 0 0.786667 5
code-llama-instruct:34:ggufv2:Q3_K_M 118 150 0 0.786667 5
code-llama-instruct:13:ggufv2:Q5_K_M 117 150 0 0.78 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 116 150 0 0.773333 5
code-llama-instruct:13:ggufv2:Q8_0 115 150 0 0.766667 5
llama-2-chat:13:ggufv2:Q4_K_M 114 150 0 0.76 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 114 150 0 0.76 5
llama-2-chat:13:ggufv2:Q5_K_M 112 150 0 0.746667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 109 150 0 0.726667 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 104 150 0 0.693333 5
llama-2-chat:7:ggufv2:Q3_K_M 104 150 0 0.693333 5
code-llama-instruct:34:ggufv2:Q2_K 103 150 0 0.686667 5
llama-2-chat:7:ggufv2:Q2_K 103 150 0 0.686667 5
llama-2-chat:13:ggufv2:Q3_K_M 102 150 0 0.68 5
llama-2-chat:7:ggufv2:Q6_K 99 150 0 0.66 5
llama-2-chat:7:ggufv2:Q4_K_M 97 150 0 0.646667 5
llama-2-chat:7:ggufv2:Q8_0 96 150 0 0.64 5
llama-2-chat:7:ggufv2:Q5_K_M 95 150 0 0.633333 5
chatglm3:6:ggmlv3:q4_0 83 150 0 0.553333 5
llama-2-chat:13:ggufv2:Q2_K 65 150 0 0.433333 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-3.5-turbo-0125 139 150 0 0.926667 5
gpt-4-0613 132 150 0 0.88 5
gpt-3.5-turbo-0613 125 150 0 0.833333 5
code-llama-instruct:13:ggufv2:Q3_K_M 0 150 0 0 5
chatglm3:6:ggmlv3:q4_0 0 150 0 0 5
code-llama-instruct:13:ggufv2:Q2_K 0 150 0 0 5
code-llama-instruct:13:ggufv2:Q8_0 0 150 0 0 5
code-llama-instruct:34:ggufv2:Q2_K 0 150 0 0 5
code-llama-instruct:34:ggufv2:Q3_K_M 0 150 0 0 5
code-llama-instruct:34:ggufv2:Q4_K_M 0 150 0 0 5
code-llama-instruct:34:ggufv2:Q5_K_M 0 150 0 0 5
code-llama-instruct:13:ggufv2:Q4_K_M 0 150 0 0 5
code-llama-instruct:13:ggufv2:Q5_K_M 0 150 0 0 5
code-llama-instruct:13:ggufv2:Q6_K 0 150 0 0 5
code-llama-instruct:7:ggufv2:Q2_K 0 150 0 0 5
code-llama-instruct:34:ggufv2:Q8_0 0 150 0 0 5
code-llama-instruct:34:ggufv2:Q6_K 0 150 0 0 5
code-llama-instruct:7:ggufv2:Q3_K_M 0 150 0 0 5
code-llama-instruct:7:ggufv2:Q6_K 0 150 0 0 5
code-llama-instruct:7:ggufv2:Q5_K_M 0 150 0 0 5
code-llama-instruct:7:ggufv2:Q8_0 0 150 0 0 5
code-llama-instruct:7:ggufv2:Q4_K_M 0 150 0 0 5
gpt-4-0125-preview 0 150 0 0 5
gpt-4o-2024-05-13 0 150 0 0 5
llama-2-chat:13:ggufv2:Q2_K 0 150 0 0 5
llama-2-chat:13:ggufv2:Q3_K_M 0 150 0 0 5
llama-2-chat:13:ggufv2:Q4_K_M 0 150 0 0 5
llama-2-chat:13:ggufv2:Q5_K_M 0 150 0 0 5
llama-2-chat:13:ggufv2:Q6_K 0 150 0 0 5
llama-2-chat:13:ggufv2:Q8_0 0 150 0 0 5
llama-2-chat:70:ggufv2:Q2_K 0 150 0 0 5
llama-2-chat:70:ggufv2:Q3_K_M 0 150 0 0 5
llama-2-chat:70:ggufv2:Q4_K_M 0 150 0 0 5
llama-2-chat:70:ggufv2:Q5_K_M 0 150 0 0 5
llama-2-chat:7:ggufv2:Q2_K 0 150 0 0 5
llama-2-chat:7:ggufv2:Q3_K_M 0 150 0 0 5
llama-2-chat:7:ggufv2:Q4_K_M 0 150 0 0 5
llama-2-chat:7:ggufv2:Q5_K_M 0 150 0 0 5
llama-2-chat:7:ggufv2:Q6_K 0 150 0 0 5
llama-2-chat:7:ggufv2:Q8_0 0 150 0 0 5
llama-3-instruct:8:ggufv2:Q4_K_M 0 150 0 0 5
llama-3-instruct:8:ggufv2:Q5_K_M 0 150 0 0 5
llama-3-instruct:8:ggufv2:Q6_K 0 150 0 0 5
llama-3-instruct:8:ggufv2:Q8_0 0 150 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 0 150 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 0 150 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 0 150 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 0 150 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 0 150 0 0 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 0 150 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 0 150 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 0 150 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 0 150 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 0 150 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 0 150 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 0 150 0 0 5
openhermes-2.5:7:ggufv2:Q2_K 0 150 0 0 5
openhermes-2.5:7:ggufv2:Q3_K_M 0 150 0 0 5
openhermes-2.5:7:ggufv2:Q4_K_M 0 150 0 0 5
openhermes-2.5:7:ggufv2:Q5_K_M 0 150 0 0 5
openhermes-2.5:7:ggufv2:Q6_K 0 150 0 0 5
openhermes-2.5:7:ggufv2:Q8_0 0 150 0 0 5

Retrieval-Augmented Generation (RAG)

In this set of tasks, we test the ability of LLMs to answer a given question using a RAG agent, or to judge whether a RAG fragment is relevant to a given question. Instructions can be explicit ("is this fragment relevant to the question?") or implicit (asking only the question, without instructions, and checking whether the model responds with 'not enough information given').
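
The implicit case can be pictured as a simple check on the model's response. The sketch below is only an illustration: this page does not say how the benchmark matches the phrase or assigns points, so the case-insensitive substring match, the one-point scoring rule, and the function names `signals_insufficient_information` and `score_implicit_case` are all assumptions.

```python
# Phrase quoted in the task description above
REFUSAL_PHRASE = "not enough information given"


def signals_insufficient_information(response: str) -> bool:
    """True if the response contains the refusal phrase.

    Assumption: matching ignores case and collapses runs of whitespace.
    """
    normalized = " ".join(response.lower().split())
    return REFUSAL_PHRASE in normalized


def score_implicit_case(response: str, fragment_is_relevant: bool) -> int:
    """Hypothetical scoring rule: 1 point if the model refuses exactly when
    the supplied fragment is irrelevant, otherwise 0."""
    refused = signals_insufficient_information(response)
    return int(refused != fragment_is_relevant)


# Irrelevant fragment, model refuses: expected behaviour
print(score_implicit_case("Not enough information given.", fragment_is_relevant=False))
# Irrelevant fragment, model answers anyway: counted as a miss
print(score_implicit_case("The gene is TP53.", fragment_is_relevant=False))
```

A plain substring match is the simplest possible choice here; a real evaluation might need to accept paraphrased refusals as well.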

Full model name Score achieved Score possible Score SD Accuracy Iterations
llama-2-chat:13:ggufv2:Q2_K 30 30 0 1 5
llama-2-chat:13:ggufv2:Q8_0 30 30 0 1 5
llama-2-chat:13:ggufv2:Q3_K_M 30 30 0 1 5
llama-2-chat:13:ggufv2:Q6_K 30 30 0 1 5
llama-2-chat:13:ggufv2:Q5_K_M 30 30 0 1 5
llama-2-chat:13:ggufv2:Q4_K_M 30 30 0 1 5
gpt-4-0613 30 30 0 1 5
gpt-4o-2024-05-13 30 30 0 1 5
gpt-3.5-turbo-0613 30 30 0 1 5
gpt-4-0125-preview 30 30 0 1 5
code-llama-instruct:7:ggufv2:Q8_0 30 30 0 1 5
gpt-3.5-turbo-0125 30 30 0 1 5
code-llama-instruct:7:ggufv2:Q4_K_M 30 30 0 1 5
openhermes-2.5:7:ggufv2:Q6_K 30 30 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 30 30 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 30 30 0 1 5
llama-3-instruct:8:ggufv2:Q8_0 30 30 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 30 30 0 1 5
llama-3-instruct:8:ggufv2:Q5_K_M 30 30 0 1 5
llama-3-instruct:8:ggufv2:Q4_K_M 30 30 0 1 5
llama-2-chat:7:ggufv2:Q8_0 30 30 0 1 5
llama-3-instruct:8:ggufv2:Q6_K 30 30 0 1 5
llama-2-chat:7:ggufv2:Q6_K 30 30 0 1 5
llama-2-chat:7:ggufv2:Q5_K_M 30 30 0 1 5
llama-2-chat:7:ggufv2:Q4_K_M 30 30 0 1 5
llama-2-chat:7:ggufv2:Q3_K_M 30 30 0 1 5
llama-2-chat:70:ggufv2:Q5_K_M 30 30 0 1 5
llama-2-chat:70:ggufv2:Q4_K_M 30 30 0 1 5
llama-2-chat:70:ggufv2:Q3_K_M 30 30 0 1 5
llama-2-chat:70:ggufv2:Q2_K 30 30 0 1 5
openhermes-2.5:7:ggufv2:Q8_0 30 30 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 30 30 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 30 30 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 30 30 0 1 5
openhermes-2.5:7:ggufv2:Q4_K_M 30 30 0 1 5
openhermes-2.5:7:ggufv2:Q5_K_M 30 30 0 1 5
openhermes-2.5:7:ggufv2:Q2_K 30 30 0 1 5
openhermes-2.5:7:ggufv2:Q3_K_M 30 30 0 1 5
code-llama-instruct:13:ggufv2:Q6_K 25 30 0 0.833333 5
llama-2-chat:7:ggufv2:Q2_K 25 30 0 0.833333 5
code-llama-instruct:7:ggufv2:Q5_K_M 25 30 0 0.833333 5
code-llama-instruct:13:ggufv2:Q8_0 25 30 0 0.833333 5
code-llama-instruct:7:ggufv2:Q3_K_M 25 30 0 0.833333 5
code-llama-instruct:7:ggufv2:Q6_K 25 30 0 0.833333 5
chatglm3:6:ggmlv3:q4_0 22 30 0 0.733333 5
code-llama-instruct:13:ggufv2:Q5_K_M 20 30 0 0.666667 5
code-llama-instruct:34:ggufv2:Q2_K 15 30 0 0.5 5
code-llama-instruct:34:ggufv2:Q3_K_M 15 30 0 0.5 5
code-llama-instruct:34:ggufv2:Q4_K_M 15 30 0 0.5 5
code-llama-instruct:34:ggufv2:Q8_0 10 30 0 0.333333 5
code-llama-instruct:13:ggufv2:Q4_K_M 10 30 0 0.333333 5
code-llama-instruct:7:ggufv2:Q2_K 10 30 0 0.333333 5
code-llama-instruct:34:ggufv2:Q6_K 10 30 0 0.333333 5
code-llama-instruct:34:ggufv2:Q5_K_M 10 30 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 10 30 0 0.333333 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 5 30 0 0.166667 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 4 30 0 0.133333 5
code-llama-instruct:13:ggufv2:Q2_K 1 30 0 0.0333333 5
code-llama-instruct:13:ggufv2:Q3_K_M 0 30 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 0 30 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 0 30 0 0 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 0 30 0 0 5
Full model name Score achieved Score possible Score SD Accuracy Iterations
chatglm3:6:ggmlv3:q4_0 10 10 0 1 5
code-llama-instruct:34:ggufv2:Q2_K 10 10 0 1 5
code-llama-instruct:34:ggufv2:Q5_K_M 10 10 0 1 5
gpt-4-0613 10 10 0 1 5
gpt-3.5-turbo-0613 10 10 0 1 5
code-llama-instruct:7:ggufv2:Q4_K_M 10 10 0 1 5
llama-3-instruct:8:ggufv2:Q5_K_M 10 10 0 1 5
llama-3-instruct:8:ggufv2:Q4_K_M 10 10 0 1 5
llama-3-instruct:8:ggufv2:Q8_0 10 10 0 1 5
llama-3-instruct:8:ggufv2:Q6_K 10 10 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 10 10 0 1 5
llama-2-chat:7:ggufv2:Q3_K_M 10 10 0 1 5
llama-2-chat:7:ggufv2:Q2_K 10 10 0 1 5
openhermes-2.5:7:ggufv2:Q4_K_M 10 10 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 10 10 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 10 10 0 1 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 10 10 0 1 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 10 10 0 1 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 10 10 0 1 5
openhermes-2.5:7:ggufv2:Q5_K_M 10 10 0 1 5
openhermes-2.5:7:ggufv2:Q6_K 10 10 0 1 5
openhermes-2.5:7:ggufv2:Q8_0 10 10 0 1 5
llama-2-chat:70:ggufv2:Q4_K_M 10 10 0 1 5
gpt-3.5-turbo-0125 9 10 0 0.9 5
code-llama-instruct:34:ggufv2:Q8_0 9 10 0 0.9 5
code-llama-instruct:34:ggufv2:Q6_K 9 10 0 0.9 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 9 10 0 0.9 5
llama-2-chat:70:ggufv2:Q5_K_M 9 10 0 0.9 5
code-llama-instruct:7:ggufv2:Q6_K 9 10 0 0.9 5
code-llama-instruct:7:ggufv2:Q3_K_M 7 10 0 0.7 5
gpt-4o-2024-05-13 7 10 0 0.7 5
code-llama-instruct:7:ggufv2:Q2_K 7 10 0 0.7 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 7 10 0 0.7 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 6 10 0 0.6 5
llama-2-chat:7:ggufv2:Q5_K_M 6 10 0 0.6 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 6 10 0 0.6 5
code-llama-instruct:13:ggufv2:Q4_K_M 5 10 0 0.5 5
llama-2-chat:13:ggufv2:Q4_K_M 5 10 0 0.5 5
code-llama-instruct:7:ggufv2:Q5_K_M 5 10 0 0.5 5
code-llama-instruct:13:ggufv2:Q8_0 5 10 0 0.5 5
code-llama-instruct:13:ggufv2:Q5_K_M 5 10 0 0.5 5
code-llama-instruct:13:ggufv2:Q6_K 5 10 0 0.5 5
code-llama-instruct:34:ggufv2:Q3_K_M 5 10 0 0.5 5
llama-2-chat:13:ggufv2:Q2_K 5 10 0 0.5 5
code-llama-instruct:7:ggufv2:Q8_0 5 10 0 0.5 5
gpt-4-0125-preview 5 10 0 0.5 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 5 10 0 0.5 5
llama-2-chat:7:ggufv2:Q6_K 5 10 0 0.5 5
llama-2-chat:7:ggufv2:Q8_0 5 10 0 0.5 5
llama-2-chat:13:ggufv2:Q8_0 5 10 0 0.5 5
llama-2-chat:13:ggufv2:Q3_K_M 5 10 0 0.5 5
llama-2-chat:13:ggufv2:Q6_K 5 10 0 0.5 5
llama-2-chat:13:ggufv2:Q5_K_M 5 10 0 0.5 5
llama-2-chat:70:ggufv2:Q3_K_M 5 10 0 0.5 5
llama-2-chat:70:ggufv2:Q2_K 5 10 0 0.5 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 5 10 0 0.5 5
openhermes-2.5:7:ggufv2:Q3_K_M 5 10 0 0.5 5
llama-2-chat:7:ggufv2:Q4_K_M 5 10 0 0.5 5
openhermes-2.5:7:ggufv2:Q2_K 5 10 0 0.5 5
code-llama-instruct:13:ggufv2:Q2_K 4 10 0 0.4 5
code-llama-instruct:34:ggufv2:Q4_K_M 4 10 0 0.4 5
code-llama-instruct:13:ggufv2:Q3_K_M 0 10 0 0 5

Text Extraction

In this set of tasks, we test the ability of LLMs to extract text from a given document.

Full model name Score achieved Score possible Score SD Accuracy Iterations
gpt-4-0125-preview 341.404 495 0 0.689705 5
gpt-4-0613 331.107 495 0 0.668903 5
gpt-4o-2024-05-13 323.703 495 0 0.653946 5
openhermes-2.5:7:ggufv2:Q6_K 306.488 495 0 0.619167 5
openhermes-2.5:7:ggufv2:Q8_0 297.41 495 0 0.600829 5
openhermes-2.5:7:ggufv2:Q4_K_M 295.654 495 0 0.597281 5
openhermes-2.5:7:ggufv2:Q5_K_M 287.059 495 0 0.579916 5
gpt-3.5-turbo-0613 284.814 495 0 0.575381 5
openhermes-2.5:7:ggufv2:Q3_K_M 274.471 495 0 0.554488 5
gpt-3.5-turbo-0125 252.466 495 0 0.510032 5
openhermes-2.5:7:ggufv2:Q2_K 219.807 495 0 0.444054 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M 190.948 495 0 0.385754 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M 182.642 495 0 0.368974 5
mistral-instruct-v0.2:7:ggufv2:Q6_K 181.869 495 0 0.367412 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 174.084 495 0 0.351684 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M 171.777 495 0 0.347025 5
mistral-instruct-v0.2:7:ggufv2:Q2_K 163.974 495 0 0.331261 5
llama-2-chat:70:ggufv2:Q4_K_M 119.263 495 0 0.240936 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M 116.651 495 0 0.235659 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M 113.663 495 0 0.229622 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K 111.634 495 0 0.225524 5
llama-2-chat:70:ggufv2:Q2_K 106.448 495 0 0.215047 5
llama-2-chat:70:ggufv2:Q5_K_M 104.032 495 0 0.210166 5
llama-2-chat:70:ggufv2:Q3_K_M 97.9593 495 0 0.197898 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M 95.9243 495 0 0.193786 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 93.6428 495 0 0.189177 5
llama-3-instruct:8:ggufv2:Q8_0 93.3345 495 0 0.188555 5
chatglm3:6:ggmlv3:q4_0 93.2008 495 0 0.188284 5
llama-3-instruct:8:ggufv2:Q5_K_M 82.3847 495 0 0.166434 5
llama-3-instruct:8:ggufv2:Q6_K 80.5152 495 0 0.162657 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K 77.9693 495 0 0.157514 5
code-llama-instruct:7:ggufv2:Q4_K_M 68.6724 495 0 0.138732 5
llama-3-instruct:8:ggufv2:Q4_K_M 57.8514 495 0 0.116871 5
llama-2-chat:13:ggufv2:Q3_K_M 55.7521 495 0 0.112631 5
llama-2-chat:13:ggufv2:Q4_K_M 43.9894 495 0 0.0888675 5
llama-2-chat:7:ggufv2:Q4_K_M 42.1985 495 0 0.0852494 5
llama-2-chat:7:ggufv2:Q8_0 25.1647 297 1.46597e-16 0.0847297 3
llama-2-chat:13:ggufv2:Q6_K 23.2057 297 0.00246731 0.0781337 3
llama-2-chat:13:ggufv2:Q5_K_M 37.9252 495 0 0.0766167 5
llama-2-chat:13:ggufv2:Q8_0 37.7416 495 0 0.0762457 5
llama-2-chat:7:ggufv2:Q5_K_M 34.5308 495 0 0.0697591 5
llama-2-chat:7:ggufv2:Q3_K_M 32.2105 495 0 0.0650717 5
llama-2-chat:13:ggufv2:Q2_K 32.1447 495 0 0.0649389 5
llama-2-chat:7:ggufv2:Q6_K 18.2539 297 2.57076e-16 0.0614608 3
llama-2-chat:7:ggufv2:Q2_K 17.9123 495 0 0.0361865 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4o-2024-05-13 assay 6.67307 45 0 0.148291 5
gpt-4-0125-preview assay 6.60264 45 0 0.146725 5
openhermes-2.5:7:ggufv2:Q6_K assay 6.45354 45 0 0.143412 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M assay 6.42156 45 0 0.142701 5
openhermes-2.5:7:ggufv2:Q8_0 assay 6.24141 45 0 0.138698 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 assay 5.8662 45 0 0.13036 5
mistral-instruct-v0.2:7:ggufv2:Q2_K assay 5.84165 45 0 0.129814 5
mistral-instruct-v0.2:7:ggufv2:Q6_K assay 5.83272 45 0 0.129616 5
openhermes-2.5:7:ggufv2:Q5_K_M assay 5.77475 45 0 0.128328 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M assay 5.72421 45 0 0.127205 5
gpt-3.5-turbo-0613 assay 5.71717 45 0 0.127048 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M assay 5.66084 45 0 0.125797 5
gpt-3.5-turbo-0125 assay 5.48324 45 0 0.12185 5
gpt-4-0613 assay 5.47238 45 0 0.121608 5
openhermes-2.5:7:ggufv2:Q4_K_M assay 5.40473 45 0 0.120105 5
openhermes-2.5:7:ggufv2:Q3_K_M assay 4.99329 45 0 0.110962 5
openhermes-2.5:7:ggufv2:Q2_K assay 4.35689 45 0 0.0968198 5
llama-2-chat:7:ggufv2:Q6_K assay 2.34166 27 7.55411e-18 0.0867281 3
llama-2-chat:13:ggufv2:Q6_K assay 2.19772 27 3.77706e-18 0.081397 3
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M assay 3.17543 45 0 0.070565 5
llama-2-chat:7:ggufv2:Q8_0 assay 1.62311 27 0 0.0601152 3
llama-2-chat:70:ggufv2:Q4_K_M assay 1.8509 45 0 0.041131 5
llama-2-chat:70:ggufv2:Q5_K_M assay 1.81844 45 0 0.0404097 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K assay 1.68419 45 0 0.0374265 5
chatglm3:6:ggmlv3:q4_0 assay 1.61672 45 0 0.0359271 5
code-llama-instruct:7:ggufv2:Q4_K_M assay 1.53778 45 0 0.0341728 5
llama-3-instruct:8:ggufv2:Q6_K assay 1.48103 45 0 0.0329118 5
llama-3-instruct:8:ggufv2:Q8_0 assay 1.37088 45 0 0.0304641 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 assay 1.16327 45 0 0.0258505 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M assay 1.15926 45 0 0.0257612 5
llama-2-chat:70:ggufv2:Q2_K assay 1.15095 45 0 0.0255768 5
llama-2-chat:70:ggufv2:Q3_K_M assay 1.07788 45 0 0.023953 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M assay 1.05347 45 0 0.0234104 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K assay 1.02909 45 0 0.0228686 5
llama-2-chat:13:ggufv2:Q2_K assay 0.974441 45 0 0.0216542 5
llama-3-instruct:8:ggufv2:Q5_K_M assay 0.922706 45 0 0.0205046 5
llama-2-chat:7:ggufv2:Q5_K_M assay 0.919259 45 0 0.020428 5
llama-2-chat:13:ggufv2:Q5_K_M assay 0.836349 45 0 0.0185855 5
llama-2-chat:13:ggufv2:Q8_0 assay 0.756302 45 0 0.0168067 5
llama-2-chat:13:ggufv2:Q3_K_M assay 0.750557 45 0 0.016679 5
llama-2-chat:13:ggufv2:Q4_K_M assay 0.647223 45 0 0.0143827 5
llama-2-chat:7:ggufv2:Q4_K_M assay 0.604799 45 0 0.01344 5
llama-3-instruct:8:ggufv2:Q4_K_M assay 0.522273 45 0 0.0116061 5
llama-2-chat:7:ggufv2:Q3_K_M assay 0.455699 45 0 0.0101266 5
llama-2-chat:7:ggufv2:Q2_K assay 0.233824 45 0 0.00519608 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4-0613 chemical 6.38889 45 0 0.141975 5
gpt-4-0125-preview chemical 6.22222 45 0 0.138272 5
openhermes-2.5:7:ggufv2:Q6_K chemical 6.16667 45 0 0.137037 5
gpt-4o-2024-05-13 chemical 5.55556 45 0 0.123457 5
gpt-3.5-turbo-0613 chemical 5.44444 45 0 0.120988 5
openhermes-2.5:7:ggufv2:Q3_K_M chemical 5.23309 45 0 0.116291 5
openhermes-2.5:7:ggufv2:Q8_0 chemical 5.16667 45 0 0.114815 5
openhermes-2.5:7:ggufv2:Q5_K_M chemical 5.06667 45 0 0.112593 5
gpt-3.5-turbo-0125 chemical 5.06444 45 0 0.112543 5
openhermes-2.5:7:ggufv2:Q4_K_M chemical 4.95556 45 0 0.110123 5
openhermes-2.5:7:ggufv2:Q2_K chemical 4.66667 45 0 0.103704 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M chemical 4.02332 45 0 0.0894072 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M chemical 3.69824 45 0 0.0821832 5
mistral-instruct-v0.2:7:ggufv2:Q6_K chemical 3.5588 45 0 0.0790845 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M chemical 3.23175 45 0 0.0718166 5
mistral-instruct-v0.2:7:ggufv2:Q2_K chemical 2.9648 45 0 0.0658845 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M chemical 2.85926 45 0 0.0635392 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 chemical 2.80214 45 0 0.0622698 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K chemical 2.28839 45 0 0.050853 5
llama-2-chat:13:ggufv2:Q6_K chemical 1.33748 27 0 0.0495362 3
llama-3-instruct:8:ggufv2:Q6_K chemical 1.99259 45 0 0.0442798 5
llama-3-instruct:8:ggufv2:Q5_K_M chemical 1.98451 45 0 0.0441003 5
llama-3-instruct:8:ggufv2:Q8_0 chemical 1.98451 45 0 0.0441003 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M chemical 1.92687 45 0 0.0428194 5
llama-2-chat:70:ggufv2:Q2_K chemical 1.92403 45 0 0.0427562 5
llama-2-chat:70:ggufv2:Q4_K_M chemical 1.86594 45 0 0.0414653 5
llama-2-chat:7:ggufv2:Q8_0 chemical 1.11429 27 3.77706e-18 0.0412698 3
llama-2-chat:70:ggufv2:Q5_K_M chemical 1.7972 45 0 0.0399378 5
llama-2-chat:70:ggufv2:Q3_K_M chemical 1.65417 45 0 0.0367593 5
llama-2-chat:13:ggufv2:Q4_K_M chemical 1.60885 45 0 0.0357522 5
llama-2-chat:7:ggufv2:Q6_K chemical 0.85 27 1.88853e-18 0.0314815 3
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K chemical 1.37178 45 0 0.030484 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 chemical 1.02473 45 0 0.0227718 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M chemical 0.993896 45 0 0.0220866 5
llama-3-instruct:8:ggufv2:Q4_K_M chemical 0.920791 45 0 0.020462 5
chatglm3:6:ggmlv3:q4_0 chemical 0.839293 45 0 0.018651 5
llama-2-chat:7:ggufv2:Q5_K_M chemical 0.580952 45 0 0.0129101 5
llama-2-chat:13:ggufv2:Q5_K_M chemical 0.473978 45 0 0.0105328 5
llama-2-chat:13:ggufv2:Q8_0 chemical 0.473978 45 0 0.0105328 5
llama-2-chat:13:ggufv2:Q3_K_M chemical 0.447004 45 0 0.00993343 5
code-llama-instruct:7:ggufv2:Q4_K_M chemical 0.44189 45 0 0.00981978 5
llama-2-chat:13:ggufv2:Q2_K chemical 0.429118 45 0 0.00953595 5
llama-2-chat:7:ggufv2:Q4_K_M chemical 0.416702 45 0 0.00926004 5
llama-2-chat:7:ggufv2:Q3_K_M chemical 0.270151 45 0 0.00600336 5
llama-2-chat:7:ggufv2:Q2_K chemical 0.264943 45 0 0.00588762 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
llama-2-chat:7:ggufv2:Q8_0 context 5.70797 27 0 0.211406 3
llama-2-chat:13:ggufv2:Q6_K context 4.88293 27 1.69967e-17 0.180849 3
gpt-4-0613 context 7.90663 45 0 0.175703 5
gpt-4-0125-preview context 7.85253 45 0 0.174501 5
gpt-4o-2024-05-13 context 7.82965 45 0 0.173992 5
gpt-3.5-turbo-0125 context 6.89247 45 0 0.153166 5
openhermes-2.5:7:ggufv2:Q4_K_M context 6.89055 45 0 0.153123 5
openhermes-2.5:7:ggufv2:Q6_K context 6.79989 45 0 0.151109 5
openhermes-2.5:7:ggufv2:Q3_K_M context 6.77271 45 0 0.150505 5
openhermes-2.5:7:ggufv2:Q8_0 context 6.67749 45 0 0.148389 5
gpt-3.5-turbo-0613 context 6.50472 45 0 0.144549 5
openhermes-2.5:7:ggufv2:Q5_K_M context 6.44769 45 0 0.143282 5
llama-2-chat:7:ggufv2:Q6_K context 3.73057 27 0 0.138169 3
mistral-instruct-v0.2:7:ggufv2:Q8_0 context 5.16754 45 0 0.114834 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M context 5.12599 45 0 0.113911 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M context 5.02844 45 0 0.111743 5
mistral-instruct-v0.2:7:ggufv2:Q6_K context 5.0158 45 0 0.111462 5
mistral-instruct-v0.2:7:ggufv2:Q2_K context 4.99362 45 0 0.110969 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K context 4.51314 45 0 0.100292 5
llama-2-chat:70:ggufv2:Q3_K_M context 4.22332 45 0 0.0938516 5
llama-2-chat:70:ggufv2:Q4_K_M context 4.10284 45 0 0.0911743 5
llama-2-chat:70:ggufv2:Q2_K context 4.08979 45 0 0.0908843 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M context 4.06318 45 0 0.090293 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M context 4.01117 45 0 0.0891372 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M context 3.90982 45 0 0.0868849 5
openhermes-2.5:7:ggufv2:Q2_K context 3.86897 45 0 0.0859772 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M context 3.79416 45 0 0.0843146 5
llama-2-chat:70:ggufv2:Q5_K_M context 3.74591 45 0 0.0832424 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 context 3.70126 45 0 0.0822502 5
code-llama-instruct:7:ggufv2:Q4_K_M context 3.32657 45 0 0.0739237 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K context 3.1452 45 0 0.0698933 5
chatglm3:6:ggmlv3:q4_0 context 2.85636 45 0 0.0634747 5
llama-2-chat:7:ggufv2:Q3_K_M context 2.10857 45 0 0.046857 5
llama-2-chat:7:ggufv2:Q4_K_M context 1.89605 45 0 0.0421345 5
llama-2-chat:13:ggufv2:Q3_K_M context 1.78868 45 0 0.0397484 5
llama-2-chat:13:ggufv2:Q5_K_M context 1.78618 45 0 0.0396929 5
llama-2-chat:13:ggufv2:Q4_K_M context 1.77351 45 0 0.0394113 5
llama-3-instruct:8:ggufv2:Q8_0 context 1.67334 45 0 0.0371853 5
llama-3-instruct:8:ggufv2:Q5_K_M context 1.64821 45 0 0.0366268 5
llama-2-chat:13:ggufv2:Q8_0 context 1.58821 45 0 0.0352936 5
llama-3-instruct:8:ggufv2:Q4_K_M context 1.57169 45 0 0.0349264 5
llama-2-chat:13:ggufv2:Q2_K context 1.34289 45 0 0.0298419 5
llama-2-chat:7:ggufv2:Q5_K_M context 1.23881 45 0 0.0275291 5
llama-2-chat:7:ggufv2:Q2_K context 1.12335 45 0 0.0249632 5
llama-3-instruct:8:ggufv2:Q6_K context 1.10292 45 0 0.0245094 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
openhermes-2.5:7:ggufv2:Q6_K disease 6.46667 45 0 0.143704 5
openhermes-2.5:7:ggufv2:Q4_K_M disease 6.46667 45 0 0.143704 5
openhermes-2.5:7:ggufv2:Q5_K_M disease 6.46667 45 0 0.143704 5
openhermes-2.5:7:ggufv2:Q8_0 disease 6.46667 45 0 0.143704 5
openhermes-2.5:7:ggufv2:Q3_K_M disease 6.46667 45 0 0.143704 5
gpt-4-0125-preview disease 6.21333 45 0 0.138074 5
gpt-4o-2024-05-13 disease 6.2 45 0 0.137778 5
gpt-4-0613 disease 6.13333 45 0 0.136296 5
gpt-3.5-turbo-0613 disease 6.06667 45 0 0.134815 5
gpt-3.5-turbo-0125 disease 4.75238 45 0 0.105608 5
openhermes-2.5:7:ggufv2:Q2_K disease 4.32493 45 0 0.0961096 5
mistral-instruct-v0.2:7:ggufv2:Q2_K disease 4.20708 45 0 0.0934906 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K disease 4.14674 45 0 0.0921497 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M disease 4.02927 45 0 0.0895392 5
mistral-instruct-v0.2:7:ggufv2:Q6_K disease 4.01581 45 0 0.0892402 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 disease 3.47244 45 0 0.0771654 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M disease 3.04532 45 0 0.0676737 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M disease 2.92854 45 0 0.0650787 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M disease 2.65437 45 0 0.0589859 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 disease 2.57657 45 0 0.057257 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M disease 2.44785 45 0 0.0543966 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M disease 2.29171 45 0 0.0509269 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K disease 2.29094 45 0 0.0509099 5
llama-3-instruct:8:ggufv2:Q8_0 disease 1.73452 45 0 0.0385449 5
llama-3-instruct:8:ggufv2:Q6_K disease 1.73452 45 0 0.0385449 5
llama-3-instruct:8:ggufv2:Q5_K_M disease 1.73452 45 0 0.0385449 5
llama-2-chat:13:ggufv2:Q6_K disease 0.827524 27 0 0.030649 3
code-llama-instruct:7:ggufv2:Q4_K_M disease 1.33093 45 0 0.0295762 5
chatglm3:6:ggmlv3:q4_0 disease 1.21669 45 0 0.0270376 5
llama-3-instruct:8:ggufv2:Q4_K_M disease 0.995894 45 0 0.022131 5
llama-2-chat:7:ggufv2:Q8_0 disease 0.444887 27 2.36066e-19 0.0164773 3
llama-2-chat:7:ggufv2:Q6_K disease 0.439254 27 0 0.0162687 3
llama-2-chat:13:ggufv2:Q5_K_M disease 0.306386 45 0 0.00680858 5
llama-2-chat:13:ggufv2:Q8_0 disease 0.26663 45 0 0.00592511 5
llama-2-chat:13:ggufv2:Q4_K_M disease 0.250053 45 0 0.00555673 5
llama-2-chat:70:ggufv2:Q5_K_M disease 0.235648 45 0 0.00523663 5
llama-2-chat:7:ggufv2:Q3_K_M disease 0.185035 45 0 0.0041119 5
llama-2-chat:70:ggufv2:Q2_K disease 0.182046 45 0 0.00404548 5
llama-2-chat:70:ggufv2:Q4_K_M disease 0.179398 45 0 0.00398663 5
llama-2-chat:7:ggufv2:Q5_K_M disease 0.150208 45 0 0.00333795 5
llama-2-chat:70:ggufv2:Q3_K_M disease 0.142957 45 0 0.00317683 5
llama-2-chat:13:ggufv2:Q3_K_M disease 0.103277 45 0 0.00229505 5
llama-2-chat:7:ggufv2:Q4_K_M disease 0.0898052 45 0 0.00199567 5
llama-2-chat:13:ggufv2:Q2_K disease 0.0874203 45 0 0.00194267 5
llama-2-chat:7:ggufv2:Q2_K disease 0.0587138 45 0 0.00130475 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4o-2024-05-13 entity 5.9909 45 0 0.133131 5
gpt-4-0125-preview entity 4.59502 45 0 0.102112 5
gpt-3.5-turbo-0613 entity 4.57972 45 0 0.101772 5
openhermes-2.5:7:ggufv2:Q4_K_M entity 4.22461 45 0 0.0938803 5
openhermes-2.5:7:ggufv2:Q8_0 entity 4.1344 45 0 0.0918755 5
gpt-4-0613 entity 4.12852 45 0 0.0917448 5
openhermes-2.5:7:ggufv2:Q6_K entity 4.09333 45 0 0.0909629 5
openhermes-2.5:7:ggufv2:Q5_K_M entity 4.02016 45 0 0.0893369 5
gpt-3.5-turbo-0125 entity 3.71195 45 0 0.0824877 5
openhermes-2.5:7:ggufv2:Q3_K_M entity 3.65819 45 0 0.0812932 5
llama-2-chat:13:ggufv2:Q6_K entity 2.14189 27 0 0.0793293 3
llama-2-chat:7:ggufv2:Q6_K entity 2.07106 27 9.44264e-19 0.0767059 3
llama-2-chat:7:ggufv2:Q8_0 entity 1.79733 27 0 0.0665678 3
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M entity 2.42313 45 0 0.0538473 5
openhermes-2.5:7:ggufv2:Q2_K entity 2.33413 45 0 0.0518696 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M entity 2.30597 45 0 0.0512437 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M entity 2.20283 45 0 0.0489518 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K entity 2.10077 45 0 0.0466838 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M entity 2.0607 45 0 0.0457934 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 entity 2.00802 45 0 0.0446226 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M entity 1.99809 45 0 0.044402 5
mistral-instruct-v0.2:7:ggufv2:Q6_K entity 1.99214 45 0 0.0442699 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 entity 1.79999 45 0 0.0399998 5
mistral-instruct-v0.2:7:ggufv2:Q2_K entity 1.77563 45 0 0.0394584 5
chatglm3:6:ggmlv3:q4_0 entity 1.22227 45 0 0.0271617 5
llama-2-chat:70:ggufv2:Q3_K_M entity 1.20851 45 0 0.0268558 5
llama-2-chat:70:ggufv2:Q2_K entity 1.16189 45 0 0.0258197 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M entity 1.10007 45 0 0.0244461 5
llama-2-chat:70:ggufv2:Q4_K_M entity 1.01555 45 0 0.0225677 5
code-llama-instruct:7:ggufv2:Q4_K_M entity 0.948961 45 0 0.021088 5
llama-2-chat:70:ggufv2:Q5_K_M entity 0.903324 45 0 0.0200739 5
llama-2-chat:13:ggufv2:Q2_K entity 0.807379 45 0 0.0179418 5
llama-2-chat:13:ggufv2:Q4_K_M entity 0.785233 45 0 0.0174496 5
llama-3-instruct:8:ggufv2:Q5_K_M entity 0.75253 45 0 0.0167229 5
llama-3-instruct:8:ggufv2:Q6_K entity 0.749495 45 0 0.0166554 5
llama-2-chat:7:ggufv2:Q3_K_M entity 0.699988 45 0 0.0155553 5
llama-3-instruct:8:ggufv2:Q8_0 entity 0.695524 45 0 0.0154561 5
llama-3-instruct:8:ggufv2:Q4_K_M entity 0.694377 45 0 0.0154306 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K entity 0.685368 45 0 0.0152304 5
llama-2-chat:7:ggufv2:Q4_K_M entity 0.685027 45 0 0.0152228 5
llama-2-chat:13:ggufv2:Q8_0 entity 0.629764 45 0 0.0139947 5
llama-2-chat:7:ggufv2:Q5_K_M entity 0.623851 45 0 0.0138634 5
llama-2-chat:13:ggufv2:Q5_K_M entity 0.623813 45 0 0.0138625 5
llama-2-chat:13:ggufv2:Q3_K_M entity 0.56502 45 0 0.012556 5
llama-2-chat:7:ggufv2:Q2_K entity 0.318196 45 0 0.00707101 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
openhermes-2.5:7:ggufv2:Q2_K experiment_yes_or_no 9 45 0 0.2 5
gpt-4-0125-preview experiment_yes_or_no 9 45 0 0.2 5
llama-2-chat:70:ggufv2:Q4_K_M experiment_yes_or_no 9 45 0 0.2 5
chatglm3:6:ggmlv3:q4_0 experiment_yes_or_no 8.6 45 0 0.191111 5
openhermes-2.5:7:ggufv2:Q5_K_M experiment_yes_or_no 8.33333 45 0 0.185185 5
openhermes-2.5:7:ggufv2:Q6_K experiment_yes_or_no 8.33333 45 0 0.185185 5
openhermes-2.5:7:ggufv2:Q4_K_M experiment_yes_or_no 8.33333 45 0 0.185185 5
llama-2-chat:70:ggufv2:Q5_K_M experiment_yes_or_no 8.025 45 0 0.178333 5
gpt-4o-2024-05-13 experiment_yes_or_no 8 45 0 0.177778 5
openhermes-2.5:7:ggufv2:Q3_K_M experiment_yes_or_no 8 45 0 0.177778 5
gpt-4-0613 experiment_yes_or_no 8 45 0 0.177778 5
gpt-3.5-turbo-0613 experiment_yes_or_no 8 45 0 0.177778 5
openhermes-2.5:7:ggufv2:Q8_0 experiment_yes_or_no 8 45 0 0.177778 5
llama-2-chat:7:ggufv2:Q8_0 experiment_yes_or_no 4.67535 27 0 0.173161 3
llama-2-chat:70:ggufv2:Q2_K experiment_yes_or_no 7.05061 45 0 0.15668 5
llama-2-chat:70:ggufv2:Q3_K_M experiment_yes_or_no 6.07336 45 0 0.134964 5
gpt-3.5-turbo-0125 experiment_yes_or_no 6.03333 45 0 0.134074 5
llama-2-chat:13:ggufv2:Q6_K experiment_yes_or_no 3.25916 27 9.44264e-19 0.12071 3
mistral-instruct-v0.2:7:ggufv2:Q3_K_M experiment_yes_or_no 5.23564 45 0 0.116348 5
llama-2-chat:13:ggufv2:Q3_K_M experiment_yes_or_no 5.16593 45 0 0.114799 5
llama-3-instruct:8:ggufv2:Q8_0 experiment_yes_or_no 3.7 45 0 0.0822222 5
llama-3-instruct:8:ggufv2:Q5_K_M experiment_yes_or_no 3.68182 45 0 0.0818182 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 experiment_yes_or_no 3.32028 45 0 0.073784 5
llama-2-chat:7:ggufv2:Q6_K experiment_yes_or_no 1.97565 27 7.08198e-19 0.0731722 3
mistral-instruct-v0.2:7:ggufv2:Q5_K_M experiment_yes_or_no 3.26963 45 0 0.0726584 5
code-llama-instruct:7:ggufv2:Q4_K_M experiment_yes_or_no 3.0913 45 0 0.0686956 5
llama-3-instruct:8:ggufv2:Q6_K experiment_yes_or_no 2.36364 45 0 0.0525253 5
mistral-instruct-v0.2:7:ggufv2:Q6_K experiment_yes_or_no 2.36015 45 0 0.0524479 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M experiment_yes_or_no 2.2851 45 0 0.05078 5
mistral-instruct-v0.2:7:ggufv2:Q2_K experiment_yes_or_no 2.2802 45 0 0.0506711 5
llama-2-chat:7:ggufv2:Q4_K_M experiment_yes_or_no 2.06817 45 0 0.0459593 5
llama-3-instruct:8:ggufv2:Q4_K_M experiment_yes_or_no 1.89935 45 0 0.0422078 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M experiment_yes_or_no 1.45686 45 0 0.0323746 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M experiment_yes_or_no 1.29991 45 0 0.0288868 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M experiment_yes_or_no 1.1661 45 0 0.0259134 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K experiment_yes_or_no 1.15184 45 0 0.0255965 5
llama-2-chat:13:ggufv2:Q8_0 experiment_yes_or_no 1.06643 45 0 0.0236984 5
llama-2-chat:13:ggufv2:Q5_K_M experiment_yes_or_no 1.03147 45 0 0.0229215 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K experiment_yes_or_no 0.785587 45 0 0.0174575 5
llama-2-chat:7:ggufv2:Q3_K_M experiment_yes_or_no 0.726745 45 0 0.0161499 5
llama-2-chat:7:ggufv2:Q5_K_M experiment_yes_or_no 0.618798 45 0 0.0137511 5
llama-2-chat:13:ggufv2:Q4_K_M experiment_yes_or_no 0.468722 45 0 0.010416 5
llama-2-chat:13:ggufv2:Q2_K experiment_yes_or_no 0.267272 45 0 0.00593938 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 experiment_yes_or_no 0.201489 45 0 0.00447753 5
llama-2-chat:7:ggufv2:Q2_K experiment_yes_or_no 0.130285 45 0 0.00289522 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
llama-2-chat:7:ggufv2:Q8_0 hypothesis 2.854 27 3.77706e-18 0.105704 3
mistral-instruct-v0.2:7:ggufv2:Q4_K_M hypothesis 3.67339 45 0 0.0816309 5
llama-2-chat:7:ggufv2:Q6_K hypothesis 2.01944 27 5.19345e-18 0.074794 3
mistral-instruct-v0.2:7:ggufv2:Q6_K hypothesis 3.33681 45 0 0.0741512 5
gpt-4-0613 hypothesis 3.29696 45 0 0.0732657 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 hypothesis 2.9272 45 0 0.0650489 5
gpt-4o-2024-05-13 hypothesis 2.89512 45 0 0.064336 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M hypothesis 2.75585 45 0 0.0612411 5
gpt-3.5-turbo-0125 hypothesis 2.72775 45 0 0.0606168 5
llama-2-chat:13:ggufv2:Q6_K hypothesis 1.61253 27 1.88853e-18 0.0597233 3
gpt-3.5-turbo-0613 hypothesis 2.64497 45 0 0.0587771 5
openhermes-2.5:7:ggufv2:Q4_K_M hypothesis 2.57382 45 0 0.0571961 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M hypothesis 2.47292 45 0 0.0549539 5
openhermes-2.5:7:ggufv2:Q8_0 hypothesis 2.37196 45 0 0.0527103 5
gpt-4-0125-preview hypothesis 2.33518 45 0 0.051893 5
openhermes-2.5:7:ggufv2:Q6_K hypothesis 2.29085 45 0 0.0509077 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M hypothesis 2.23255 45 0 0.0496122 5
openhermes-2.5:7:ggufv2:Q3_K_M hypothesis 2.09626 45 0 0.0465835 5
mistral-instruct-v0.2:7:ggufv2:Q2_K hypothesis 2.05375 45 0 0.045639 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M hypothesis 1.87442 45 0 0.0416537 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 hypothesis 1.83735 45 0 0.04083 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M hypothesis 1.71557 45 0 0.0381237 5
openhermes-2.5:7:ggufv2:Q5_K_M hypothesis 1.52181 45 0 0.033818 5
openhermes-2.5:7:ggufv2:Q2_K hypothesis 1.4915 45 0 0.0331444 5
llama-2-chat:70:ggufv2:Q3_K_M hypothesis 1.44143 45 0 0.0320317 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K hypothesis 1.44009 45 0 0.032002 5
llama-2-chat:70:ggufv2:Q2_K hypothesis 1.4389 45 0 0.0319755 5
llama-2-chat:70:ggufv2:Q4_K_M hypothesis 1.41421 45 0 0.0314268 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K hypothesis 1.39565 45 0 0.0310144 5
llama-3-instruct:8:ggufv2:Q4_K_M hypothesis 1.13596 45 0 0.0252436 5
chatglm3:6:ggmlv3:q4_0 hypothesis 0.98676 45 0 0.021928 5
llama-3-instruct:8:ggufv2:Q8_0 hypothesis 0.878406 45 0 0.0195201 5
llama-3-instruct:8:ggufv2:Q6_K hypothesis 0.876219 45 0 0.0194715 5
llama-2-chat:7:ggufv2:Q5_K_M hypothesis 0.68638 45 0 0.0152529 5
llama-2-chat:70:ggufv2:Q5_K_M hypothesis 0.623758 45 0 0.0138613 5
llama-2-chat:7:ggufv2:Q4_K_M hypothesis 0.62053 45 0 0.0137896 5
llama-3-instruct:8:ggufv2:Q5_K_M hypothesis 0.604423 45 0 0.0134316 5
code-llama-instruct:7:ggufv2:Q4_K_M hypothesis 0.572369 45 0 0.0127193 5
llama-2-chat:13:ggufv2:Q8_0 hypothesis 0.55524 45 0 0.0123387 5
llama-2-chat:7:ggufv2:Q2_K hypothesis 0.520453 45 0 0.0115656 5
llama-2-chat:13:ggufv2:Q2_K hypothesis 0.49279 45 0 0.0109509 5
llama-2-chat:13:ggufv2:Q3_K_M hypothesis 0.424638 45 0 0.00943639 5
llama-2-chat:13:ggufv2:Q5_K_M hypothesis 0.408017 45 0 0.00906704 5
llama-2-chat:7:ggufv2:Q3_K_M hypothesis 0.402337 45 0 0.00894082 5
llama-2-chat:13:ggufv2:Q4_K_M hypothesis 0.366299 45 0 0.00813997 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4o-2024-05-13 intervention 5.34631 45 0 0.118807 5
openhermes-2.5:7:ggufv2:Q4_K_M intervention 4.9841 45 0 0.110758 5
gpt-4-0125-preview intervention 4.92171 45 0 0.109371 5
gpt-4-0613 intervention 4.72253 45 0 0.104945 5
openhermes-2.5:7:ggufv2:Q6_K intervention 4.71449 45 0 0.104767 5
openhermes-2.5:7:ggufv2:Q8_0 intervention 4.44465 45 0 0.09877 5
gpt-3.5-turbo-0613 intervention 4.27143 45 0 0.0949206 5
openhermes-2.5:7:ggufv2:Q5_K_M intervention 4.00021 45 0 0.0888935 5
gpt-3.5-turbo-0125 intervention 3.75141 45 0 0.0833647 5
openhermes-2.5:7:ggufv2:Q3_K_M intervention 3.55238 45 0 0.0789418 5
openhermes-2.5:7:ggufv2:Q2_K intervention 2.92766 45 0 0.0650591 5
llama-2-chat:7:ggufv2:Q8_0 intervention 1.56417 27 0 0.0579323 3
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K intervention 2.23683 45 0 0.0497073 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M intervention 2.23319 45 0 0.0496264 5
llama-2-chat:13:ggufv2:Q6_K intervention 1.13241 27 0.000274145 0.0419412 3
mistral-instruct-v0.2:7:ggufv2:Q3_K_M intervention 1.66677 45 0 0.0370393 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M intervention 1.23412 45 0 0.0274249 5
code-llama-instruct:7:ggufv2:Q4_K_M intervention 1.17173 45 0 0.0260384 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M intervention 1.15754 45 0 0.025723 5
llama-2-chat:13:ggufv2:Q4_K_M intervention 1.02157 45 0 0.0227015 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 intervention 0.987919 45 0 0.0219538 5
chatglm3:6:ggmlv3:q4_0 intervention 0.881806 45 0 0.0195957 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 intervention 0.879646 45 0 0.0195477 5
llama-2-chat:7:ggufv2:Q6_K intervention 0.514286 27 3.77706e-18 0.0190476 3
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M intervention 0.723791 45 0 0.0160842 5
mistral-instruct-v0.2:7:ggufv2:Q2_K intervention 0.680182 45 0 0.0151152 5
llama-2-chat:70:ggufv2:Q2_K intervention 0.668995 45 0 0.0148666 5
mistral-instruct-v0.2:7:ggufv2:Q6_K intervention 0.640258 45 0 0.0142279 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M intervention 0.550643 45 0 0.0122365 5
llama-2-chat:70:ggufv2:Q5_K_M intervention 0.542302 45 0 0.0120512 5
llama-2-chat:13:ggufv2:Q2_K intervention 0.502722 45 0 0.0111716 5
llama-2-chat:70:ggufv2:Q4_K_M intervention 0.417501 45 0 0.00927779 5
llama-2-chat:7:ggufv2:Q3_K_M intervention 0.416756 45 0 0.00926124 5
llama-3-instruct:8:ggufv2:Q5_K_M intervention 0.410888 45 0 0.00913085 5
llama-2-chat:70:ggufv2:Q3_K_M intervention 0.402319 45 0 0.00894042 5
llama-3-instruct:8:ggufv2:Q4_K_M intervention 0.37923 45 0 0.00842733 5
llama-2-chat:13:ggufv2:Q5_K_M intervention 0.339683 45 0 0.0075485 5
llama-3-instruct:8:ggufv2:Q6_K intervention 0.327257 45 0 0.00727237 5
llama-3-instruct:8:ggufv2:Q8_0 intervention 0.319187 45 0 0.00709304 5
llama-2-chat:13:ggufv2:Q3_K_M intervention 0.265476 45 0 0.00589947 5
llama-2-chat:7:ggufv2:Q5_K_M intervention 0.24986 45 0 0.00555244 5
llama-2-chat:13:ggufv2:Q8_0 intervention 0.244444 45 0 0.0054321 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K intervention 0.2273 45 0 0.0050511 5
llama-2-chat:7:ggufv2:Q2_K intervention 0.118691 45 0 0.00263758 5
llama-2-chat:7:ggufv2:Q4_K_M intervention 0.0769231 45 0 0.0017094 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4-0125-preview ncbi_link 6.48768 45 0 0.144171 5
gpt-4-0613 ncbi_link 6.05933 45 0 0.134652 5
openhermes-2.5:7:ggufv2:Q8_0 ncbi_link 3.5303 45 0 0.0784512 5
openhermes-2.5:7:ggufv2:Q6_K ncbi_link 3.5303 45 0 0.0784512 5
gpt-4o-2024-05-13 ncbi_link 3.51302 45 0 0.078067 5
openhermes-2.5:7:ggufv2:Q5_K_M ncbi_link 3.47436 45 0 0.077208 5
openhermes-2.5:7:ggufv2:Q4_K_M ncbi_link 3.11111 45 0 0.0691358 5
openhermes-2.5:7:ggufv2:Q3_K_M ncbi_link 2.37436 45 0 0.0527635 5
gpt-3.5-turbo-0613 ncbi_link 2.16667 45 0 0.0481481 5
gpt-3.5-turbo-0125 ncbi_link 1.42925 45 0 0.031761 5
llama-2-chat:13:ggufv2:Q6_K ncbi_link 0.690904 27 0 0.0255891 3
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M ncbi_link 1.03429 45 0 0.0229841 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M ncbi_link 0.884957 45 0 0.0196657 5
mistral-instruct-v0.2:7:ggufv2:Q2_K ncbi_link 0.881705 45 0 0.0195934 5
llama-2-chat:7:ggufv2:Q6_K ncbi_link 0.507313 27 9.44264e-19 0.0187894 3
mistral-instruct-v0.2:7:ggufv2:Q5_K_M ncbi_link 0.710989 45 0 0.0157998 5
llama-2-chat:7:ggufv2:Q8_0 ncbi_link 0.410766 27 9.44264e-19 0.0152135 3
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K ncbi_link 0.656812 45 0 0.0145958 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M ncbi_link 0.615714 45 0 0.0136825 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 ncbi_link 0.596131 45 0 0.0132474 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M ncbi_link 0.574422 45 0 0.0127649 5
mistral-instruct-v0.2:7:ggufv2:Q6_K ncbi_link 0.558824 45 0 0.0124183 5
openhermes-2.5:7:ggufv2:Q2_K ncbi_link 0.505458 45 0 0.0112324 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 ncbi_link 0.429927 45 0 0.00955394 5
code-llama-instruct:7:ggufv2:Q4_K_M ncbi_link 0.328564 45 0 0.00730142 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K ncbi_link 0.271548 45 0 0.0060344 5
llama-2-chat:13:ggufv2:Q8_0 ncbi_link 0.255217 45 0 0.00567148 5
llama-2-chat:70:ggufv2:Q2_K ncbi_link 0.253735 45 0 0.00563856 5
llama-2-chat:13:ggufv2:Q4_K_M ncbi_link 0.246231 45 0 0.00547179 5
llama-2-chat:70:ggufv2:Q4_K_M ncbi_link 0.241357 45 0 0.00536348 5
llama-2-chat:13:ggufv2:Q5_K_M ncbi_link 0.236802 45 0 0.00526226 5
llama-3-instruct:8:ggufv2:Q4_K_M ncbi_link 0.233815 45 0 0.00519589 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M ncbi_link 0.230909 45 0 0.00513131 5
llama-2-chat:7:ggufv2:Q4_K_M ncbi_link 0.216341 45 0 0.00480757 5
llama-2-chat:70:ggufv2:Q5_K_M ncbi_link 0.196981 45 0 0.00437735 5
llama-2-chat:13:ggufv2:Q2_K ncbi_link 0.192574 45 0 0.00427942 5
llama-3-instruct:8:ggufv2:Q8_0 ncbi_link 0.179211 45 0 0.00398247 5
llama-2-chat:7:ggufv2:Q3_K_M ncbi_link 0.177339 45 0 0.00394087 5
llama-3-instruct:8:ggufv2:Q6_K ncbi_link 0.173014 45 0 0.00384476 5
llama-2-chat:7:ggufv2:Q5_K_M ncbi_link 0.170952 45 0 0.00379894 5
llama-2-chat:70:ggufv2:Q3_K_M ncbi_link 0.166777 45 0 0.00370615 5
llama-3-instruct:8:ggufv2:Q5_K_M ncbi_link 0.166614 45 0 0.00370254 5
llama-2-chat:7:ggufv2:Q2_K ncbi_link 0.15271 45 0 0.00339354 5
llama-2-chat:13:ggufv2:Q3_K_M ncbi_link 0.150011 45 0 0.00333359 5
chatglm3:6:ggmlv3:q4_0 ncbi_link 0.122857 45 0 0.00273017 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4-0613 significance 5.6 45 0 0.124444 5
gpt-4-0125-preview significance 5.18384 45 0 0.115196 5
gpt-4o-2024-05-13 significance 4.22424 45 0 0.0938721 5
openhermes-2.5:7:ggufv2:Q4_K_M significance 3.92996 45 0 0.0873325 5
openhermes-2.5:7:ggufv2:Q6_K significance 3.78182 45 0 0.0840404 5
openhermes-2.5:7:ggufv2:Q8_0 significance 3.78182 45 0 0.0840404 5
openhermes-2.5:7:ggufv2:Q5_K_M significance 3.77787 45 0 0.0839526 5
openhermes-2.5:7:ggufv2:Q3_K_M significance 3.69091 45 0 0.0820202 5
gpt-3.5-turbo-0613 significance 3.58562 45 0 0.0796804 5
gpt-3.5-turbo-0125 significance 3.51717 45 0 0.0781594 5
mistral-instruct-v0.2:7:ggufv2:Q4_K_M significance 2.93833 45 0 0.0652963 5
mistral-instruct-v0.2:7:ggufv2:Q6_K significance 2.87928 45 0 0.063984 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M significance 2.79423 45 0 0.062094 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 significance 2.62296 45 0 0.0582881 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M significance 2.56724 45 0 0.0570498 5
openhermes-2.5:7:ggufv2:Q2_K significance 2.48514 45 0 0.0552254 5
mistral-instruct-v0.2:7:ggufv2:Q2_K significance 2.4813 45 0 0.05514 5
llama-2-chat:7:ggufv2:Q8_0 significance 1.10159 27 0 0.0407996 3
llama-2-chat:13:ggufv2:Q6_K significance 1.07015 27 1.0623e-18 0.0396352 3
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 significance 1.50696 45 0 0.033488 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K significance 1.34869 45 0 0.0299709 5
llama-2-chat:7:ggufv2:Q6_K significance 0.806474 27 0 0.0298694 3
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M significance 1.31454 45 0 0.0292119 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M significance 1.2312 45 0 0.0273599 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M significance 1.01129 45 0 0.0224731 5
llama-3-instruct:8:ggufv2:Q6_K significance 0.994971 45 0 0.0221105 5
llama-3-instruct:8:ggufv2:Q8_0 significance 0.957259 45 0 0.0212724 5
llama-2-chat:70:ggufv2:Q3_K_M significance 0.758379 45 0 0.0168529 5
llama-2-chat:70:ggufv2:Q2_K significance 0.716547 45 0 0.0159233 5
llama-2-chat:70:ggufv2:Q4_K_M significance 0.68386 45 0 0.0151969 5
llama-3-instruct:8:ggufv2:Q5_K_M significance 0.636128 45 0 0.0141362 5
llama-2-chat:70:ggufv2:Q5_K_M significance 0.518572 45 0 0.0115238 5
llama-2-chat:7:ggufv2:Q4_K_M significance 0.329457 45 0 0.00732127 5
llama-2-chat:13:ggufv2:Q8_0 significance 0.326026 45 0 0.00724502 5
llama-2-chat:7:ggufv2:Q5_K_M significance 0.281188 45 0 0.00624862 5
llama-3-instruct:8:ggufv2:Q4_K_M significance 0.228461 45 0 0.00507691 5
llama-2-chat:13:ggufv2:Q4_K_M significance 0.213246 45 0 0.0047388 5
llama-2-chat:13:ggufv2:Q2_K significance 0.207957 45 0 0.00462127 5
llama-2-chat:13:ggufv2:Q5_K_M significance 0.205271 45 0 0.00456158 5
llama-2-chat:7:ggufv2:Q3_K_M significance 0.194946 45 0 0.00433214 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K significance 0.178078 45 0 0.00395728 5
llama-2-chat:13:ggufv2:Q3_K_M significance 0.131484 45 0 0.00292186 5
code-llama-instruct:7:ggufv2:Q4_K_M significance 0.123914 45 0 0.00275365 5
chatglm3:6:ggmlv3:q4_0 significance 0.118153 45 0 0.00262562 5
llama-2-chat:7:ggufv2:Q2_K significance 0.103278 45 0 0.00229507 5
Full model name Subtask Score achieved Score possible Score SD Accuracy Iterations
gpt-4-0125-preview stats 8.86667 45 0 0.197037 5
openhermes-2.5:7:ggufv2:Q8_0 stats 8.66667 45 0 0.192593 5
openhermes-2.5:7:ggufv2:Q6_K stats 8.66667 45 0 0.192593 5
openhermes-2.5:7:ggufv2:Q5_K_M stats 8.52821 45 0 0.189516 5
gpt-4-0613 stats 8.51282 45 0 0.189174 5
gpt-4o-2024-05-13 stats 8.51282 45 0 0.189174 5
openhermes-2.5:7:ggufv2:Q4_K_M stats 8.25641 45 0 0.183476 5
openhermes-2.5:7:ggufv2:Q3_K_M stats 8.05641 45 0 0.179031 5
openhermes-2.5:7:ggufv2:Q2_K stats 8 45 0 0.177778 5
gpt-3.5-turbo-0613 stats 7.98135 45 0 0.177363 5
gpt-3.5-turbo-0125 stats 7.12976 45 0 0.158439 5
mistral-instruct-v0.2:7:ggufv2:Q5_K_M stats 6.89091 45 0 0.153131 5
llama-2-chat:13:ggufv2:Q6_K stats 4.05299 27 2.26623e-17 0.150111 3
llama-2-chat:7:ggufv2:Q8_0 stats 3.87128 27 7.55411e-18 0.143381 3
mistral-instruct-v0.2:7:ggufv2:Q4_K_M stats 6.29908 45 0 0.13998 5
mistral-instruct-v0.2:7:ggufv2:Q6_K stats 6.18322 45 0 0.137405 5
llama-3-instruct:8:ggufv2:Q8_0 stats 5.17406 45 0 0.114979 5
mistral-instruct-v0.2:7:ggufv2:Q3_K_M stats 5.1041 45 0 0.113424 5
mistral-instruct-v0.2:7:ggufv2:Q8_0 stats 5.04591 45 0 0.112131 5
llama-2-chat:7:ggufv2:Q6_K stats 2.99816 27 7.55411e-18 0.111043 3
mistral-instruct-v0.2:7:ggufv2:Q2_K stats 4.63496 45 0 0.102999 5
llama-3-instruct:8:ggufv2:Q6_K stats 4.30739 45 0 0.0957198 5
llama-3-instruct:8:ggufv2:Q5_K_M stats 3.9346 45 0 0.0874356 5
mixtral-instruct-v0.1:46_7:ggufv2:Q8_0 stats 3.60737 45 0 0.0801638 5
mixtral-instruct-v0.1:46_7:ggufv2:Q5_K_M stats 3.58841 45 0 0.0797425 5
mixtral-instruct-v0.1:46_7:ggufv2:Q6_K stats 3.21213 45 0 0.0713807 5
llama-2-chat:70:ggufv2:Q4_K_M stats 3.08109 45 0 0.0684688 5
llama-3-instruct:8:ggufv2:Q4_K_M stats 2.98843 45 0 0.0664096 5
llama-2-chat:70:ggufv2:Q2_K stats 2.65216 45 0 0.0589368 5
llama-2-chat:70:ggufv2:Q3_K_M stats 2.44276 45 0 0.0542835 5
llama-2-chat:70:ggufv2:Q5_K_M stats 2.3993 45 0 0.0533177 5
mixtral-instruct-v0.1:46_7:ggufv2:Q4_K_M stats 2.21549 45 0 0.049233 5
mixtral-instruct-v0.1:46_7:ggufv2:Q3_K_M stats 1.96241 45 0 0.0436091 5
mixtral-instruct-v0.1:46_7:ggufv2:Q2_K stats 1.76057 45 0 0.0391237 5
llama-2-chat:7:ggufv2:Q4_K_M stats 1.43589 45 0 0.0319086 5
llama-2-chat:13:ggufv2:Q4_K_M stats 1.41695 45 0 0.0314878 5
llama-2-chat:13:ggufv2:Q8_0 stats 1.38608 45 0 0.0308019 5
llama-2-chat:7:ggufv2:Q5_K_M stats 1.3859 45 0 0.0307977 5
llama-2-chat:13:ggufv2:Q3_K_M stats 1.35834 45 0 0.0301854 5
llama-2-chat:13:ggufv2:Q5_K_M stats 1.3371 45 0 0.0297134 5
llama-2-chat:13:ggufv2:Q2_K stats 1.12439 45 0 0.0249865 5
code-llama-instruct:7:ggufv2:Q4_K_M stats 0.860471 45 0 0.0191216 5
llama-2-chat:7:ggufv2:Q3_K_M stats 0.804538 45 0 0.0178786 5
llama-2-chat:7:ggufv2:Q2_K stats 0.558031 45 0 0.0124007 5
chatglm3:6:ggmlv3:q4_0 stats 0.17925 45 0 0.00398332 5
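
The per-subtask tables above appear to add up to the overall extraction row for each model. The sketch below sums the eleven subtask scores listed for `llama-2-chat:7:ggufv2:Q8_0` and compares the result with that model's overall row (25.1647 out of 297). The variable names are illustrative, and the aggregation rule is an assumption inferred from the numbers, not taken from the BioChatter source.

```python
# Subtask scores for llama-2-chat:7:ggufv2:Q8_0, copied from the tables above.
SUBTASK_SCORES = {
    "assay": 1.62311,
    "chemical": 1.11429,
    "context": 5.70797,
    "disease": 0.444887,
    "entity": 1.79733,
    "experiment_yes_or_no": 4.67535,
    "hypothesis": 2.854,
    "intervention": 1.56417,
    "ncbi_link": 0.410766,
    "significance": 1.10159,
    "stats": 3.87128,
}

# Each subtask row for this model lists a possible score of 27 (3 iterations).
POSSIBLE_PER_SUBTASK = 27

# Overall row for the same model, as reported in the aggregated table.
REPORTED_TOTAL = 25.1647
REPORTED_POSSIBLE = 297

# Assumed aggregation: plain sums over the subtasks.
total_achieved = sum(SUBTASK_SCORES.values())
total_possible = POSSIBLE_PER_SUBTASK * len(SUBTASK_SCORES)

print(f"summed achieved: {total_achieved:.4f} (reported {REPORTED_TOTAL})")
print(f"summed possible: {total_possible} (reported {REPORTED_POSSIBLE})")
print(f"overall accuracy: {total_achieved / total_possible:.7f}")
```

Under this reading, a model's overall accuracy is a score-weighted aggregate of its subtasks, so a strong result on one subtask (here `context` and `experiment_yes_or_no`) can lift an otherwise low overall figure.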

Stripplot Extraction Subtask