Benchmark · Knowledge

Artificial Analysis · Quality Index

Updated 2026-04-07

Models tested · 68

Top score · 57.2 · Gemini 3.1 Pro Preview

Median · 32.1 · min 7.7

Top-5 spread · σ 2.1 · competitive
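The page does not say how the top-5 spread is computed. As a minimal sketch, assuming σ is the population standard deviation of the five highest scores in the rankings table, the figure of 2.1 can be reproduced as follows. The variable names are illustrative only.

```python
from statistics import pstdev, stdev

# Five highest scores, copied from the rankings table on this page.
top5 = [57.2, 57.2, 54.0, 53.0, 52.1]

# Assumption: sigma is the population standard deviation (divide by n).
population_sigma = pstdev(top5)

# For comparison: the sample standard deviation (divide by n - 1).
sample_sigma = stdev(top5)

print(round(population_sigma, 1))  # 2.1
print(round(sample_sigma, 1))      # 2.4
```

Only the population form matches the published 2.1, which is why it is assumed here.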

The Frontier

Best score over time · one chart, every benchmark

Frontier on Artificial Analysis · Quality Index rose from 10.4 to 57.2 in 14 months · +46.8 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.

Pink dots = frontier records · 10 total
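A frontier record can be read as a result that beats every earlier score. The sketch below shows one way to extract such records as a running maximum over release dates. The function name, the tuple layout, and all sample data are hypothetical placeholders, not values from the BenchGecko dataset, and the strict comparison (a tie does not count as a new record) is an assumption.

```python
from datetime import date

def frontier_records(results):
    """Return the (release_date, model, score) entries that set a new best.

    `results` is an iterable of (release_date, model, score) tuples.
    A tie with the current best is not treated as a new record (assumption).
    """
    records = []
    best = float("-inf")
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records

# Placeholder data: names, dates and scores are invented for illustration.
sample = [
    (date(2025, 1, 1), "model-a", 10.0),
    (date(2025, 3, 1), "model-b", 8.0),
    (date(2025, 6, 1), "model-c", 25.0),
    (date(2025, 9, 1), "model-d", 25.0),
    (date(2026, 1, 1), "model-e", 40.0),
]

records = frontier_records(sample)
gain = records[-1][2] - records[0][2]  # total frontier gain in points
print([model for _, model, _ in records], gain)
```

In this sample, model-b and model-d are skipped because neither exceeds the best score at the time of release.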

Distribution

Where models cluster

Correlated benchmarks

Pearson r · original research

Correlation analysis

Benchmarks that track with Artificial Analysis · Quality Index

Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
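As a rough illustration of the description above, the sketch below computes Pearson r over only the models that appear in both benchmarks. The helper names and the sample scores are hypothetical; the page does not publish its exact procedure, so this is one plausible reading rather than BenchGecko's implementation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)

def shared_correlation(scores_a, scores_b):
    """Return (r, shared_count) for models scored on both benchmarks."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    r = pearson_r([scores_a[m] for m in shared],
                  [scores_b[m] for m in shared])
    return r, len(shared)

# Placeholder scores keyed by model name; not real benchmark data.
quality = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 40.0}
other = {"model-b": 25.0, "model-c": 35.0, "model-d": 45.0, "model-e": 5.0}

r, shared_count = shared_correlation(quality, other)
print(f"{r:+.2f} · {shared_count} shared")
```

The shared count matters when reading the results: a coefficient based on 5 shared models is far less stable than one based on 66.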

Benchmark	Category	Pearson r	Shared models
Artificial Analysis · Coding Index	Knowledge	+0.97	66
HELM · WildBench	Knowledge	+0.96	5
Artificial Analysis · Agentic Index	Knowledge	not shown	not shown
FrontierMath-2025-02-28-Private	Math	+0.94	12

Full rankings

68 models tested · sorted by score

#	Model	Provider	Score	Price
1	Gemini 3.1 Pro Preview	Google DeepMind	57.2	$2.00
2	GPT-5.4	OpenAI	57.2	$2.50
3	GPT-5.3-Codex	OpenAI	54.0	$1.75
4	Claude Opus 4.6 (Fast)	Anthropic	53.0	$30.00
5	Muse Spark	Unknown	52.1	—
6	Claude Sonnet 4.6	Anthropic	51.7	$3.00
7	GLM 5.1	z-ai	51.4	$0.95
8	Qwen3.6 Plus	Alibaba Qwen	50.0	$0.33
9	GLM 5 Turbo	z-ai	49.8	$1.20
10	MiniMax M2.7	minimax	49.6	$0.30
11	MiMo-V2-Pro	xiaomi	49.2	$1.00
12	GPT-5.4 Mini	OpenAI	48.1	$0.75
13	Kimi K2.5	moonshotai	46.8	$0.38
14	Gemini 3 Flash Preview	Google DeepMind	46.4	$0.50
15	Qwen3.5 397B A17B	Alibaba Qwen	45.0	$0.39
16	GPT-5.4 Nano	OpenAI	44.4	$0.20
17	MiMo-V2-Omni	xiaomi	43.4	$0.40
18	GLM 5V Turbo	z-ai	42.9	$1.20
19	Qwen3.5-27B	Alibaba Qwen	42.1	$0.20
20	DeepSeek V3.2	DeepSeek	41.7	$0.26
21	Qwen3.5-122B-A10B	Alibaba Qwen	41.6	$0.26
22	MiMo-V2-Flash	xiaomi	41.5	$0.09
23	Gemini 3 Pro	Google DeepMind	41.3	—
24	Qwen3 Max Thinking	Alibaba Qwen	39.9	$0.78
25	Gemma 4 31B (free)	Google DeepMind	39.2	$0.00
26	Grok 4.1 Fast	xAI	38.6	$0.20
27	o3	OpenAI	38.4	$2.00
28	Step 3.5 Flash	stepfun	37.8	$0.10
29	Qwen3.5-35B-A3B	Alibaba Qwen	37.1	$0.16
30	Gemini 2.5 Pro	Google DeepMind	34.6	$1.25
31	Gemini 3.1 Flash Lite Preview	Google DeepMind	33.5	$0.25
32	gpt-oss-120b (free)	OpenAI	33.3	$0.00
33	Mercury 2	inception	32.8	$0.25
34	Qwen3.5-9B	Alibaba Qwen	32.4	$0.05
35	Trinity Large Thinking	arcee-ai	31.9	$0.22
36	Gemma 4 26B A4B (free)	Google DeepMind	31.2	$0.00
37	DeepSeek V3.2 Speciale	DeepSeek	29.4	$0.40
38	Grok Code Fast 1	xAI	28.7	$0.20
39	Qwen3 Coder Next	Alibaba Qwen	28.3	$0.15
40	Mistral Small 4	Mistral AI	27.2	$0.15
41	Qwen3.5 4B	Alibaba	27.1	—
42	R1 0528	DeepSeek	27.1	$0.50
43	Qwen3 Next 80B A3B Instruct (free)	Alibaba Qwen	26.7	$0.00
44	Solar Pro 3	upstage	25.9	$0.15
45	Qwen3 Coder 480B A35B (free)	Alibaba Qwen	24.8	$0.00
46	gpt-oss-20b (free)	OpenAI	24.5	$0.00
47	INTELLECT-3	prime-intellect	22.2	$0.20
48	Gemini 2.5 Flash Lite	Google DeepMind	21.6	$0.10
49	Mistral Medium 3.1	Mistral AI	21.3	$0.40
50	Qwen3 Next 80B A3B Instruct	Alibaba Qwen	20.1	$0.09
51	Llama 4 Maverick	Meta	18.4	$0.15
52	Qwen3.5 2B	Alibaba	16.3	—
53	Nanbeige4.1 3B	Nanbeige	16.1	—
54	R1 Distill Llama 70B	DeepSeek	15.9	$0.70
55	Llama 3.1 Nemotron Ultra 253B v1	NVIDIA	15.0	$0.60
56	ERNIE 4.5 300B A47B	baidu	15.0	$0.28
57	NVIDIA Nemotron Nano 9B V2	NVIDIA	14.8	—
58	Llama 4 Scout	Meta	13.5	$0.08
59	Command A	Cohere	13.5	$2.50
60	Qwen3.5 0.8B	Alibaba	10.5	—
61	LFM2-24B-A2B	liquid	10.5	$0.03
62	Phi 4	Microsoft	10.4	$0.07
63	Phi 4 Multimodal Instruct	Microsoft	10.0	—
64	Reka Flash 3	rekaai	9.5	$0.10
65	Phi 4 Mini Instruct	Microsoft	8.4	—
66	LFM2.5-1.2B-Thinking (free)	liquid	8.1	$0.00
67	LFM2.5-1.2B-Instruct (free)	liquid	8.0	$0.00
68	Granite 4.0 Micro	ibm-granite	7.7	$0.02

Frequently asked

Pulled from the Artificial Analysis · Quality Index dataset · updated daily

What does Artificial Analysis · Quality Index measure?

Artificial Analysis · Quality Index is a knowledge benchmark in the BenchGecko catalog. 68 AI models have been tested on it. Scores range from 7.7 to 57.2 out of 60.

Which model leads on Artificial Analysis · Quality Index?

Gemini 3.1 Pro Preview from Google DeepMind leads Artificial Analysis · Quality Index with a score of 57.2. GPT-5.4 from OpenAI shows the same score at one decimal place and is ranked second. The median score across 68 tested models is 32.1.

Is Artificial Analysis · Quality Index saturated?

Yes · the top model on Artificial Analysis · Quality Index has reached 57.2 out of 60, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
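The saturation claim above comes down to simple arithmetic. A minimal sketch, assuming the ceiling of 60 quoted in the answer and treating "within 5%" as a threshold on the remaining headroom (the function and constant names are hypothetical):

```python
def headroom_fraction(top_score, ceiling):
    """Share of the scale still unclaimed by the top model."""
    return (ceiling - top_score) / ceiling

# Figures quoted in the answer above; the threshold name is an assumption.
TOP_SCORE = 57.2
CEILING = 60.0
SATURATION_THRESHOLD = 0.05

gap = headroom_fraction(TOP_SCORE, CEILING)
near_saturation = gap < SATURATION_THRESHOLD

print(f"{gap:.1%}")       # 4.7%
print(near_saturation)    # True
```

The remaining headroom is 2.8 points, or about 4.7% of the stated ceiling, which is how the "within 5%" figure follows.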

Does Artificial Analysis · Quality Index predict performance on other benchmarks?

Yes · Artificial Analysis · Quality Index scores correlate 0.97 with Artificial Analysis · Coding Index across 66 shared models. Models that do well on Artificial Analysis · Quality Index tend to do well on Artificial Analysis · Coding Index.

How often is Artificial Analysis · Quality Index data refreshed?

BenchGecko pulls updates daily. New model scores on Artificial Analysis · Quality Index appear as soon as they are published by Epoch AI or the model provider.