Beta
Benchmark · Knowledge

Artificial Analysis · Quality Index

Updated 2026-04-07
Models tested · 68
Top score · 57.2 · Gemini 3.1 Pro Preview
Median · 32.1 · min 7.7
Top-5 spread · σ 2.1 · competitive
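
The summary figures above can be reproduced from the leaderboard scores listed further down this page. The sketch below is a minimal Python example. The page does not state its formulas, so two assumptions are made: the median is the ordinary median of all 68 scores, and the top-5 spread σ is the population standard deviation of the five highest scores. Both assumptions happen to agree with the published numbers.

```python
import statistics

# Scores copied from the 68-row leaderboard on this page, highest first.
scores = [
    57.2, 57.2, 54.0, 53.0, 52.1, 51.7, 51.4, 50.0, 49.8, 49.6,
    49.2, 48.1, 46.8, 46.4, 45.0, 44.4, 43.4, 42.9, 42.1, 41.7,
    41.6, 41.5, 41.3, 39.9, 39.2, 38.6, 38.4, 37.8, 37.1, 34.6,
    33.5, 33.3, 32.8, 32.4, 31.9, 31.2, 29.4, 28.7, 28.3, 27.2,
    27.1, 27.1, 26.7, 25.9, 24.8, 24.5, 22.2, 21.6, 21.3, 20.1,
    18.4, 16.3, 16.1, 15.9, 15.0, 15.0, 14.8, 13.5, 13.5, 10.5,
    10.5, 10.4, 10.0, 9.5, 8.4, 8.1, 8.0, 7.7,
]

# With an even number of scores, the median is the mean of the two middle values.
median = statistics.median(scores)

# Assumed definition of "top-5 spread": population standard deviation
# of the five highest scores (the list is already sorted highest first).
top5_spread = statistics.pstdev(scores[:5])

print(f"models={len(scores)} top={max(scores)} min={min(scores)}")
print(f"median={median:.2f} top5_spread={top5_spread:.2f}")
```

Under these assumptions the median works out to 32.15 and the top-5 spread to roughly 2.13, which the page shows as 32.1 and σ 2.1. Using the sample standard deviation instead would give about 2.4, so the population form is the closer match.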

Best score over time · one chart, every benchmark

[Chart: Artificial Analysis · Quality Index · 66 models · frontier running max. Y-axis: score (0 to 60). X-axis: release date (Jan 25 to Apr 26). Source: benchgecko.ai/benchmark/aa-quality-index]
Frontier on Artificial Analysis · Quality Index rose from 10.4 to 57.2 in 14 months · +46.8 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
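
A "frontier running max" of this kind can be sketched as follows. This is an illustrative Python example, not BenchGecko's implementation: the function name `frontier_records`, the model names, and the release dates are all hypothetical, and only the first and last scores (10.4 and 57.2) come from the text above. It also assumes a tie does not count as a new record.

```python
from datetime import date

def frontier_records(results):
    """Return the (release_date, model, score) entries that set a new best score.

    `results` is an iterable of (release_date, model, score) tuples.
    A score must strictly exceed the previous best to count (assumption).
    """
    records = []
    best = float("-inf")
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records

# Hypothetical data: placeholder names and dates, for illustration only.
results = [
    (date(2025, 1, 10), "model-a", 10.4),
    (date(2025, 3, 2), "model-b", 9.0),    # below the frontier, not a record
    (date(2025, 6, 15), "model-c", 30.0),
    (date(2025, 9, 1), "model-d", 28.5),   # below the frontier, not a record
    (date(2026, 2, 20), "model-e", 57.2),
]

records = frontier_records(results)
gain = records[-1][2] - records[0][2]

for released, model, score in records:
    print(released.isoformat(), model, score)
print(f"frontier gain: +{gain:.1f} points")
```

Each entry returned by `frontier_records` corresponds to one frontier record on the chart, and the gain is the difference between the last and first record.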
Pink dots = frontier records · 10 total

Where models cluster

Score distribution · models per score bucket (0 to 60) · median 32.1

Score bucket · Models
0–6 · 0
6–12 · 9
12–18 · 8
18–24 · 5
24–30 · 10
30–36 · 7
36–42 · 10
42–48 · 7
48–54 · 10
54–60 · 2

Source: benchgecko.ai

Pearson r · original research

Correlation analysis

Benchmarks that track with Artificial Analysis · Quality Index

Pearson correlation computed across models scored on both benchmarks. Values closer to 1 mean that scores on one benchmark more strongly predict scores on the other.

68 models tested · sorted by score

# · Provider · Model · Score
1 · Google DeepMind · Gemini 3.1 Pro Preview · 57.2
2 · OpenAI · GPT-5.4 · 57.2
3 · OpenAI · GPT-5.3-Codex · 54.0
4 · Anthropic · Claude Opus 4.6 (Fast) · 53.0
5 · (not listed) · Muse Spark · 52.1
6 · Anthropic · Claude Sonnet 4.6 · 51.7
7 · z-ai · GLM 5.1 · 51.4
8 · Alibaba Qwen · Qwen3.6 Plus · 50.0
9 · z-ai · GLM 5 Turbo · 49.8
10 · minimax · MiniMax M2.7 · 49.6
11 · xiaomi · MiMo-V2-Pro · 49.2
12 · OpenAI · GPT-5.4 Mini · 48.1
13 · moonshotai · Kimi K2.5 · 46.8
14 · Google DeepMind · Gemini 3 Flash Preview · 46.4
15 · Alibaba Qwen · Qwen3.5 397B A17B · 45.0
16 · OpenAI · GPT-5.4 Nano · 44.4
17 · xiaomi · MiMo-V2-Omni · 43.4
18 · z-ai · GLM 5V Turbo · 42.9
19 · Alibaba Qwen · Qwen3.5-27B · 42.1
20 · DeepSeek · DeepSeek V3.2 · 41.7
21 · Alibaba Qwen · Qwen3.5-122B-A10B · 41.6
22 · xiaomi · MiMo-V2-Flash · 41.5
23 · Google DeepMind · Gemini 3 Pro · 41.3
24 · Alibaba Qwen · Qwen3 Max Thinking · 39.9
25 · Google DeepMind · Gemma 4 31B (free) · 39.2
26 · xAI · Grok 4.1 Fast · 38.6
27 · OpenAI · o3 · 38.4
28 · stepfun · Step 3.5 Flash · 37.8
29 · Alibaba Qwen · Qwen3.5-35B-A3B · 37.1
30 · Google DeepMind · Gemini 2.5 Pro · 34.6
31 · Google DeepMind · Gemini 3.1 Flash Lite Preview · 33.5
32 · OpenAI · gpt-oss-120b (free) · 33.3
33 · inception · Mercury 2 · 32.8
34 · Alibaba Qwen · Qwen3.5-9B · 32.4
35 · arcee-ai · Trinity Large Thinking · 31.9
36 · Google DeepMind · Gemma 4 26B A4B (free) · 31.2
37 · DeepSeek · DeepSeek V3.2 Speciale · 29.4
38 · xAI · Grok Code Fast 1 · 28.7
39 · Alibaba Qwen · Qwen3 Coder Next · 28.3
40 · Mistral AI · Mistral Small 4 · 27.2
41 · Alibaba · Qwen3.5 4B · 27.1
42 · DeepSeek · R1 0528 · 27.1
43 · Alibaba Qwen · Qwen3 Next 80B A3B Instruct (free) · 26.7
44 · upstage · Solar Pro 3 · 25.9
45 · Alibaba Qwen · Qwen3 Coder 480B A35B (free) · 24.8
46 · OpenAI · gpt-oss-20b (free) · 24.5
47 · prime-intellect · INTELLECT-3 · 22.2
48 · Google DeepMind · Gemini 2.5 Flash Lite · 21.6
49 · Mistral AI · Mistral Medium 3.1 · 21.3
50 · Alibaba Qwen · Qwen3 Next 80B A3B Instruct · 20.1
51 · Meta · Llama 4 Maverick · 18.4
52 · Alibaba · Qwen3.5 2B · 16.3
53 · (not listed) · Nanbeige4.1 3B · 16.1
54 · DeepSeek · R1 Distill Llama 70B · 15.9
55 · NVIDIA · Llama 3.1 Nemotron Ultra 253B v1 · 15.0
56 · baidu · ERNIE 4.5 300B A47B · 15.0
57 · NVIDIA · NVIDIA Nemotron Nano 9B V2 · 14.8
58 · Meta · Llama 4 Scout · 13.5
59 · Cohere · Command A · 13.5
60 · Alibaba · Qwen3.5 0.8B · 10.5
61 · liquid · LFM2-24B-A2B · 10.5
62 · Microsoft · Phi 4 · 10.4
63 · Microsoft · Phi 4 Multimodal Instruct · 10.0
64 · rekaai · Reka Flash 3 · 9.5
65 · Microsoft · Phi 4 Mini Instruct · 8.4
66 · liquid · LFM2.5-1.2B-Thinking (free) · 8.1
67 · liquid · LFM2.5-1.2B-Instruct (free) · 8.0
68 · ibm-granite · Granite 4.0 Micro · 7.7

Pulled from the Artificial Analysis · Quality Index dataset · updated daily

What does Artificial Analysis · Quality Index measure?

Artificial Analysis · Quality Index is a knowledge benchmark in the BenchGecko catalog. 68 AI models have been tested on it. Scores range from 7.7 to 57.2 out of 60.

Which model leads on Artificial Analysis · Quality Index?

Gemini 3.1 Pro Preview from Google DeepMind leads Artificial Analysis · Quality Index with a score of 57.2, a score matched by GPT-5.4 from OpenAI in second place. The median score across 68 tested models is 32.1.

Is Artificial Analysis · Quality Index saturated?

Yes · the top model on Artificial Analysis · Quality Index has reached 57.2 out of 60, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
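
The arithmetic behind the "within 5%" statement can be checked with a few lines of Python. This is a rough sketch, not the site's own rule: the 60-point ceiling and the 5% threshold are taken from the answer above, and the function names `headroom` and `is_near_saturation` are hypothetical.

```python
def headroom(top_score: float, ceiling: float) -> float:
    """Fraction of the ceiling that the top score has not yet reached."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return (ceiling - top_score) / ceiling

def is_near_saturation(top_score: float, ceiling: float,
                       threshold: float = 0.05) -> bool:
    """True when the remaining headroom is at or below the threshold (assumed 5%)."""
    return headroom(top_score, ceiling) <= threshold

# Values from the text: top score 57.2, stated ceiling 60.
gap = headroom(57.2, 60.0)
print(f"headroom: {gap:.1%}")
print("near saturation:", is_near_saturation(57.2, 60.0))
```

With these inputs the remaining headroom is 2.8 points out of 60, or about 4.7%, which is why the top score falls inside the 5% band.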

Does Artificial Analysis · Quality Index predict performance on other benchmarks?

Yes · Artificial Analysis · Quality Index scores correlate 0.97 with Artificial Analysis · Coding Index across 66 shared models. Models that do well on Artificial Analysis · Quality Index tend to do well on Artificial Analysis · Coding Index.
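
For readers who want to see how a figure like 0.97 is obtained, the sketch below computes a Pearson correlation over the models that two benchmarks share. It is a minimal Python illustration under stated assumptions: the helper names `shared_scores` and `pearson_r`, the model names, and every score in the example dictionaries are hypothetical, and none of it is BenchGecko's actual data or code.

```python
import math

def shared_scores(a: dict, b: dict):
    """Return paired score lists for models present in both benchmarks."""
    models = sorted(a.keys() & b.keys())
    return [a[m] for m in models], [b[m] for m in models]

def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores, for illustration only.
quality = {"m1": 10.0, "m2": 20.0, "m3": 30.0, "m4": 40.0, "m5": 50.0}
coding = {"m1": 8.0, "m2": 18.0, "m3": 33.0, "m4": 37.0, "m6": 45.0}

xs, ys = shared_scores(quality, coding)
r = pearson_r(xs, ys)
print(f"shared models: {len(xs)}, r = {r:.2f}")
```

Only models scored on both benchmarks enter the calculation, which is why the correlation above is reported over 66 shared models rather than all 68.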

How often is Artificial Analysis · Quality Index data refreshed?

BenchGecko pulls updates daily. New model scores on Artificial Analysis · Quality Index appear as soon as they are published by Artificial Analysis or the model provider.

Same category · related evaluations