Benchmark · Knowledge

Artificial Analysis · Coding Index

Updated 2026-04-07
Models tested · 66
Top score · 57.3 (GPT-5.4)
Median · 29.4 (min 0.8)
Top-5 spread · σ 2.4 (competitive)
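
The two summary figures can be reproduced from the leaderboard values on this page. The sketch below assumes the top-5 spread is a population standard deviation over the five highest scores, which is the variant that matches the displayed σ 2.4 (the sample variant gives about 2.7). The variable names are illustrative and are not BenchGecko's.

```python
from statistics import pstdev

# Five highest scores from the leaderboard on this page.
top5 = [57.3, 55.5, 53.1, 51.5, 50.9]

# Assumption: "Top-5 spread" is the population standard deviation of these scores.
spread = pstdev(top5)
print(f"top-5 spread: {spread:.1f}")

# With 66 models, the median is the mean of the scores ranked 33rd and 34th.
rank_33, rank_34 = 30.1, 28.6
median_score = (rank_33 + rank_34) / 2
print(f"median: {median_score:.2f}")
```

The median comes out at 29.35, which the page displays rounded as 29.4.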

Best score over time · one chart, every benchmark

[Chart: Artificial Analysis · Coding Index · 64 models · frontier running max. Y axis: score (0 to 60). X axis: release date (Jan 25 to Apr 26). Source: benchgecko.ai/benchmark/aa-coding-index]
Frontier on Artificial Analysis · Coding Index rose from 11.2 to 57.3 in 14 months · about +46 points · latest leader GPT-5.4 from OpenAI.
Pink dots = frontier records · 8 total
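
A frontier running max keeps only the releases that beat every earlier score. The sketch below shows one way to derive such records. The function name and the sample rows are hypothetical, since this page does not list release dates per model.

```python
def frontier_records(releases):
    """Return the releases that set a new best score, in date order.

    `releases` is an iterable of (release_date, model, score) tuples,
    with dates as ISO strings so they sort chronologically.
    """
    best = float("-inf")
    records = []
    for date, model, score in sorted(releases, key=lambda r: r[0]):
        if score > best:  # strictly better than everything released earlier
            best = score
            records.append((date, model, score))
    return records

# Hypothetical sample data, for illustration only.
sample = [
    ("2025-03-01", "model-b", 20.0),
    ("2025-01-15", "model-a", 11.0),
    ("2025-06-10", "model-c", 18.0),  # below the running max, not a record
    ("2025-09-02", "model-d", 35.5),
]

records = frontier_records(sample)
gain = records[-1][2] - records[0][2]  # last frontier score minus first
print(records)
print(gain)
```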

Where models cluster

Score distribution · models per score bucket (0 to 60) · median 29.4
Score bucket · Models
0–6 · 6
6–12 · 7
12–18 · 6
18–24 · 8
24–30 · 6
30–36 · 11
36–42 · 10
42–48 · 6
48–54 · 4
54–60 · 2
Source: benchgecko.ai
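
The bucket counts in the score distribution can be recomputed from the 66 leaderboard scores on this page. This is a minimal sketch that assumes 6-point buckets that include their lower edge and exclude their upper edge, so a score of 24.0 falls in 24–30. The function name is illustrative.

```python
from collections import Counter

# All 66 scores from the leaderboard on this page, highest first.
scores = [
    57.3, 55.5, 53.1, 51.5, 50.9, 48.1, 47.5, 44.2, 43.9, 43.4,
    42.9, 42.6, 41.9, 41.4, 41.3, 39.5, 39.4, 38.7, 38.4, 37.9,
    36.7, 36.2, 35.5, 34.9, 34.7, 33.5, 31.9, 31.6, 30.9, 30.6,
    30.5, 30.3, 30.1, 28.6, 27.2, 25.3, 24.6, 24.3, 24.0, 23.7,
    22.9, 22.4, 19.5, 19.1, 18.5, 18.3, 18.1, 17.5, 15.6, 15.3,
    14.5, 13.3, 13.1, 11.4, 11.2, 9.9, 8.9, 8.9, 8.3, 6.7,
    5.0, 3.6, 3.6, 3.5, 1.4, 0.8,
]

def bucket_counts(values, width=6, upper=60):
    """Count values per [lo, lo + width) bucket between 0 and `upper`."""
    n_buckets = upper // width
    counts = Counter()
    for v in values:
        # Floor-divide to find the bucket; clamp so v == upper lands in the last one.
        idx = min(int(v // width), n_buckets - 1)
        counts[idx] += 1
    return [counts[i] for i in range(n_buckets)]

for i, n in enumerate(bucket_counts(scores)):
    print(f"{i * 6}–{(i + 1) * 6} · {n}")
```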

Pearson r · original research

Correlation analysis

Benchmarks that track with Artificial Analysis · Coding Index

Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
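
As an illustration of how such a coefficient can be computed, the sketch below restricts two score tables to the models present in both and then applies the standard Pearson formula. The function names and sample scores are hypothetical, and the page does not say how BenchGecko implements this.

```python
from math import sqrt

def shared_pairs(bench_a, bench_b):
    """Return aligned score lists for models that appear in both dicts."""
    names = sorted(bench_a.keys() & bench_b.keys())
    return [bench_a[n] for n in names], [bench_b[n] for n in names]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores, for illustration only.
bench_a = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 40.0}
bench_b = {"model-b": 25.0, "model-c": 33.0, "model-d": 45.0, "model-e": 5.0}

xs, ys = shared_pairs(bench_a, bench_b)
r = pearson_r(xs, ys)
print(len(xs), round(r, 2))
```

Only the shared models enter the calculation, so the coefficient can shift noticeably when the overlap is small.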

66 models tested · sorted by score

# · Provider · Model · Score
1 · OpenAI · GPT-5.4 · 57.3
2 · Google DeepMind · Gemini 3.1 Pro Preview · 55.5
3 · OpenAI · GPT-5.3-Codex · 53.1
4 · OpenAI · GPT-5.4 Mini · 51.5
5 · Anthropic · Claude Sonnet 4.6 · 50.9
6 · Anthropic · Claude Opus 4.6 (Fast) · 48.1
7 · not listed · Muse Spark · 47.5
8 · z-ai · GLM 5 Turbo · 44.2
9 · OpenAI · GPT-5.4 Nano · 43.9
10 · z-ai · GLM 5.1 · 43.4
11 · Alibaba Qwen · Qwen3.6 Plus · 42.9
12 · Google DeepMind · Gemini 3 Flash Preview · 42.6
13 · minimax · MiniMax M2.7 · 41.9
14 · xiaomi · MiMo-V2-Pro · 41.4
15 · Alibaba Qwen · Qwen3.5 397B A17B · 41.3
16 · moonshotai · Kimi K2.5 · 39.5
17 · Google DeepMind · Gemini 3 Pro · 39.4
18 · Google DeepMind · Gemma 4 31B (free) · 38.7
19 · OpenAI · o3 · 38.4
20 · DeepSeek · DeepSeek V3.2 Speciale · 37.9
21 · DeepSeek · DeepSeek V3.2 · 36.7
22 · z-ai · GLM 5V Turbo · 36.2
23 · xiaomi · MiMo-V2-Omni · 35.5
24 · Alibaba Qwen · Qwen3.5-27B · 34.9
25 · Alibaba Qwen · Qwen3.5-122B-A10B · 34.7
26 · xiaomi · MiMo-V2-Flash · 33.5
27 · Google DeepMind · Gemini 2.5 Pro · 31.9
28 · stepfun · Step 3.5 Flash · 31.6
29 · xAI · Grok 4.1 Fast · 30.9
30 · inception · Mercury 2 · 30.6
31 · Alibaba Qwen · Qwen3 Max Thinking · 30.5
32 · Alibaba Qwen · Qwen3.5-35B-A3B · 30.3
33 · Google DeepMind · Gemini 3.1 Flash Lite Preview · 30.1
34 · OpenAI · gpt-oss-120b (free) · 28.6
35 · arcee-ai · Trinity Large Thinking · 27.2
36 · Alibaba Qwen · Qwen3.5-9B · 25.3
37 · Alibaba Qwen · Qwen3 Coder 480B A35B (free) · 24.6
38 · Mistral AI · Mistral Small 4 · 24.3
39 · DeepSeek · R1 0528 · 24.0
40 · xAI · Grok Code Fast 1 · 23.7
41 · Alibaba Qwen · Qwen3 Coder Next · 22.9
42 · Google DeepMind · Gemma 4 26B A4B (free) · 22.4
43 · Alibaba Qwen · Qwen3 Next 80B A3B Instruct (free) · 19.5
44 · prime-intellect · INTELLECT-3 · 19.1
45 · OpenAI · gpt-oss-20b (free) · 18.5
46 · Mistral AI · Mistral Medium 3.1 · 18.3
47 · Google DeepMind · Gemini 2.5 Flash Lite · 18.1
48 · Alibaba · Qwen3.5 4B · 17.5
49 · Meta · Llama 4 Maverick · 15.6
50 · Alibaba Qwen · Qwen3 Next 80B A3B Instruct · 15.3
51 · baidu · ERNIE 4.5 300B A47B · 14.5
52 · upstage · Solar Pro 3 · 13.3
53 · NVIDIA · Llama 3.1 Nemotron Ultra 253B v1 · 13.1
54 · DeepSeek · R1 Distill Llama 70B · 11.4
55 · Microsoft · Phi 4 · 11.2
56 · Cohere · Command A · 9.9
57 · rekaai · Reka Flash 3 · 8.9
58 · not listed · Nanbeige4.1 3B · 8.9
59 · NVIDIA · NVIDIA Nemotron Nano 9B V2 · 8.3
60 · Meta · Llama 4 Scout · 6.7
61 · ibm-granite · Granite 4.0 Micro · 5.0
62 · liquid · LFM2-24B-A2B · 3.6
63 · Microsoft · Phi 4 Mini Instruct · 3.6
64 · Alibaba · Qwen3.5 2B · 3.5
65 · liquid · LFM2.5-1.2B-Thinking (free) · 1.4
66 · liquid · LFM2.5-1.2B-Instruct (free) · 0.8

Pulled from the Artificial Analysis · Coding Index dataset · updated daily

What does Artificial Analysis · Coding Index measure?

Artificial Analysis · Coding Index is listed under the Knowledge category in the BenchGecko catalog. 66 AI models have been tested on it, with scores ranging from 0.8 to 57.3 out of 60.

Which model leads on Artificial Analysis · Coding Index?

GPT-5.4 from OpenAI leads Artificial Analysis · Coding Index with a score of 57.3. The median score across 66 tested models is 29.4.

Is Artificial Analysis · Coding Index saturated?

Yes · the top model on Artificial Analysis · Coding Index has reached 57.3 out of 60, which is 4.5% below the ceiling and so within 5% of it. This benchmark is approaching saturation and may be replaced by a harder successor.
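
The saturation check above is simple arithmetic on the top score and the ceiling. A minimal sketch follows, taking the 60-point ceiling and the 5% threshold from the answer above. The function name is hypothetical.

```python
def ceiling_gap(top_score, ceiling):
    """Fraction of the ceiling that the top score still falls short of."""
    return (ceiling - top_score) / ceiling

gap = ceiling_gap(57.3, 60.0)
near_ceiling = gap <= 0.05  # 5% threshold, as stated in the answer above

print(f"gap to ceiling: {gap:.1%}")
print(f"within 5% of ceiling: {near_ceiling}")
```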

Does Artificial Analysis · Coding Index predict performance on other benchmarks?

Yes · Artificial Analysis · Coding Index scores have a Pearson correlation of 0.98 with OpenCompass · HLE across 11 shared models. Models that do well on Artificial Analysis · Coding Index tend to do well on OpenCompass · HLE.

How often is Artificial Analysis · Coding Index data refreshed?

BenchGecko pulls updates daily. New model scores on Artificial Analysis · Coding Index appear as soon as they are published by Epoch AI or the model provider.

Same category · related evaluations