Artificial Analysis · Agentic Index

Updated 2026-04-07
Models tested · 62
Top score · 69.4 (GPT-5.4)
Median · 38.8 (min 2.7)
Top-5 spread · σ 2.5 (competitive)

Best score over time · one chart, every benchmark

[Chart: Artificial Analysis · Agentic Index · 60 models · frontier running max. Y-axis: score (0 to 60). X-axis: release date (Feb 25 to Apr 26). Source: benchgecko.ai/benchmark/aa-agentic-index]
Frontier on Artificial Analysis · Agentic Index rose from 2.7 to 69.4 in 13 months · +66.7 points · latest leader GPT-5.4 from OpenAI.
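
The "frontier running max" in the chart can be read as: walk the models in release order and keep only those that beat every earlier score. The sketch below illustrates that idea under stated assumptions. The page does not publish per-model release dates, so the model names, dates, and scores in `SAMPLE` are hypothetical placeholders, and `frontier_records` is an illustrative helper, not BenchGecko's code.

```python
from datetime import date

def frontier_records(results):
    """Return the (release_date, model, score) entries that set a new best score.

    `results` is an iterable of (release_date, model, score) tuples.
    A tie with the current best does not count as a new record.
    """
    records = []
    best = float("-inf")
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records

# Hypothetical data for illustration only; not taken from the benchmark.
SAMPLE = [
    (date(2025, 9, 1), "model-c", 25.0),
    (date(2025, 2, 1), "model-a", 10.0),
    (date(2026, 1, 1), "model-e", 55.0),
    (date(2025, 6, 1), "model-b", 30.0),
    (date(2025, 12, 1), "model-d", 55.0),
]

records = frontier_records(SAMPLE)
for released, model, score in records:
    print(released.isoformat(), model, score)
```

With this sample, `model-c` is skipped because it scores below the earlier `model-b`, and `model-e` is skipped because it only ties `model-d`.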
Pink dots = frontier records · 10 total

Where models cluster

Score distribution · models per score bucket (axis 0 to 60) · median 38.8

Bucket   Models
0–6      7
6–12     5
12–18    2
18–24    6
24–30    3
30–36    5
36–42    5
42–48    4
48–54    8
54–60    17
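
The bucket counts and the median can be reproduced from the 62 scores in the ranked table on this page. The sketch below is one way to do it. It assumes 6-point buckets on a 0 to 60 axis and assumes that scores above 60 are clamped into the last bucket, which is the only reading under which the 11 scores above 60 fit the chart. The names `SCORES` and `bucket_counts` are illustrative.

```python
import statistics

# Scores copied from the ranked table on this page (62 models, best first).
SCORES = [
    69.4, 67.6, 67.0, 63.1, 63.0, 62.8, 62.2, 62.0, 61.7, 61.5,
    61.1, 59.1, 58.9, 58.6, 55.8, 55.7, 54.6, 53.0, 52.9, 52.0,
    50.1, 49.7, 49.3, 49.3, 48.8, 45.0, 44.1, 42.6, 42.1, 40.9,
    39.7, 37.9, 37.4, 36.1, 35.6, 34.9, 32.7, 32.5, 32.1, 27.6,
    25.7, 25.3, 23.6, 23.4, 23.0, 20.8, 19.8, 18.3, 15.9, 14.2,
    11.7, 9.4, 7.2, 7.2, 6.5, 5.2, 5.1, 4.2, 3.8, 3.7,
    3.6, 2.7,
]

BUCKET_WIDTH = 6
NUM_BUCKETS = 10  # 0-6, 6-12, ..., 54-60

def bucket_counts(scores):
    """Count scores per bucket; anything past the axis goes in the last bucket."""
    counts = [0] * NUM_BUCKETS
    for s in scores:
        # Assumption: scores above the 0-60 axis are clamped, not dropped.
        idx = min(int(s // BUCKET_WIDTH), NUM_BUCKETS - 1)
        counts[idx] += 1
    return counts

counts = bucket_counts(SCORES)
median = round(statistics.median(SCORES), 1)

print(counts)  # [7, 5, 2, 6, 3, 5, 5, 4, 8, 17]
print(median)  # 38.8
```

The last bucket holds 17 models only because 11 of them score above 60, so the 0 to 60 axis understates how far the leaders have pulled ahead.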

Pearson r · original research

62 models tested · sorted by score

#   Model                               Provider         Score
1   GPT-5.4                             OpenAI           69.4
2   Claude Opus 4.6 (Fast)              Anthropic        67.6
3   GLM 5.1                             z-ai             67.0
4   GLM 5 Turbo                         z-ai             63.1
5   Claude Sonnet 4.6                   Anthropic        63.0
6   MiMo-V2-Pro                         xiaomi           62.8
7   GPT-5.3-Codex                       OpenAI           62.2
8   Muse Spark                          -                62.0
9   Qwen3.6 Plus                        Alibaba Qwen     61.7
10  MiniMax M2.7                        minimax          61.5
11  GLM 5V Turbo                        z-ai             61.1
12  Gemini 3.1 Pro Preview              Google DeepMind  59.1
13  Kimi K2.5                           moonshotai       58.9
14  MiMo-V2-Omni                        xiaomi           58.6
15  Qwen3.5 397B A17B                   Alibaba Qwen     55.8
16  GPT-5.4 Mini                        OpenAI           55.7
17  Qwen3.5-27B                         Alibaba Qwen     54.6
18  Qwen3.5-122B-A10B                   Alibaba Qwen     53.0
19  DeepSeek V3.2                       DeepSeek         52.9
20  Step 3.5 Flash                      stepfun          52.0
21  Qwen3 Max Thinking                  Alibaba Qwen     50.1
22  Gemini 3 Flash Preview              Google DeepMind  49.7
23  Grok 4.1 Fast                       xAI              49.3
24  GPT-5.4 Nano                        OpenAI           49.3
25  MiMo-V2-Flash                       xiaomi           48.8
26  Gemini 3 Pro                        Google DeepMind  45.0
27  Qwen3.5-35B-A3B                     Alibaba Qwen     44.1
28  Trinity Large Thinking              arcee-ai         42.6
29  Qwen3 Coder Next                    Alibaba Qwen     42.1
30  Gemma 4 31B (free)                  Google DeepMind  40.9
31  Mercury 2                           inception        39.7
32  gpt-oss-120b (free)                 OpenAI           37.9
33  Qwen3.5-9B                          Alibaba Qwen     37.4
34  o3                                  OpenAI           36.1
35  Grok Code Fast 1                    xAI              35.6
36  Solar Pro 3                         upstage          34.9
37  Gemini 2.5 Pro                      Google DeepMind  32.7
38  Qwen3.5 4B                          Alibaba          32.5
39  Gemma 4 26B A4B (free)              Google DeepMind  32.1
40  gpt-oss-20b (free)                  OpenAI           27.6
41  Gemini 3.1 Flash Lite Preview       Google DeepMind  25.7
42  Mistral Medium 3.1                  Mistral AI       25.3
43  Qwen3 Next 80B A3B Instruct (free)  Alibaba Qwen     23.6
44  Mistral Small 4                     Mistral AI       23.4
45  Qwen3.5 2B                          Alibaba          23.0
46  R1 0528                             DeepSeek         20.8
47  INTELLECT-3                         prime-intellect  19.8
48  Qwen3 Coder 480B A35B (free)        Alibaba Qwen     18.3
49  Qwen3.5 0.8B                        Alibaba          15.9
50  Qwen3 Next 80B A3B Instruct         Alibaba Qwen     14.2
51  Gemini 2.5 Flash Lite               Google DeepMind  11.7
52  NVIDIA Nemotron Nano 9B V2          NVIDIA           9.4
53  Llama 4 Maverick                    Meta             7.2
54  Nanbeige4.1 3B                      -                7.2
55  LFM2.5-1.2B-Thinking (free)         liquid           6.5
56  Llama 4 Scout                       Meta             5.2
57  Command A                           Cohere           5.1
58  Granite 4.0 Micro                   ibm-granite      4.2
59  Llama 3.1 Nemotron Ultra 253B v1    NVIDIA           3.8
60  LFM2-24B-A2B                        liquid           3.7
61  LFM2.5-1.2B-Instruct (free)         liquid           3.6
62  Phi 4 Mini Instruct                 Microsoft        2.7

Pulled from the Artificial Analysis · Agentic Index dataset · updated daily

What does Artificial Analysis · Agentic Index measure?

Artificial Analysis · Agentic Index is listed as a knowledge benchmark in the BenchGecko catalog. 62 AI models have been tested on it. Scores range from 2.7 to 69.4.

Which model leads on Artificial Analysis · Agentic Index?

GPT-5.4 from OpenAI leads Artificial Analysis · Agentic Index with a score of 69.4. The median score across 62 tested models is 38.8.

Is Artificial Analysis · Agentic Index saturated?

No · the saturation check assumes a 60-point ceiling, but the top model on Artificial Analysis · Agentic Index already scores 69.4, so 60 cannot be the true maximum. The page does not state the actual scale, and the scores shown here do not establish that the benchmark is approaching saturation.

Does Artificial Analysis · Agentic Index predict performance on other benchmarks?

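Possibly · Artificial Analysis · Agentic Index scores correlate 0.96 (Pearson r) with GeoBench, but across only 5 shared models. In that small sample, models that do well on Artificial Analysis · Agentic Index also tend to do well on GeoBench. A correlation estimated from 5 points is too uncertain to treat as a reliable predictor.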
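
To gauge how much weight a correlation from 5 shared models can bear, a confidence interval for r can be sketched with the Fisher z-transformation. This is a standard textbook approximation, not a method the page says BenchGecko uses. It assumes roughly bivariate-normal scores, and with n = 5 it is only a rough guide. The helper name `pearson_ci` is illustrative.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via the Fisher z-transformation.

    r: observed correlation, strictly between -1 and 1
    n: number of paired observations (must exceed 3)
    z_crit: normal critical value (1.96 for a 95% interval)
    """
    if n <= 3:
        raise ValueError("Fisher interval needs more than 3 observations")
    z = math.atanh(r)              # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = pearson_ci(0.96, 5)
print(f"95% CI for r: {low:.2f} to {high:.3f}")
```

For r = 0.96 and n = 5 the interval runs from about 0.5 to just under 1.0, so the data are consistent with anything from a moderate to a near-perfect relationship.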

How often is Artificial Analysis · Agentic Index data refreshed?

BenchGecko pulls updates daily. New model scores on Artificial Analysis · Agentic Index appear as soon as they are published by Artificial Analysis or the model provider.

Same category · related evaluations