Benchmark · Knowledge · Saturated

Artificial Analysis · Agentic Index

The Artificial Analysis Agentic Index is a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations. It is the canonical single-number metric for "how good is this model as an agent?"

Updated 2026-04-07

The Agentic Index shows the fastest frontier progression of any composite metric. Models released six months apart show gaps of 15 to 20 points, reflecting rapid improvements in tool use and planning.

Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
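The page does not publish the component weights, so the following is only a minimal sketch of how a weighted composite of this kind could be computed. The component names, the weights, and the example scores are all hypothetical and are not Artificial Analysis's actual formula.

```python
from typing import Mapping


def weighted_composite(scores: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Return the weighted mean of component scores, normalized by total weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Hypothetical weights and component scores, for illustration only.
EXAMPLE_WEIGHTS = {"swe_bench": 4, "tool_use": 3, "planning": 2, "error_recovery": 1}
example_scores = {"swe_bench": 70.0, "tool_use": 60.0,
                  "planning": 50.0, "error_recovery": 40.0}

composite = weighted_composite(example_scores, EXAMPLE_WEIGHTS)
print(f"composite = {composite:.1f}")
```

Normalizing by the total weight keeps the composite on the same scale as its components, so the weights do not need to sum to 1.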

Models tested: 62
Top score: 69.4 (GPT-5.4)
Median: 38.8 (min 2.7)
Top-5 spread: σ 2.5 (Contested)
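The summary figures above can be recomputed from the 62 scores in the leaderboard. The sketch below uses Python's standard `statistics` module. It assumes the top-5 spread is a population standard deviation of the five highest scores, which the page does not state. Under that assumption the value comes out near 2.55, close to the displayed σ 2.5 but not identical, so the page may truncate or use a slightly different definition.

```python
import statistics

# The 62 leaderboard scores, in ranked order as listed on this page.
SCORES = [
    69.4, 67.6, 67.0, 63.1, 63.0, 62.8, 62.2, 62.0, 61.7, 61.5,
    61.1, 59.1, 58.9, 58.6, 55.8, 55.7, 54.6, 53.0, 52.9, 52.0,
    50.1, 49.7, 49.3, 49.3, 48.8, 45.0, 44.1, 42.6, 42.1, 40.9,
    39.7, 37.9, 37.4, 36.1, 35.6, 34.9, 32.7, 32.5, 32.1, 27.6,
    25.7, 25.3, 23.6, 23.4, 23.0, 20.8, 19.8, 18.3, 15.9, 14.2,
    11.7, 9.4, 7.2, 7.2, 6.5, 5.2, 5.1, 4.2, 3.8, 3.7,
    3.6, 2.7,
]

median_score = statistics.median(SCORES)   # mean of the 31st and 32nd values
min_score = min(SCORES)
top5 = sorted(SCORES, reverse=True)[:5]
# Assumption: population standard deviation; the page does not define sigma.
top5_sigma = statistics.pstdev(top5)

print(f"models={len(SCORES)} median={median_score:.1f} "
      f"min={min_score} top5_sigma={top5_sigma:.2f}")
```

A small top-5 spread means the leading models sit within a few points of each other, which is presumably what the "Contested" label refers to.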

Best score over time · one chart, every benchmark

[Chart: Artificial Analysis · Agentic Index, frontier running max, 60 models. Y-axis: score (0 to 60, higher is better). X-axis: release date (Feb 25 to Apr 26). Source: benchgecko.ai/benchmark/aa-agentic-index]
The frontier on the Artificial Analysis Agentic Index rose from 2.7 to 69.4 in 13 months (+66.7 points). The latest leader is GPT-5.4 from OpenAI.
Pink dots = frontier records · 10 total
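A "frontier running max" of this kind can be derived by walking the results in release order and keeping each result that beats every earlier score. The sketch below illustrates the idea. The `Result` type, the model names, the dates, and the scores are hypothetical placeholders, since the page does not list release dates.

```python
from datetime import date
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    released: date
    model: str
    score: float


def frontier_records(results: Iterable[Result]) -> list[Result]:
    """Return the results that set a new best score, in release order."""
    records: list[Result] = []
    best = float("-inf")
    for result in sorted(results, key=lambda r: r.released):
        if result.score > best:  # strictly better than every earlier result
            best = result.score
            records.append(result)
    return records


# Hypothetical data for illustration; not taken from the leaderboard.
example = [
    Result(date(2025, 2, 1), "model-a", 10.0),
    Result(date(2025, 6, 1), "model-b", 8.0),   # below the frontier, skipped
    Result(date(2025, 9, 1), "model-c", 25.0),
    Result(date(2025, 12, 1), "model-d", 25.0),  # ties do not set a record
    Result(date(2026, 4, 1), "model-e", 40.0),
]

for record in frontier_records(example):
    print(record.released.isoformat(), record.model, record.score)
```

Each returned record corresponds to one dot on a frontier chart, and the total gain is the last record's score minus the first.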

62 models tested · sorted by score

# | Provider | Model | Score
1 | OpenAI | GPT-5.4 | 69.4
2 | Anthropic | Claude Opus 4.6 (Fast) | 67.6
3 | z-ai | GLM 5.1 | 67.0
4 | z-ai | GLM 5 Turbo | 63.1
5 | Anthropic | Claude Sonnet 4.6 | 63.0
6 | xiaomi | MiMo-V2-Pro | 62.8
7 | OpenAI | GPT-5.3-Codex | 62.2
8 | - | Muse Spark | 62.0
9 | Alibaba Qwen | Qwen3.6 Plus | 61.7
10 | minimax | MiniMax M2.7 | 61.5
11 | z-ai | GLM 5V Turbo | 61.1
12 | Google DeepMind | Gemini 3.1 Pro Preview | 59.1
13 | moonshotai | Kimi K2.5 | 58.9
14 | xiaomi | MiMo-V2-Omni | 58.6
15 | Alibaba Qwen | Qwen3.5 397B A17B | 55.8
16 | OpenAI | GPT-5.4 Mini | 55.7
17 | Alibaba Qwen | Qwen3.5-27B | 54.6
18 | Alibaba Qwen | Qwen3.5-122B-A10B | 53.0
19 | DeepSeek | DeepSeek V3.2 | 52.9
20 | stepfun | Step 3.5 Flash | 52.0
21 | Alibaba Qwen | Qwen3 Max Thinking | 50.1
22 | Google DeepMind | Gemini 3 Flash Preview | 49.7
23 | xAI | Grok 4.1 Fast | 49.3
24 | OpenAI | GPT-5.4 Nano | 49.3
25 | xiaomi | MiMo-V2-Flash | 48.8
26 | Google DeepMind | Gemini 3 Pro | 45.0
27 | Alibaba Qwen | Qwen3.5-35B-A3B | 44.1
28 | arcee-ai | Trinity Large Thinking | 42.6
29 | Alibaba Qwen | Qwen3 Coder Next | 42.1
30 | Google DeepMind | Gemma 4 31B (free) | 40.9
31 | inception | Mercury 2 | 39.7
32 | OpenAI | gpt-oss-120b (free) | 37.9
33 | Alibaba Qwen | Qwen3.5-9B | 37.4
34 | OpenAI | o3 | 36.1
35 | xAI | Grok Code Fast 1 | 35.6
36 | upstage | Solar Pro 3 | 34.9
37 | Google DeepMind | Gemini 2.5 Pro | 32.7
38 | Alibaba | Qwen3.5 4B | 32.5
39 | Google DeepMind | Gemma 4 26B A4B (free) | 32.1
40 | OpenAI | gpt-oss-20b (free) | 27.6
41 | Google DeepMind | Gemini 3.1 Flash Lite Preview | 25.7
42 | Mistral AI | Mistral Medium 3.1 | 25.3
43 | Alibaba Qwen | Qwen3 Next 80B A3B Instruct (free) | 23.6
44 | Mistral AI | Mistral Small 4 | 23.4
45 | Alibaba | Qwen3.5 2B | 23.0
46 | DeepSeek | R1 0528 | 20.8
47 | prime-intellect | INTELLECT-3 | 19.8
48 | Alibaba Qwen | Qwen3 Coder 480B A35B (free) | 18.3
49 | Alibaba | Qwen3.5 0.8B | 15.9
50 | Alibaba Qwen | Qwen3 Next 80B A3B Instruct | 14.2
51 | Google DeepMind | Gemini 2.5 Flash Lite | 11.7
52 | NVIDIA | NVIDIA Nemotron Nano 9B V2 | 9.4
53 | Meta | Llama 4 Maverick | 7.2
54 | - | Nanbeige4.1 3B | 7.2
55 | liquid | LFM2.5-1.2B-Thinking (free) | 6.5
56 | Meta | Llama 4 Scout | 5.2
57 | Cohere | Command A | 5.1
58 | ibm-granite | Granite 4.0 Micro | 4.2
59 | NVIDIA | Llama 3.1 Nemotron Ultra 253B v1 | 3.8
60 | liquid | LFM2-24B-A2B | 3.7
61 | liquid | LFM2.5-1.2B-Instruct (free) | 3.6
62 | Microsoft | Phi 4 Mini Instruct | 2.7
Details
Category: Knowledge
Creator: Artificial Analysis
Maximum score: 60
Modality: Text
Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
Models: 62
Updated: 2026-04-07
Measures: Multi-step tool use · Planning · Error recovery · Autonomous task completion
Does not measure: Vision · Long context · Knowledge recall · Creative writing
The Gecko's take

The Agentic Index is the single number that matters most for 2026. If you are building agents, this is your shortlist filter. If you are investing, this predicts which provider captures the agent platform market.

Same category · related evaluations