Benchmark · Knowledge · Saturated

Artificial Analysis · Agentic Index

The Artificial Analysis Agentic Index is a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations. It is meant as the canonical single-number answer to "how good is this model as an agent?"

Updated 2026-04-07

The agentic index shows the fastest frontier progression of any composite metric. Models released 6 months apart show 15-20 point gaps, reflecting rapid improvements in tool use and planning.

Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
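As a rough illustration of what a weighted composite means, the sketch below computes a weighted mean over component scores. The component names, scores, and weights are hypothetical placeholders; the actual components and weighting used by Artificial Analysis are not given on this page.

```python
def weighted_composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of component scores; weights are normalized to sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical component scores (0-100 scale) and weights, for illustration only.
components = {"swe_bench": 70.0, "tool_use": 60.0, "planning": 50.0, "error_recovery": 40.0}
weights = {"swe_bench": 0.4, "tool_use": 0.3, "planning": 0.2, "error_recovery": 0.1}

print(round(weighted_composite(components, weights), 1))
```

Normalizing by the total weight keeps the composite on the same scale as its components even if the weights do not sum exactly to 1.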

Models tested: 62
Top score: 69.4 (GPT-5.4)
Median: 38.8 (min. 2.7)
Top-5 spread: σ 2.5 (contested)
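The summary figures can be recomputed from the 62 scores in the leaderboard on this page. The sketch below does so with Python's standard `statistics` module. Using the population standard deviation for the top-5 spread is an assumption, since the page does not say which formula it uses; on the rounded scores it gives about 2.55, close to the displayed σ 2.5.

```python
import statistics

# The 62 leaderboard scores from this page, in rank order.
scores = [
    69.4, 67.6, 67.0, 63.1, 63.0, 62.8, 62.2, 62.0, 61.7, 61.5,
    61.1, 59.1, 58.9, 58.6, 55.8, 55.7, 54.6, 53.0, 52.9, 52.0,
    50.1, 49.7, 49.3, 49.3, 48.8, 45.0, 44.1, 42.6, 42.1, 40.9,
    39.7, 37.9, 37.4, 36.1, 35.6, 34.9, 32.7, 32.5, 32.1, 27.6,
    25.7, 25.3, 23.6, 23.4, 23.0, 20.8, 19.8, 18.3, 15.9, 14.2,
    11.7, 9.4, 7.2, 7.2, 6.5, 5.2, 5.1, 4.2, 3.8, 3.7,
    3.6, 2.7,
]

median_score = statistics.median(scores)  # mean of the 31st and 32nd scores
top5 = sorted(scores, reverse=True)[:5]
top5_spread = statistics.pstdev(top5)     # population std dev (assumed formula)

print(len(scores), max(scores), min(scores))
print(round(median_score, 1), round(top5_spread, 2))
```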

Best score over time · one chart, every benchmark

[Chart: Artificial Analysis · Agentic Index, frontier running max over 60 models. Y-axis: score (0 to 60, higher is better). X-axis: release date (Feb 25 to Apr 26). Source: benchgecko.ai/benchmark/aa-agentic-index]
Frontier on Artificial Analysis · Agentic Index rose from 2.7 to 69.4 in 13 months · +66.7 points · latest leader GPT-5.4 from OpenAI.
Pink dots = frontier records · 10 total
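A "frontier running max" chart of this kind can be derived by walking the releases in date order and keeping only the models that beat every earlier score. The sketch below shows one way to do that. The release dates, model names, and scores are hypothetical placeholders, because per-model release dates are not listed on this page.

```python
from datetime import date


def frontier_records(releases):
    """Return the releases that set a new best score, in release order."""
    best = float("-inf")
    records = []
    for released, model, score in sorted(releases, key=lambda r: r[0]):
        if score > best:  # strictly better than everything released before
            best = score
            records.append((released, model, score))
    return records


# Hypothetical placeholder data, for illustration only.
releases = [
    (date(2025, 2, 1), "model-a", 10.0),
    (date(2025, 6, 1), "model-b", 8.0),   # below the running max, not a record
    (date(2025, 9, 1), "model-c", 25.0),
    (date(2026, 1, 1), "model-d", 25.0),  # ties the running max, not a record
    (date(2026, 4, 1), "model-e", 40.0),
]

records = frontier_records(releases)
gain = records[-1][2] - records[0][2]  # total frontier gain, first to last record
print([model for _, model, _ in records], gain)
```

Applied to the real data, the first and last records would correspond to the 2.7 and 69.4 endpoints quoted in the caption above.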

62 models tested · sorted by score

| # | Provider | Model | Score |
|---|---|---|---|
| 1 | OpenAI | GPT-5.4 | 69.4 |
| 2 | Anthropic | Claude Opus 4.6 (Fast) | 67.6 |
| 3 | z-ai | GLM 5.1 | 67.0 |
| 4 | z-ai | GLM 5 Turbo | 63.1 |
| 5 | Anthropic | Claude Sonnet 4.6 | 63.0 |
| 6 | xiaomi | MiMo-V2-Pro | 62.8 |
| 7 | OpenAI | GPT-5.3-Codex | 62.2 |
| 8 | | Muse Spark | 62.0 |
| 9 | Alibaba Qwen | Qwen3.6 Plus | 61.7 |
| 10 | minimax | MiniMax M2.7 | 61.5 |
| 11 | z-ai | GLM 5V Turbo | 61.1 |
| 12 | Google DeepMind | Gemini 3.1 Pro Preview | 59.1 |
| 13 | moonshotai | Kimi K2.5 | 58.9 |
| 14 | xiaomi | MiMo-V2-Omni | 58.6 |
| 15 | Alibaba Qwen | Qwen3.5 397B A17B | 55.8 |
| 16 | OpenAI | GPT-5.4 Mini | 55.7 |
| 17 | Alibaba Qwen | Qwen3.5-27B | 54.6 |
| 18 | Alibaba Qwen | Qwen3.5-122B-A10B | 53.0 |
| 19 | DeepSeek | DeepSeek V3.2 | 52.9 |
| 20 | stepfun | Step 3.5 Flash | 52.0 |
| 21 | Alibaba Qwen | Qwen3 Max Thinking | 50.1 |
| 22 | Google DeepMind | Gemini 3 Flash Preview | 49.7 |
| 23 | xAI | Grok 4.1 Fast | 49.3 |
| 24 | OpenAI | GPT-5.4 Nano | 49.3 |
| 25 | xiaomi | MiMo-V2-Flash | 48.8 |
| 26 | Google DeepMind | Gemini 3 Pro | 45.0 |
| 27 | Alibaba Qwen | Qwen3.5-35B-A3B | 44.1 |
| 28 | arcee-ai | Trinity Large Thinking | 42.6 |
| 29 | Alibaba Qwen | Qwen3 Coder Next | 42.1 |
| 30 | Google DeepMind | Gemma 4 31B (free) | 40.9 |
| 31 | inception | Mercury 2 | 39.7 |
| 32 | OpenAI | gpt-oss-120b (free) | 37.9 |
| 33 | Alibaba Qwen | Qwen3.5-9B | 37.4 |
| 34 | OpenAI | o3 | 36.1 |
| 35 | xAI | Grok Code Fast 1 | 35.6 |
| 36 | upstage | Solar Pro 3 | 34.9 |
| 37 | Google DeepMind | Gemini 2.5 Pro | 32.7 |
| 38 | Alibaba | Qwen3.5 4B | 32.5 |
| 39 | Google DeepMind | Gemma 4 26B A4B (free) | 32.1 |
| 40 | OpenAI | gpt-oss-20b (free) | 27.6 |
| 41 | Google DeepMind | Gemini 3.1 Flash Lite Preview | 25.7 |
| 42 | Mistral AI | Mistral Medium 3.1 | 25.3 |
| 43 | Alibaba Qwen | Qwen3 Next 80B A3B Instruct (free) | 23.6 |
| 44 | Mistral AI | Mistral Small 4 | 23.4 |
| 45 | Alibaba | Qwen3.5 2B | 23.0 |
| 46 | DeepSeek | R1 0528 | 20.8 |
| 47 | prime-intellect | INTELLECT-3 | 19.8 |
| 48 | Alibaba Qwen | Qwen3 Coder 480B A35B (free) | 18.3 |
| 49 | Alibaba | Qwen3.5 0.8B | 15.9 |
| 50 | Alibaba Qwen | Qwen3 Next 80B A3B Instruct | 14.2 |
| 51 | Google DeepMind | Gemini 2.5 Flash Lite | 11.7 |
| 52 | NVIDIA | NVIDIA Nemotron Nano 9B V2 | 9.4 |
| 53 | Meta | Llama 4 Maverick | 7.2 |
| 54 | | Nanbeige4.1 3B | 7.2 |
| 55 | liquid | LFM2.5-1.2B-Thinking (free) | 6.5 |
| 56 | Meta | Llama 4 Scout | 5.2 |
| 57 | Cohere | Command A | 5.1 |
| 58 | ibm-granite | Granite 4.0 Micro | 4.2 |
| 59 | NVIDIA | Llama 3.1 Nemotron Ultra 253B v1 | 3.8 |
| 60 | liquid | LFM2-24B-A2B | 3.7 |
| 61 | liquid | LFM2.5-1.2B-Instruct (free) | 3.6 |
| 62 | Microsoft | Phi 4 Mini Instruct | 2.7 |

Details

Category: Knowledge
Creator: Artificial Analysis
Max score: 60
Modality: Text
Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
Models: 62
Updated: 2026-04-07
Tests: Multi-step tool use · Planning · Error recovery · Autonomous task completion
Does not test: Vision · Long context · Knowledge recall · Creative writing

Gecko's Take

The Agentic Index is the single number that matters most for 2026. If you are building agents, this is your shortlist filter. If you are investing, this predicts which provider captures the agent platform market.

Same category · related evaluations