Benchmark · Knowledge · Deterministic

Artificial Analysis · Agentic Index

The Artificial Analysis Agentic Index is a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations. It is the canonical single-number metric for "how good is this model as an agent?"

Updated 2026-04-07

The agentic index shows the fastest frontier progression of any composite metric. Models released 6 months apart show 15-20 point gaps, reflecting rapid improvements in tool use and planning.

Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
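As a rough illustration of what a weighted composite looks like, here is a minimal sketch. The page does not publish the component weights or any per-component scores, so the weights, component keys, and example scores below are all hypothetical placeholders, not Artificial Analysis's actual method.

```python
def weighted_composite(scores, weights):
    """Return the weighted mean of component scores.

    Both arguments are dicts keyed by component name. Weights are
    normalized by their sum, so they do not need to add up to 1.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# HYPOTHETICAL weights: the real weighting is not stated on this page.
HYPOTHETICAL_WEIGHTS = {
    "swe_bench": 0.4,
    "tool_use": 0.3,
    "planning": 0.2,
    "error_recovery": 0.1,
}

# HYPOTHETICAL component scores for an imaginary model (0 to 100 scale).
example_scores = {
    "swe_bench": 70.0,
    "tool_use": 60.0,
    "planning": 50.0,
    "error_recovery": 40.0,
}

composite = weighted_composite(example_scores, HYPOTHETICAL_WEIGHTS)
print(f"composite: {composite:.1f}")
```

With these placeholder numbers the composite works out to about 60. Changing the weights shifts the result toward whichever component is weighted most heavily, which is why two composites built from the same benchmarks can rank models differently.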

Models tested: 62
Top score: 69.4 (GPT-5.4)
Median: 38.8
Min: 2.7
Top-5 spread: σ 2.5 (competitive)
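The summary figures can be re-derived from the 62 scores in the leaderboard on this page. The sketch below transcribes those scores and recomputes the median, the minimum, and the spread of the top five. How the page defines its top-5 σ is not stated; the population standard deviation is an assumption here, chosen because it lands closer to the displayed 2.5 than the sample formula does.

```python
import statistics

# Scores transcribed from the leaderboard on this page, in rank order.
scores = [
    69.4, 67.6, 67.0, 63.1, 63.0, 62.8, 62.2, 62.0, 61.7, 61.5,
    61.1, 59.1, 58.9, 58.6, 55.8, 55.7, 54.6, 53.0, 52.9, 52.0,
    50.1, 49.7, 49.3, 49.3, 48.8, 45.0, 44.1, 42.6, 42.1, 40.9,
    39.7, 37.9, 37.4, 36.1, 35.6, 34.9, 32.7, 32.5, 32.1, 27.6,
    25.7, 25.3, 23.6, 23.4, 23.0, 20.8, 19.8, 18.3, 15.9, 14.2,
    11.7, 9.4, 7.2, 7.2, 6.5, 5.2, 5.1, 4.2, 3.8, 3.7,
    3.6, 2.7,
]

median_score = statistics.median(scores)  # mean of ranks 31 and 32
min_score = min(scores)
top5 = sorted(scores, reverse=True)[:5]

# ASSUMPTION: population standard deviation (divide by n, not n - 1).
top5_sigma = statistics.pstdev(top5)

print(f"models: {len(scores)}")
print(f"median: {median_score:.1f}")        # about 38.8
print(f"min: {min_score:.1f}")              # 2.7
print(f"top-5 sigma: {top5_sigma:.2f}")     # about 2.55
```

The population formula gives roughly 2.55 for the top five, while the sample formula (`statistics.stdev`) gives roughly 2.85, so the displayed 2.5 is more consistent with the former, possibly truncated rather than rounded.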

Best score over time · one chart, every benchmark

[Chart: Artificial Analysis · Agentic Index, 60 models, frontier running max. Y axis: score (0 to 60). X axis: release date (Feb 25 to Apr 26). Source: benchgecko.ai/benchmark/aa-agentic-index]
The frontier on the Artificial Analysis Agentic Index rose from 2.7 to 69.4 in 13 months (+66.7 points). The latest leader is GPT-5.4 from OpenAI.
Pink dots = frontier records · 10 total
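A "frontier running max" chart of this kind can be sketched as follows: walk through models in release order and record each one that beats the best score seen so far. The page does not list per-model release dates, so the dates, model names, and scores in `toy_releases` are hypothetical, and treating ties as non-records is also an assumption.

```python
from datetime import date


def frontier_records(releases):
    """Return the releases that set a new best score, in release order.

    `releases` is an iterable of (release_date, model_name, score) tuples.
    A release counts as a record only if it strictly beats the running max.
    """
    records = []
    best = float("-inf")
    for released, name, score in sorted(releases, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, name, score))
    return records


# HYPOTHETICAL data for illustration only; not taken from the leaderboard.
toy_releases = [
    (date(2025, 3, 1), "model-a", 10.0),
    (date(2025, 6, 1), "model-b", 8.0),   # below the running max, not a record
    (date(2025, 9, 1), "model-c", 25.0),
    (date(2026, 1, 1), "model-d", 25.0),  # ties the max, not a record
    (date(2026, 3, 1), "model-e", 40.0),
]

records = frontier_records(toy_releases)
gain = records[-1][2] - records[0][2]
print([name for _, name, _ in records], gain)

# The page's headline gain uses the same subtraction on its own endpoints.
headline_gain = round(69.4 - 2.7, 1)
print(headline_gain)
```

On the toy data, three of the five releases are frontier records. The headline figure on this page is the same calculation applied to the first and last frontier records, 2.7 and 69.4.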

62 models tested · sorted by score

#   Provider         Model                               Score
1   OpenAI           GPT-5.4                              69.4
2   Anthropic        Claude Opus 4.6 (Fast)               67.6
3   z-ai             GLM 5.1                              67.0
4   z-ai             GLM 5 Turbo                          63.1
5   Anthropic        Claude Sonnet 4.6                    63.0
6   xiaomi           MiMo-V2-Pro                          62.8
7   OpenAI           GPT-5.3-Codex                        62.2
8   -                Muse Spark                           62.0
9   Alibaba Qwen     Qwen3.6 Plus                         61.7
10  minimax          MiniMax M2.7                         61.5
11  z-ai             GLM 5V Turbo                         61.1
12  Google DeepMind  Gemini 3.1 Pro Preview               59.1
13  moonshotai       Kimi K2.5                            58.9
14  xiaomi           MiMo-V2-Omni                         58.6
15  Alibaba Qwen     Qwen3.5 397B A17B                    55.8
16  OpenAI           GPT-5.4 Mini                         55.7
17  Alibaba Qwen     Qwen3.5-27B                          54.6
18  Alibaba Qwen     Qwen3.5-122B-A10B                    53.0
19  DeepSeek         DeepSeek V3.2                        52.9
20  stepfun          Step 3.5 Flash                       52.0
21  Alibaba Qwen     Qwen3 Max Thinking                   50.1
22  Google DeepMind  Gemini 3 Flash Preview               49.7
23  xAI              Grok 4.1 Fast                        49.3
24  OpenAI           GPT-5.4 Nano                         49.3
25  xiaomi           MiMo-V2-Flash                        48.8
26  Google DeepMind  Gemini 3 Pro                         45.0
27  Alibaba Qwen     Qwen3.5-35B-A3B                      44.1
28  arcee-ai         Trinity Large Thinking               42.6
29  Alibaba Qwen     Qwen3 Coder Next                     42.1
30  Google DeepMind  Gemma 4 31B (free)                   40.9
31  inception        Mercury 2                            39.7
32  OpenAI           gpt-oss-120b (free)                  37.9
33  Alibaba Qwen     Qwen3.5-9B                           37.4
34  OpenAI           o3                                   36.1
35  xAI              Grok Code Fast 1                     35.6
36  upstage          Solar Pro 3                          34.9
37  Google DeepMind  Gemini 2.5 Pro                       32.7
38  Alibaba          Qwen3.5 4B                           32.5
39  Google DeepMind  Gemma 4 26B A4B (free)               32.1
40  OpenAI           gpt-oss-20b (free)                   27.6
41  Google DeepMind  Gemini 3.1 Flash Lite Preview        25.7
42  Mistral AI       Mistral Medium 3.1                   25.3
43  Alibaba Qwen     Qwen3 Next 80B A3B Instruct (free)   23.6
44  Mistral AI       Mistral Small 4                      23.4
45  Alibaba          Qwen3.5 2B                           23.0
46  DeepSeek         R1 0528                              20.8
47  prime-intellect  INTELLECT-3                          19.8
48  Alibaba Qwen     Qwen3 Coder 480B A35B (free)         18.3
49  Alibaba          Qwen3.5 0.8B                         15.9
50  Alibaba Qwen     Qwen3 Next 80B A3B Instruct          14.2
51  Google DeepMind  Gemini 2.5 Flash Lite                11.7
52  NVIDIA           NVIDIA Nemotron Nano 9B V2            9.4
53  Meta             Llama 4 Maverick                      7.2
54  -                Nanbeige4.1 3B                        7.2
55  liquid           LFM2.5-1.2B-Thinking (free)           6.5
56  Meta             Llama 4 Scout                         5.2
57  Cohere           Command A                             5.1
58  ibm-granite      Granite 4.0 Micro                     4.2
59  NVIDIA           Llama 3.1 Nemotron Ultra 253B v1      3.8
60  liquid           LFM2-24B-A2B                          3.7
61  liquid           LFM2.5-1.2B-Instruct (free)           3.6
62  Microsoft        Phi 4 Mini Instruct                   2.7

Details
Category: Knowledge
Creator: Artificial Analysis
Max score: 60
Modality: Text
Models: 62
Updated: 2026-04-07
What it tests: Multi-step tool use, planning, error recovery, autonomous task completion
What it does not test: Vision, long context, knowledge recall, creative writing
Gecko's take

The Agentic Index is the single number that matters most for 2026. If you are building agents, this is your shortlist filter. If you are investing, this predicts which provider captures the agent platform market.
