Artificial Analysis · Agentic Index
The Artificial Analysis Agentic Index is a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, and serves as the canonical single-number metric for "how good is this model as an agent?"
The agentic index shows the fastest frontier progression of any composite metric. Models released 6 months apart show 15-20 point gaps, reflecting rapid improvements in tool use and planning.
Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
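A weighted composite of this kind can be sketched as a weighted mean of component scores. The component names below follow the description above, but the weights and the per-component scores are hypothetical, chosen only for illustration; Artificial Analysis's actual weighting is not stated on this page.

```python
def weighted_composite(scores, weights):
    """Return the weighted mean of component scores.

    Both arguments are dicts keyed by component name.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical component scores and weights (assumptions, not published values).
component_scores = {
    "swe_bench": 70.0,
    "tool_use": 60.0,
    "planning": 50.0,
    "error_recovery": 40.0,
}
component_weights = {
    "swe_bench": 0.4,
    "tool_use": 0.3,
    "planning": 0.2,
    "error_recovery": 0.1,
}

index = weighted_composite(component_scores, component_weights)
print(round(index, 1))
```

Dividing by the total weight means the weights do not have to sum to exactly 1, which keeps the result stable if a component is added or reweighted.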
The Frontier
Best score over time · one chart, every benchmark
Full rankings
62 models tested · sorted by score
Score distribution
Where models cluster
Related benchmarks
Pearson r · original research
Benchmarks that track with Artificial Analysis · Agentic Index
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
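As a rough sketch of how such a figure can be computed, the snippet below restricts two score tables to the models they share and then applies the standard Pearson formula. The model names and scores are hypothetical placeholders, and `shared_scores` is an illustrative helper, not part of any published tooling.

```python
import math


def shared_scores(bench_a, bench_b):
    """Return paired score lists for the models present in both benchmarks."""
    shared = sorted(set(bench_a) & set(bench_b))
    return [bench_a[m] for m in shared], [bench_b[m] for m in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores keyed by model name (placeholders, not real results).
agentic_index = {"model_a": 10.0, "model_b": 20.0, "model_c": 30.0, "model_d": 40.0}
other_bench = {"model_a": 15.0, "model_b": 25.0, "model_c": 35.0, "model_e": 99.0}

xs, ys = shared_scores(agentic_index, other_bench)
r = pearson_r(xs, ys)
print(len(xs), round(r, 2))
```

Only models scored on both benchmarks enter the calculation, so the number of shared models is worth reporting next to r: a coefficient computed over a handful of models is much less stable than one computed over dozens.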
How it works
Evaluation methodology
The AA Agentic Index is a composite score from Artificial Analysis that aggregates performance across multiple agentic benchmarks: SWE-bench, tool-use evaluations, multi-step planning tasks, and error-recovery scenarios. The index captures how well a model performs when deployed as an autonomous agent rather than as a one-shot responder. Components include code-writing agents, web-browsing agents, and multi-tool orchestration tasks.
Industry relevance
Why teams track this benchmark
As AI deployment shifts from chatbots to agents, the Agentic Index becomes the most commercially relevant composite metric. It predicts which models will power the next generation of autonomous coding assistants, research agents, and workflow automation tools.
Practical takeaways
By role
The Agentic Index is your first filter for model selection in any agent pipeline. Shortlist the top 3, then evaluate on your specific tool chain.
Agentic capability is the highest-margin AI product category. Models leading this index attract the most enterprise agent deployments.
Track the component breakdown over time. The fastest-improving component reveals where architectures are innovating.
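The "shortlist the top 3" step can be sketched as a simple sort over index scores. The scores below are the five leading entries listed on this page; `shortlist` is a hypothetical helper name, and the follow-up evaluation on your own tool chain is left out.

```python
def shortlist(scores, k=3):
    """Return the names of the k highest-scoring models, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Top scores as listed on this page.
top_scores = {
    "GPT-5.4": 69.4,
    "Claude Opus 4.6 (Fast)": 67.6,
    "GLM 5.1": 67.0,
    "GLM 5 Turbo": 63.1,
    "Claude Sonnet 4.6": 63.0,
}

candidates = shortlist(top_scores)
print(candidates)
```

The shortlist is only a first filter: each candidate still needs to be run against the specific tools and workflows of the pipeline it will serve.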
FAQ
About Artificial Analysis · Agentic Index
What does Artificial Analysis · Agentic Index measure?
The Artificial Analysis Agentic Index is a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, and serves as the canonical single-number metric for "how good is this model as an agent?" 62 AI models have been tested on it. Scores range from 2.7 to 69.4, against a listed maximum of 60.
Which model leads on Artificial Analysis · Agentic Index?
GPT-5.4 from OpenAI leads Artificial Analysis · Agentic Index with a score of 69.4. The median score across 62 tested models is 38.8.
Is Artificial Analysis · Agentic Index saturated?
Yes · the top model on Artificial Analysis · Agentic Index has reached 69.4 out of 60, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
Does Artificial Analysis · Agentic Index predict performance on other benchmarks?
Yes. Artificial Analysis · Agentic Index scores have a Pearson correlation of 0.96 with GeoBench scores across 5 shared models. Models that do well on Artificial Analysis · Agentic Index tend to do well on GeoBench.
How often is Artificial Analysis · Agentic Index data refreshed?
BenchGecko pulls updates daily. New model scores on Artificial Analysis · Agentic Index appear as soon as they are published by Epoch AI or the model provider.
- Category: Knowledge
- Creator: Artificial Analysis
- Max score: 60
- Modality: Text
- Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
- Models: 62
- Updated: 2026-04-07
“The Agentic Index is the single number that matters most for 2026. If you are building agents, this is your shortlist filter. If you are investing, this predicts which provider captures the agent platform market.”
Top on Artificial Analysis · Agentic Index
GPT-5.4 · 69.4
Claude Opus 4.6 (Fast) · 67.6
GLM 5.1 · 67.0
GLM 5 Turbo · 63.1
Claude Sonnet 4.6 · 63.0
More knowledge benchmarks
Same category · related evaluations