APEX-Agents
APEX-Agents evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments.
The Frontier
Best score over time · one chart, every benchmark
Full rankings
17 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | | 35.9 |
| 2 | | 34.3 |
| 3 | | 33.5 |
| 4 | | 31.7 |
| 5 | | 31.7 |
| 6 | | 24.0 |
| 7 | | 18.4 |
| 8 | | 18.4 |
| 9 | | 18.3 |
| 10 | | 17.5 |
| 11 | | 15.2 |
| 12 | | 14.4 |
| 13 | | 6.2 |
| 14 | | 4.7 |
| 15 | | 4.0 |
| 16 | | 3.1 |
| 17 | | 3.0 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with APEX-Agents
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
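As a rough illustration of the calculation described above, the sketch below computes Pearson r over only the models that appear in both score sets. The function names `pearson_r` and `shared_model_r`, the model labels, and all numbers in the example are hypothetical placeholders, not real benchmark data and not BenchGecko's actual implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


def shared_model_r(scores_a, scores_b):
    """Correlate two {model: score} dicts over the models present in both."""
    shared = sorted(set(scores_a) & set(scores_b))
    r = pearson_r([scores_a[m] for m in shared],
                  [scores_b[m] for m in shared])
    return r, len(shared)


# Hypothetical placeholder scores, for illustration only.
bench_a = {"model_a": 10.0, "model_b": 20.0, "model_c": 30.0, "model_d": 5.0}
bench_b = {"model_a": 21.0, "model_b": 41.0, "model_c": 61.0, "model_e": 50.0}

r, n_shared = shared_model_r(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Restricting the calculation to shared models matters because a model scored on only one benchmark contributes no pair to correlate; with few shared models, the resulting r is a small-sample estimate.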
Frequently asked questions
About APEX-Agents
What does APEX-Agents measure?
APEX-Agents evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments. 17 AI models have been tested on it. Scores range from 3.0 to 35.9 out of 100.
Which model leads on APEX-Agents?
GPT-5.4 from OpenAI leads APEX-Agents with a score of 35.9. The median score across 17 tested models is 18.3.
Is APEX-Agents saturated?
No. The top score is 35.9 out of 100 (36%), so there is still meaningful room for improvement on APEX-Agents.
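The summary figures quoted in these answers (model count, score range, median, and the top score as a share of the maximum) can be checked against the ranking table with a short sketch like the one below. The variable names are illustrative; the scores are copied from the table, and the maximum of 100 is the one listed for the benchmark.

```python
from statistics import median

# Scores copied from the ranking table (17 models, highest first).
scores = [35.9, 34.3, 33.5, 31.7, 31.7, 24.0, 18.4, 18.4, 18.3,
          17.5, 15.2, 14.4, 6.2, 4.7, 4.0, 3.1, 3.0]

MAX_SCORE = 100  # maximum possible score listed for APEX-Agents

summary = {
    "models": len(scores),
    "low": min(scores),
    "high": max(scores),
    "median": median(scores),  # middle value of the 17 sorted scores
    "top_pct_of_max": round(100 * max(scores) / MAX_SCORE),
}
print(summary)
```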
Does APEX-Agents predict performance on other benchmarks?
Yes. APEX-Agents scores have a Pearson correlation of 0.97 with Artificial Analysis · Coding Index across 6 shared models, so models that do well on APEX-Agents tend to do well on Artificial Analysis · Coding Index. With only 6 shared models, this estimate rests on a small sample.
How often is APEX-Agents data refreshed?
BenchGecko pulls updates daily. New model scores on APEX-Agents appear as soon as they are published by Epoch AI or the model provider.
- Category: Agent
- Maximum score: 100
- Models: 17
- Updated: 2026-03-05
Other agent benchmarks
Same category · related evaluations