
APEX-Agents

APEX-Agents evaluates AI agents on complex, multi-step tasks that require planning, tool use, and autonomous decision-making in realistic environments.

Updated: 2026-03-05
Models tested: 17
Top score: 35.9 (GPT-5.4)
Median: 18.3
Lowest: 3.0
Top-5 spread: σ 1.6 (settled)

Best score over time · one chart, every benchmark

[Chart: APEX-Agents · 16 models · frontier running max. Y-axis: score, 0 to 100. X-axis: release date, Jul 25 to Mar 26. Source: benchgecko.ai/benchmark/apex-agents]
Frontier on APEX-Agents rose from 15.2 to 35.9 in 8 months · +20.7 points · latest leader GPT-5.4 from OpenAI.
Pink dots = frontier records · 5 total
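A "frontier running max" chart can be read as follows: sort the models by release date and keep each one that beats every earlier score. The sketch below shows one way this could be computed. It is not the site's actual method, and the function name `frontier_records`, the model names, the dates, and the intermediate scores are all hypothetical. Only the endpoints 15.2 and 35.9 come from the page.

```python
from datetime import date


def frontier_records(results):
    """Return the entries that set a new best score, in release order.

    `results` is an iterable of (release_date, model_name, score) tuples.
    """
    records = []
    best = float("-inf")
    # Walk models in release order and keep each one that raises the running max.
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records


# Hypothetical data for illustration only: names, dates, and the middle
# scores are invented; 15.2 and 35.9 are the endpoints quoted on the page.
results = [
    (date(2025, 7, 1), "model-a", 15.2),
    (date(2025, 9, 1), "model-b", 12.0),   # below the frontier, not a record
    (date(2025, 11, 1), "model-c", 21.4),
    (date(2026, 3, 1), "model-d", 35.9),
]

records = frontier_records(results)
# Frontier gain = latest record minus first record (rounded to one decimal).
gain = round(records[-1][2] - records[0][2], 1)
print(len(records), gain)
```

With the real leaderboard data, the length of `records` would correspond to the number of pink dots, and `gain` to the reported frontier increase.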
Details
Category: Agent
Max score: 100
Models: 17
Updated: 2026-03-05

Same category · Related benchmarks