Benchmark · Agent · Settled

APEX-Agents

APEX-Agents evaluates AI agents on complex, multi-step tasks that require planning, tool use, and autonomous decision-making in realistic environments.

Updated: 2026-03-05
Models tested: 17
Top score: 35.9 (GPT-5.4)
Median: 18.3 (min 3.0)
Top-5 spread: σ 1.6
Status: Settled
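The following is a minimal sketch of how summary figures like these could be derived from a table of per-model scores. The page does not define "Top-5 spread". This sketch assumes it is the population standard deviation of the five highest scores. The function name `summarize` and its dictionary keys are hypothetical.

```python
from statistics import median, pstdev


def summarize(scores: dict[str, float]) -> dict[str, object]:
    """Summarize a leaderboard given as {model_name: score}.

    Assumption: "Top-5 spread" is taken to be the population standard
    deviation (pstdev) of the five highest scores; the page does not say.
    """
    if not scores:
        raise ValueError("need at least one score")
    # Rank models from best to worst score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    values = [score for _, score in ranked]
    top5 = values[:5]  # with fewer than five models, all of them are used
    return {
        "models": len(values),
        "top_model": ranked[0][0],
        "top_score": values[0],
        "median": median(values),
        "min": values[-1],
        "top5_sigma": pstdev(top5),
    }
```

If the site instead uses the sample standard deviation, replace `pstdev` with `statistics.stdev`, which divides by n - 1 rather than n.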

Best score over time

[Chart: APEX-Agents frontier running max across 16 models; score (0 to 100, higher is better) vs. release date, Jul 2025 to Mar 2026. Source: benchgecko.ai/benchmark/apex-agents]
The frontier score on APEX-Agents rose from 15.2 to 35.9 in 8 months (+20.7 points). The current leader is GPT-5.4 from OpenAI.
Pink dots mark frontier records (5 in total).
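The frontier line is a running maximum. Models are ordered by release date, and a model counts as a frontier record only if it beats every earlier score. The sketch below is minimal and assumes that a tie with the current best is not a new record and that same-day releases keep their input order. `Result` and `frontier_records` are hypothetical names, and the sample data is illustrative only. It is not APEX-Agents data.

```python
from datetime import date
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    released: date
    score: float


def frontier_records(results: list[Result]) -> list[Result]:
    """Return results that set a new best score, in release-date order.

    Assumptions: a tie with the current best is not a new record, and
    models released on the same date keep their input order (sorted is stable).
    """
    records: list[Result] = []
    best = float("-inf")
    for r in sorted(results, key=lambda r: r.released):
        if r.score > best:  # strictly better than every earlier release
            best = r.score
            records.append(r)
    return records


# Illustrative toy data, not APEX-Agents results.
sample = [
    Result("model-a", date(2025, 7, 1), 10.0),
    Result("model-b", date(2025, 8, 1), 8.0),
    Result("model-c", date(2025, 9, 1), 12.0),
    Result("model-d", date(2025, 10, 1), 12.0),
    Result("model-e", date(2025, 11, 1), 20.0),
]
records = frontier_records(sample)
gain = records[-1].score - records[0].score
print([r.model for r in records], gain)  # ['model-a', 'model-c', 'model-e'] 10.0
```

For the real leaderboard, the first and last records would be the 15.2 and 35.9 scores. Their difference gives the +20.7-point rise, and the length of the returned list gives the count of frontier records.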
Details
Category: Agent
Max score: 100
Models: 17
Updated: 2026-03-05
