Benchmark · Agent · Saturated

APEX-Agents

APEX-Agents evaluates AI agents on complex, multi-step tasks that require planning, tool use, and autonomous decision-making in realistic environments.

Updated: 2026-03-05
Models tested: 17
Highest score: 35.9 (GPT-5.4)
Median: 18.3 (min 3.0)
Top-5 spread: σ 1.6 (Saturated)

Best score over time · one chart, every benchmark

[Chart: APEX-Agents · 16 models · frontier running max. Y-axis: score (0 to 100); x-axis: release date (Jul 25 to Mar 26). Source: benchgecko.ai/benchmark/apex-agents · frontier]
Frontier on APEX-Agents rose from 15.2 to 35.9 in 8 months · +20.7 points · latest leader GPT-5.4 from OpenAI.
Pink dots = frontier records · 5 total
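The "frontier running max" in the chart can be read as the best score seen so far, ordered by release date; a model is a frontier record only when it beats every earlier release. The following is a minimal sketch of that idea, not the site's actual implementation. The function name `frontier_records`, the model names, the dates, and the intermediate scores are all hypothetical placeholders; only the endpoint values 15.2 and 35.9 come from the summary above.

```python
from datetime import date


def frontier_records(results):
    """Return the (release_date, model, score) entries that set a new running max."""
    records = []
    best = float("-inf")
    # Walk the results in release order and keep only strict improvements.
    for release_date, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((release_date, model, score))
    return records


# Hypothetical placeholder data, NOT the leaderboard's real entries.
# Only 15.2 and 35.9 are taken from the page summary.
results = [
    (date(2025, 7, 1), "model-a", 15.2),
    (date(2025, 9, 1), "model-b", 12.0),   # below the running max, not a record
    (date(2025, 11, 1), "model-c", 24.0),
    (date(2026, 1, 1), "model-d", 21.5),   # below the running max, not a record
    (date(2026, 3, 1), "model-e", 35.9),
]

records = frontier_records(results)
# Frontier gain: last record minus first record, rounded to one decimal.
gain = round(records[-1][2] - records[0][2], 1)

for release_date, model, score in records:
    print(release_date.isoformat(), model, score)
print("frontier gain:", gain)
```

With the endpoints from the summary, the gain works out to 35.9 − 15.2 = 20.7 points, which matches the figure quoted above.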

17 models tested · sorted by score

Details
Category: Agent
Maximum possible score: 100
Models: 17
Updated: 2026-03-05

Same category · related evaluations