APEX-Agents

APEX-Agents evaluates AI agents on complex, multi-step tasks that require planning, tool use, and autonomous decision-making in realistic environments.

Models tested: 21
Top score: 35.9
Average score: 17.2