APEX-Agents
APEX-Agents β evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments.
Models Tested: 21
Top Score: 35.9
Average Score: 17.2
Rankings
| # | Model | Score |
|---|---|---|
| 1 | | 35.9 |
| 2 | | 34.3 |
| 3 | | 34.3 |
| 4 | | 33.5 |
| 5 | | 31.7 |
| 6 | | 24.0 |
| 7 | | 18.4 |
| 8 | | 18.4 |
| 9 | | 18.3 |
| 10 | | 18.3 |
| 11 | | 17.5 |
| 12 | | 17.5 |
| 13 | | 15.2 |
| 14 | Kimi K2.5 (moonshotai) | 14.4 |
| 15 | | 4.7 |
| 16 | | 4.7 |
| 17 | | 4.7 |
| 18 | | 4.7 |
| 19 | Kimi K2 Thinking (moonshotai) | 4.0 |
| 20 | GLM 4.7 (z-ai) | 3.1 |
| 21 | GLM 4.6 (z-ai) | 3.0 |
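
The summary figures at the top of the page can be cross-checked against the Rankings table. The sketch below assumes that Average Score is the plain arithmetic mean of the 21 listed scores, rounded to one decimal place; the page does not state how it is computed. The function name `summarize` and the dictionary keys are illustrative and not part of any APEX-Agents tooling.

```python
from statistics import mean

# Scores copied from the Rankings table, in rank order (ranks 1 to 21).
scores = [
    35.9, 34.3, 34.3, 33.5, 31.7, 24.0, 18.4,
    18.4, 18.3, 18.3, 17.5, 17.5, 15.2, 14.4,
    4.7, 4.7, 4.7, 4.7, 4.0, 3.1, 3.0,
]


def summarize(values):
    """Return the three headline figures for a list of scores.

    Assumption: the average is an unweighted mean rounded to one decimal.
    """
    return {
        "models_tested": len(values),
        "top_score": max(values),
        "average_score": round(mean(values), 1),
    }


summary = summarize(scores)
print(summary)
```

Under that assumption, the computed values agree with the figures shown on the page (21 models, a top score of 35.9, and an average of 17.2).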