Terminal Bench
Terminal Bench β measures how well models accomplish real-world tasks using terminal commands, evaluating their proficiency with shell scripting and command-line (CLI) tools.
Models tested: 27
Top score: 78.4
Average score: 40.3
Rankings
| # | Model | Score |
|---|---|---|
| 1 | | 78.4 |
| 2 | | 69.9 |
| 3 | | 64.9 |
| 4 | | 64.9 |
| 5 | | 64.3 |
| 6 | | 63.1 |
| 7 | | 61.8 |
| 8 | | 49.6 |
| 9 | | 49.6 |
| 10 | | 47.6 |
| 11 | | 47.6 |
| 12 | | 42.8 |
| 13 | | 38.0 |
| 14 | Kimi K2 Thinking (moonshotai) | 35.7 |
| 15 | | 32.6 |
| 16 | | 31.9 |
| 17 | | 29.8 |
| 18 | Kimi K2 0711 (moonshotai) | 27.8 |
| 19 | | 27.2 |
| 20 | | 25.4 |
| 21 | | 25.4 |
| 22 | GLM 4.6 (z-ai) | 24.5 |
| 23 | | 18.7 |
| 24 | | 18.7 |
| 25 | | 18.7 |
| 26 | | 18.7 |
| 27 | | 11.5 |
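The summary figures can be cross-checked against the Rankings table. The sketch below assumes that "Average Score" is a simple unweighted mean over all 27 listed scores (the page does not state how it is computed); `SCORES` and `summarize` are hypothetical names used only for illustration. Under that assumption the recomputed values match the reported 27 models, 78.4 top score, and 40.3 average.

```python
# Scores copied from the Rankings table, in rank order.
# Model names are omitted because most were lost from the page.
SCORES = [
    78.4, 69.9, 64.9, 64.9, 64.3, 63.1, 61.8, 49.6, 49.6, 47.6,
    47.6, 42.8, 38.0, 35.7, 32.6, 31.9, 29.8, 27.8, 27.2, 25.4,
    25.4, 24.5, 18.7, 18.7, 18.7, 18.7, 11.5,
]


def summarize(scores):
    """Return (models tested, top score, mean score rounded to one decimal).

    Assumes the leaderboard average is an unweighted arithmetic mean.
    """
    return len(scores), max(scores), round(sum(scores) / len(scores), 1)


if __name__ == "__main__":
    n, top, avg = summarize(SCORES)
    print(f"Models tested: {n}, top score: {top}, average score: {avg}")
```

Rounding to one decimal mirrors the precision shown on the page; the unrounded mean is about 40.34.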