OSWorld
OSWorld (beta) tests AI agents on real-world computer tasks across operating systems, including web browsing, file management, and application use.
8
Models Tested
66.3
Top Score
42.0
Average Score
Rankings
| # | Model | Score |
|---|---|---|
| 1 | | 66.3 |
| 2 | Kimi K2.5 (moonshotai) | 63.3 |
| 3 | | 62.9 |
| 4 | | 43.9 |
| 5 | | 35.8 |
| 6 | | 35.8 |
| 7 | | 23.0 |
| 8 | | 5.0 |
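
The summary figures at the top of the page can be checked against the scores in the rankings. The sketch below is only an illustration: it assumes the average is a plain arithmetic mean rounded to one decimal place, which the page does not state, and the function name `summarize` is hypothetical.

```python
# Scores copied from the rankings, in rank order.
scores = [66.3, 63.3, 62.9, 43.9, 35.8, 35.8, 23.0, 5.0]


def summarize(values):
    """Return the count, maximum, and mean (rounded to one decimal place).

    Assumption: the page's "Average Score" is an unweighted mean.
    """
    return {
        "models_tested": len(values),
        "top_score": max(values),
        "average_score": round(sum(values) / len(values), 1),
    }


summary = summarize(scores)
print(summary)
```

Under that assumption, the eight scores sum to 336.0, so the mean is 42.0, which matches the reported average.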