OSWorld

OSWorld tests AI agents on real-world computer tasks across operating systems, including web browsing, file management, and application use.

Models Tested: 8
Top Score: 66.3
Average Score: 42.0