GSO-Bench
GSO-Bench β evaluates AI models on real-world open-source software engineering tasks, testing the ability to understand and resolve actual GitHub issues.
Models Tested: 23
Top Score: 33.3
Average Score: 10.9
Rankings
| # | Model | Score |
|---|---|---|
| 1 |  | 33.3 |
| 2 |  | 27.4 |
| 3 |  | 27.4 |
| 4 |  | 26.5 |
| 5 |  | 18.6 |
| 6 |  | 14.7 |
| 7 |  | 13.7 |
| 8 |  | 13.7 |
| 9 |  | 9.8 |
| 10 |  | 8.8 |
| 11 |  | 6.9 |
| 12 |  | 6.9 |
| 13 |  | 6.9 |
| 14 |  | 4.9 |
| 15 |  | 4.9 |
| 16 | Kimi K2 0711 (moonshotai) | 4.9 |
| 17 |  | 4.9 |
| 18 |  | 3.9 |
| 19 |  | 3.8 |
| 20 |  | 3.8 |
| 21 |  | 3.6 |
| 22 |  | 1.3 |
| 23 |  | 0.1 |
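
The summary figures at the top of the page can be cross-checked against the Score column of the Rankings table. The sketch below is only an illustration: it assumes the scores are copied by hand from the table and that the average is a plain arithmetic mean rounded to one decimal place, which the page does not state. The `summarize` helper is a hypothetical name, not part of any GSO-Bench tooling.

```python
from statistics import mean

# Scores copied by hand from the Rankings table, in rank order.
scores = [
    33.3, 27.4, 27.4, 26.5, 18.6, 14.7, 13.7, 13.7, 9.8, 8.8,
    6.9, 6.9, 6.9, 4.9, 4.9, 4.9, 4.9, 3.9, 3.8, 3.8, 3.6, 1.3, 0.1,
]


def summarize(values):
    """Return the three headline figures for a list of scores.

    Assumption: the average is an unweighted mean rounded to one decimal.
    """
    return {
        "models_tested": len(values),
        "top_score": max(values),
        "average_score": round(mean(values), 1),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the computed values agree with the page's reported figures of 23 models, a top score of 33.3, and an average of 10.9.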