GSO-Bench

GSO-Bench evaluates AI models on real-world open-source software engineering tasks, testing the ability to understand and resolve actual GitHub issues.

Models tested: 23
Top score: 33.3
Average score: 10.9