SWE-Bench Verified (Bash Only)

SWE-Bench Verified (Bash Only) is a curated subset of SWE-bench in which models fix real bugs in Python repositories using only bash commands, without any agent framework.
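This page does not describe the evaluation harness itself. To illustrate what "bash only" means, here is a minimal sketch that assumes a loop in which the model's sole action is a shell command and the command's output comes back as the next observation. The names `run_bash_only_episode`, `Policy`, and `scripted_policy`, the convention that returning `None` ends the episode, and the step and timeout limits are all hypothetical. They are not the benchmark's actual protocol.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical policy type: given the (command, output) transcript so far,
# return the next bash command to run, or None to end the episode.
Policy = Callable[[List[Tuple[str, str]]], Optional[str]]


def run_bash_only_episode(policy: Policy, cwd: str = ".", max_steps: int = 20,
                          timeout: float = 60.0) -> List[Tuple[str, str]]:
    """Minimal bash-only loop: the only tool available to the model is a shell."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        command = policy(transcript)
        if command is None:  # the (assumed) stop signal
            break
        try:
            result = subprocess.run(
                ["bash", "-c", command],
                cwd=cwd,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            # Return stdout and stderr together as the observation.
            observation = result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            observation = f"command timed out after {timeout} seconds"
        transcript.append((command, observation))
    return transcript


def scripted_policy(commands: List[str]) -> Policy:
    """Stand-in for a model: replays a fixed list of commands, then stops."""
    queue = list(commands)

    def policy(transcript: List[Tuple[str, str]]) -> Optional[str]:
        return queue.pop(0) if queue else None

    return policy


if __name__ == "__main__":
    for cmd, out in run_bash_only_episode(scripted_policy(["echo hello", "ls"])):
        print(cmd, "->", out.strip())
```

In a real run, the scripted policy would be replaced by calls to the model under test. The commands would run inside the checked-out repository, and the resulting patch would then be graded by the benchmark's tests.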

Models tested: 32
Top score: 74.4
Average score: 49.4