Which model leads on SWE-Bench Verified (Bash Only)?

Claude Opus 4.5 from Anthropic leads SWE-Bench Verified (Bash Only) with a score of 74.4. The median score across 19 tested models is 58.4.

Is SWE-Bench Verified (Bash Only) saturated?

No · the top score is 74.4 out of 100 (74%). There is still meaningful room for improvement on SWE-Bench Verified (Bash Only).

Does SWE-Bench Verified (Bash Only) predict performance on other benchmarks?

Yes · SWE-Bench Verified (Bash Only) scores correlate 0.99 with SWE-Bench verified across 11 shared models. Models that do well on SWE-Bench Verified (Bash Only) tend to do well on SWE-Bench verified.

How often is SWE-Bench Verified (Bash Only) data refreshed?

BenchGecko pulls updates daily. New model scores on SWE-Bench Verified (Bash Only) appear as soon as they are published by Epoch AI or the model provider.

Benchmark · CodeSettled

SWE-Bench Verified (Bash Only)

Name: SWE-Bench Verified (Bash Only) Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

SWE-Bench Verified (Bash Only) · a curated subset of SWE-bench where models fix real Python repository bugs using only bash commands, no agent frameworks.

Updated 2025-12-10

Models tested

Top score

74.4

Claude Opus 4.5

Median

58.4

min 9.1

Top-5 spread

σ 3.0

Competitive

The Frontier

Best score over time · one chart, every benchmark

Chart type

Frontier on SWE-Bench Verified (Bash Only) rose from 21.6 to 74.4 in 12 months · +52.8 points · latest leader Claude Opus 4.5 from Anthropic.

Pink dots = frontier records · 6 totalClick to open model page

Full rankings

19 models tested · sorted by score

#	Model	Score	Price
1	Claude Opus 4.5· Anthropic	74.4	$5.00
2	GPT-5.2· OpenAI	71.8	$1.75
3	Claude Sonnet 4.5· Anthropic	70.6	$3.00
4	Claude Opus 4· Anthropic	67.6	$15.00
5	GPT-5.1· OpenAI	66.0	$1.25
6	GPT-5· OpenAI	65.0	$1.25
7	Claude Sonnet 4· Anthropic	64.9	$3.00
8	Kimi K2 Thinking· moonshotai	63.4	$0.60
9	GPT-5 Mini· OpenAI	59.8	$0.25
10	o3· OpenAI	58.4	$2.00
11	Claude 3.7 Sonnet· Anthropic	52.8	$3.00
12	o4 Mini· OpenAI	45.0	$1.10
13	GPT-4.1· OpenAI	39.6	$2.00
14	GPT-5 Nano· OpenAI	34.8	$0.05
15	gpt-oss-120b· OpenAI	26.0	$0.04
16	GPT-4.1 Mini· OpenAI	23.9	$0.40
17	GPT-4o (2024-11-20)· OpenAI	21.6	$2.50
18	Llama 4 Maverick· Meta	21.0	$0.15
19	Llama 4 Scout· Meta	9.1	$0.08