Which model leads on GSO-Bench?

Claude Opus 4.6 from Anthropic leads GSO-Bench with a score of 33.3. The median score across 18 tested models is 6.9.

Is GSO-Bench saturated?

No · the top score is 33.3 out of 100, so there is still substantial room for improvement on GSO-Bench.

Does GSO-Bench predict performance on other benchmarks?

Yes, for Cybench · GSO-Bench scores have a correlation of 0.98 with Cybench scores across the 9 models tested on both. Models that do well on GSO-Bench tend to do well on Cybench, though 9 models is a small sample.
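
The page does not say how the 0.98 figure is computed. As a rough sketch, assuming it is a Pearson correlation over the models that appear on both leaderboards, the calculation could look like the following. The function names and the sample scores are hypothetical placeholders, not real GSO-Bench or Cybench data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def shared_correlation(bench_a, bench_b):
    """Correlate two {model: score} dicts over the models present in both."""
    shared = sorted(set(bench_a) & set(bench_b))
    r = pearson([bench_a[m] for m in shared], [bench_b[m] for m in shared])
    return r, len(shared)


# Hypothetical placeholder scores, for illustration only.
bench_a = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 5.0}
bench_b = {"model-a": 12.0, "model-b": 22.0, "model-c": 32.0, "model-e": 7.0}

r, n_shared = shared_correlation(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Only models scored on both benchmarks enter the calculation, which is why the figure above is reported for 9 shared models rather than all 18.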

How often is GSO-Bench data refreshed?

BenchGecko pulls updates daily. New model scores on GSO-Bench appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Code · Competitive

GSO-Bench

Name: GSO-Bench Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

GSO-Bench evaluates AI models on real-world open-source software engineering tasks, testing their ability to understand and resolve actual GitHub issues.

Updated 2026-02-04

Models tested: 18
Top score: 33.3 (Claude Opus 4.6)
Median: 6.9 (min 0.1)
Top-5 spread: σ 6.6 (wide open)

The Frontier

Best score over time · one chart, every benchmark

Frontier on GSO-Bench rose from 0.1 to 33.3 in 15 months · +33.2 points · latest leader Claude Opus 4.6 from Anthropic.

Frontier records: 8 total.
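
A frontier record here presumably means a result that beat every earlier score on the benchmark. The sketch below shows one way to extract such records as a running maximum over dated results. The `frontier_records` helper, the dates, and the model names are hypothetical, since this page lists no per-model dates.

```python
def frontier_records(results):
    """Return the results that set a new best score, in date order.

    `results` is an iterable of (iso_date, model, score) tuples.
    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    records = []
    best = float("-inf")
    for date, model, score in sorted(results):
        if score > best:  # a tie does not count as a new record
            best = score
            records.append((date, model, score))
    return records


# Hypothetical placeholder results, for illustration only.
results = [
    ("2025-01-01", "model-a", 1.0),
    ("2025-03-01", "model-b", 0.5),
    ("2025-06-01", "model-c", 4.0),
    ("2025-09-01", "model-d", 4.0),
]

records = frontier_records(results)
gain = records[-1][2] - records[0][2]
print(f"{len(records)} frontier records, +{gain:.1f} points")
```

Applied to the real dated results, the first and last records would give the 0.1 and 33.3 endpoints quoted above.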

Full rankings

18 models tested · sorted by score

#	Model	Provider	Score	Price
1	Claude Opus 4.6	Anthropic	33.3	$5.00
2	GPT-5.2	OpenAI	27.4	$1.75
3	Claude Opus 4.5	Anthropic	26.5	$5.00
4	Gemini 3 Pro	Google DeepMind	18.6	n/a
5	Claude Sonnet 4.5	Anthropic	14.7	$3.00
6	GPT-5.1	OpenAI	13.7	$1.25
7	Gemini 3 Flash Preview	Google DeepMind	9.8	$0.50
8	o3	OpenAI	8.8	$2.00
9	Claude Opus 4	Anthropic	6.9	$15.00
10	GPT-5	OpenAI	6.9	$1.25
11	Claude Sonnet 4	Anthropic	4.9	$3.00
12	Kimi K2 0711	Moonshot AI	4.9	$0.57
13	Claude 3.5 Sonnet	Anthropic	4.6	n/a
14	Gemini 2.5 Pro	Google DeepMind	3.9	$1.25
15	Claude 3.7 Sonnet	Anthropic	3.8	$3.00
16	o4 Mini	OpenAI	3.6	$1.10
17	o3 Mini	OpenAI	1.3	$1.10
18	GPT-4o (2024-11-20)	OpenAI	0.1	$2.50
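
The headline statistics on this page can be reproduced from the rankings table. The sketch below recomputes the model count, top score, median, minimum, and top-5 spread. It assumes the spread is a population standard deviation of the five highest scores, which matches the reported 6.6 but is not stated on the page. The `summarize` helper is illustrative.

```python
from statistics import median, pstdev

# Scores copied from the rankings table, in rank order.
scores = [33.3, 27.4, 26.5, 18.6, 14.7, 13.7, 9.8, 8.8, 6.9,
          6.9, 4.9, 4.9, 4.6, 3.9, 3.8, 3.6, 1.3, 0.1]


def summarize(scores, top_n=5):
    """Summary statistics for a list of benchmark scores."""
    ranked = sorted(scores, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "median": round(median(ranked), 1),
        "min": ranked[-1],
        # Assumption: spread = population standard deviation of the top_n scores.
        "top_spread": round(pstdev(ranked[:top_n]), 1),
    }


summary = summarize(scores)
print(summary)
```

With an even number of models, the median is the mean of the 9th and 10th scores, which are both 6.9 here.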