What does SWE-bench Pro measure?

SWE-bench Pro is a knowledge benchmark in the BenchGecko catalog. Three AI models have been tested on it, with scores ranging from 53.4 to 77.8 out of 100.

Which model leads on SWE-bench Pro?

Claude Mythos Preview from Anthropic leads SWE-bench Pro with a score of 77.8. The median score across the 3 tested models is 64.3.
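
As a minimal sketch of how the leader, range, and median can be derived from the three scores listed in the full rankings on this page: the `summarize` helper and its field names are hypothetical, chosen only for illustration, and do not reflect BenchGecko's actual pipeline.

```python
from statistics import median

# Scores copied from the full rankings table on this page.
scores = {
    "Claude Mythos Preview": 77.8,
    "Claude Opus 4.7": 64.3,
    "Claude Opus 4.6": 53.4,
}


def summarize(scores):
    """Return leader, top, min, median, and count for a model -> score mapping.

    Hypothetical helper for illustration; not BenchGecko's implementation.
    """
    values = list(scores.values())
    return {
        "leader": max(scores, key=scores.get),  # model with the highest score
        "top": max(values),
        "min": min(values),
        "median": median(values),  # middle value once the scores are sorted
        "models_tested": len(values),
    }


summary = summarize(scores)
print(summary)
```

With an odd number of models the median is simply the middle score; with an even number, `statistics.median` averages the two middle scores.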

Is SWE-bench Pro saturated?

No. The top score is 77.8 out of 100 (78%), so there is still meaningful room for improvement on SWE-bench Pro.
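
The percentage and remaining headroom quoted above follow from simple arithmetic, sketched below. The page does not state which cutoff it uses to call a benchmark saturated, so `SATURATION_THRESHOLD` is an assumed value for illustration only, and `headroom` is a hypothetical helper.

```python
def headroom(top_score, max_score=100.0):
    """Return (percent of maximum, points remaining) for a top score."""
    pct = 100.0 * top_score / max_score
    return pct, max_score - top_score


# Assumed cutoff for illustration; the page does not publish its rule.
SATURATION_THRESHOLD = 95.0

pct, remaining = headroom(77.8)
is_saturated = pct >= SATURATION_THRESHOLD
print(f"{pct:.0f}% of max, {remaining:.1f} points of headroom, saturated={is_saturated}")
```

Any reasonable threshold well above 78% leads to the same conclusion for the current top score.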

What makes SWE-bench Pro distinctive?

SWE-bench Pro is a knowledge benchmark with limited overlap with the rest of the catalog; it measures capabilities that are not well covered by the other benchmarks we track.

How often is SWE-bench Pro data refreshed?

BenchGecko pulls updates daily. New model scores on SWE-bench Pro appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Knowledge · Settled

SWE-bench Pro

Name: SWE-bench Pro Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

Updated 2026-04-07

Models tested

3

Top score

77.8

Claude Mythos Preview

Median

64.3

min 53.4

Top-5 spread

σ 0.0

Settled
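
One plausible reading of the top-5 spread is the standard deviation of the five highest scores. The sketch below assumes a population standard deviation; the page does not say whether it uses the population or sample form, and `top_n_spread` is a hypothetical helper, not a BenchGecko function.

```python
from statistics import pstdev


def top_n_spread(scores, n=5):
    """Population standard deviation of the n highest scores.

    Assumption: the page's spread is a population sigma over the top n.
    Returns 0.0 when fewer than two scores are available.
    """
    top = sorted(scores, reverse=True)[:n]
    if len(top) < 2:
        return 0.0
    return pstdev(top)


single = top_n_spread([77.8])               # only one score: no spread
three = top_n_spread([77.8, 64.3, 53.4])    # the three scores in the rankings
print(single, round(three, 2))
```

Under this assumption a single tested score always yields a spread of 0.0, while a wider set of scores yields a larger value.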

Full rankings

3 models tested · sorted by score · includes 3 verified scores

#	Model	Score	Price	Source
1	Claude Mythos Preview · Anthropic	77.8	—	Anthropic Mythos System Card, Apr 2026
2	Claude Opus 4.7 · Anthropic (verified)	64.3	—	Anthropic Opus 4.7 Announcement, Apr 2026
3	Claude Opus 4.6 · Anthropic (verified)	53.4	—	Anthropic Opus 4.6 System Card, Feb 2026