SWE Atlas · Codebase QnA
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Full rankings
3 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 (Fast) | 33.3 |
| 2 | GPT-5.3-Codex | 32.6 |
| 3 | Claude Sonnet 4.6 | 31.2 |
Frequently asked
Pulled from the SWE Atlas · Codebase QnA dataset · updated daily
What does SWE Atlas · Codebase QnA measure?
SWE Atlas · Codebase QnA is a knowledge benchmark in the BenchGecko catalog. Three AI models have been tested on it, with scores ranging from 31.2 to 33.3 out of 100.
Which model leads on SWE Atlas · Codebase QnA?
Claude Opus 4.6 (Fast) from Anthropic leads SWE Atlas · Codebase QnA with a score of 33.3. The median score across 3 tested models is 32.6.
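For reference, the leader and median quoted above follow directly from the three published scores. A minimal sketch (the dictionary below just restates the ranking table):

```python
import statistics

# Published SWE Atlas · Codebase QnA scores (out of 100), from the rankings above
scores = {
    "Claude Opus 4.6 (Fast)": 33.3,
    "GPT-5.3-Codex": 32.6,
    "Claude Sonnet 4.6": 31.2,
}

top = max(scores.values())                    # leading score: 33.3
median = statistics.median(scores.values())   # middle of three values: 32.6
print(top, median)
```

With an odd number of tested models, the median is simply the middle score, which is why it coincides with the second-ranked model here.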
Is SWE Atlas · Codebase QnA saturated?
No · the top score is 33.3 out of 100 (33%). There is still meaningful room for improvement on SWE Atlas · Codebase QnA.
What makes SWE Atlas · Codebase QnA distinctive?
SWE Atlas · Codebase QnA is a knowledge benchmark with limited overlap to the rest of the catalog · it measures capabilities that are not well-covered by other benchmarks we track.
How often is SWE Atlas · Codebase QnA data refreshed?
BenchGecko pulls updates daily. New model scores on SWE Atlas · Codebase QnA appear as soon as they are published by Epoch AI or the model provider.
Top on SWE Atlas · Codebase QnA
Claude Opus 4.6 (Fast) · 33.3
GPT-5.3-Codex · 32.6
Claude Sonnet 4.6 · 31.2