Aider · Code Editing

Updated: 2025-04-15
Models tested: 27
Top score: 84.2 (Claude 3.5 Sonnet)
Median: 60.2 (min 14.3)
Top-5 spread: σ 5.4 (wide open)

Best score over time · one chart, every benchmark

[Chart: Aider · Code Editing, frontier running max, 16 models. Y axis: score (0 to 100). X axis: release date (May 2024 to April 2025). Source: benchgecko.ai/benchmark/aider-edit]
The frontier on Aider · Code Editing rose from 72.9 to 84.2 in 7 months, a gain of 11.3 points. The latest leader is o1 from OpenAI.
Frontier records on the chart: 2 total.
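The "frontier running max" shown in the chart can be sketched as follows. This is a minimal illustration, not BenchGecko's actual pipeline: the function name, the model names, and the release dates are hypothetical placeholders, and only the two scores 72.9 and 84.2 come from the page.

```python
from datetime import date

def frontier_running_max(results):
    """Return the entries that set a new best score, in release order.

    `results` is an iterable of (release_date, model_name, score) tuples.
    """
    records = []
    best = float("-inf")
    for release_date, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:  # a strictly higher score is a new frontier record
            best = score
            records.append((release_date, model, score))
    return records

# Hypothetical data: model names and dates are placeholders, not real entries.
sample = [
    (date(2024, 5, 1), "model-a", 72.9),
    (date(2024, 8, 1), "model-b", 70.0),   # below the frontier, not a record
    (date(2024, 12, 1), "model-c", 84.2),
]

records = frontier_running_max(sample)
gain = round(records[-1][2] - records[0][2], 1)
print(len(records), gain)
```

With this sample, two entries qualify as frontier records and the gain between the first and last record is 11.3 points.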

Where models cluster

Score distribution (models per score bucket, 0 to 100; median 60.2; source: benchgecko.ai):
0–10: 0
10–20: 1
20–30: 0
30–40: 3
40–50: 1
50–60: 8
60–70: 6
70–80: 6
80–90: 2
90–100: 0

Pearson r · original research

Correlation analysis

Benchmarks that track with Aider · Code Editing

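Pearson correlation computed across the models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship, meaning a score on one benchmark is more predictive of the score on the other.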
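As a rough illustration of that calculation, the sketch below computes Pearson r over only the models that appear in both score sets. The helper names and the sample scores are hypothetical, and the page does not say how BenchGecko handles ties, missing values, or small samples, so treat this as one plausible reading rather than the site's method.

```python
import math

def shared_scores(scores_a, scores_b):
    """Align two {model: score} dicts on the models present in both."""
    shared = sorted(set(scores_a) & set(scores_b))
    return [scores_a[m] for m in shared], [scores_b[m] for m in shared]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores: model names and values are placeholders.
bench_a = {"m1": 10.0, "m2": 20.0, "m3": 30.0, "m4": 99.0}
bench_b = {"m1": 1.0, "m2": 2.0, "m3": 3.0}

xs, ys = shared_scores(bench_a, bench_b)
r = pearson_r(xs, ys)
print(len(xs), round(r, 3))
```

Only the three shared models enter the calculation here; "m4" is dropped because it has no score on the second benchmark. With few shared models, a high r is weaker evidence than the same value over a large overlap.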

27 models tested · sorted by score

Pulled from the Aider · Code Editing dataset · updated daily

What does Aider · Code Editing measure?

Aider · Code Editing is a knowledge benchmark in the BenchGecko catalog. 27 AI models have been tested on it. Scores range from 14.3 to 84.2 out of 100.

Which model leads on Aider · Code Editing?

Claude 3.5 Sonnet from Anthropic leads Aider · Code Editing with a score of 84.2. The median score across 27 tested models is 60.2.
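The summary figures quoted on this page (median, minimum, top-5 spread) could be derived along these lines. The scores below are hypothetical placeholders, and reading "top-5 spread σ" as the population standard deviation of the five highest scores is an assumption; the page does not define it.

```python
import statistics

def summarize(scores, top_n=5):
    """Return basic summary statistics for a list of benchmark scores."""
    ranked = sorted(scores, reverse=True)
    top = ranked[:top_n]
    return {
        "top": ranked[0],
        "min": ranked[-1],
        "median": statistics.median(ranked),
        # Assumption: spread = population standard deviation of the top scores.
        "top_spread": statistics.pstdev(top),
    }

# Hypothetical scores, not the real leaderboard.
summary = summarize([10, 20, 30, 40, 50, 60, 70])
print(summary)
```

If the site uses the sample standard deviation instead, `statistics.stdev` would be the drop-in replacement and the spread would come out slightly larger.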

Is Aider · Code Editing saturated?

No · the top score is 84.2 out of 100 (84%). There is still meaningful room for improvement on Aider · Code Editing.

Does Aider · Code Editing predict performance on other benchmarks?

Yes · Aider · Code Editing scores have a Pearson correlation of 0.94 with The Agent Company across 6 shared models. Models that do well on Aider · Code Editing tend to do well on The Agent Company.

How often is Aider · Code Editing data refreshed?

BenchGecko pulls updates daily. New model scores on Aider · Code Editing appear as soon as they are published by Epoch AI or the model provider.
