Aider polyglot

Aider polyglot measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.

Updated · 2025-12-01
Models tested · 53
Top score · 88.0 · GPT-5
Median · 52.4 · min 3.6
Top-5 spread · σ 2.2 · competitive

Best score over time · one chart, every benchmark

[Chart: Aider polyglot · 47 models · frontier running max · x-axis: release date, Jul 24 to Dec 25 · y-axis: score, 0 to 100 · source: benchgecko.ai/benchmark/aider-polyglot]
Frontier on Aider polyglot rose from 3.6 to 88.0 in 13 months · +84.4 points · latest leader GPT-5 from OpenAI.
Pink dots = frontier records · 9 total
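
A "frontier running max" chart of this kind can be sketched as follows: sort results by release date and keep only the entries that beat every earlier score. This is a minimal illustration, not BenchGecko's actual method. The function name `frontier_records` and the sample dates, model names, and scores are all hypothetical, and the strict "greater than" rule for a new record is an assumption.

```python
from datetime import date


def frontier_records(results):
    """Return the (release_date, model, score) entries that set a new best score."""
    records = []
    best = float("-inf")
    # Walk the results in release order and keep strict improvements only.
    for release_date, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((release_date, model, score))
    return records


# Hypothetical release dates and scores, for illustration only.
sample = [
    (date(2024, 7, 1), "model-a", 10.0),
    (date(2024, 9, 1), "model-b", 8.0),
    (date(2024, 11, 1), "model-c", 25.0),
    (date(2025, 2, 1), "model-d", 25.0),
    (date(2025, 5, 1), "model-e", 40.0),
]

records = frontier_records(sample)
# Total gain of the frontier: latest record minus first record.
gain = records[-1][2] - records[0][2]

for release_date, model, score in records:
    print(release_date.isoformat(), model, score)
print("frontier gain:", gain)
```

In this sketch a model that only ties the current best (model-d above) does not count as a new record. Whether the page treats ties that way is not stated.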

Where models cluster

Score distribution · models per score bucket (0 to 100) · median 52.4 · source: benchgecko.ai

Score bucket · Models
0–10 · 5
10–20 · 3
20–30 · 5
30–40 · 4
40–50 · 8
50–60 · 9
60–70 · 5
70–80 · 8
80–90 · 6
90–100 · 0

Pearson r · original research

53 models tested · sorted by score

# · Provider · Model · Score
1 · OpenAI · GPT-5 · 88.0
2 · OpenAI · GPT-5 Chat · 88.0
3 · OpenAI · o3 Pro · 84.9
4 · Google DeepMind · Gemini 2.5 Pro · 83.1
5 · Google DeepMind · Gemini 2.5 Pro Preview 06-05 · 83.1
6 · OpenAI · o3 · 81.3
7 · xAI · Grok 4 · 79.6
8 · Google DeepMind · Gemini 2.5 Pro Preview 05-06 · 76.9
9 · DeepSeek · DeepSeek V3.2 · 74.2
10 · DeepSeek · DeepSeek V3.2 Exp · 74.2
11 · Anthropic · Claude Opus 4 · 72.0
12 · OpenAI · o4 Mini · 72.0
13 · OpenAI · o4 Mini High · 72.0
14 · DeepSeek · R1 0528 · 71.4
15 · Anthropic · Claude 3.7 Sonnet · 64.9
16 · OpenAI · o1 · 61.7
17 · Anthropic · Claude Sonnet 4 · 61.3
18 · OpenAI · o3 Mini · 60.4
19 · OpenAI · o3 Mini High · 60.4
20 · Alibaba Qwen · Qwen3 235B A22B · 59.6
21 · Alibaba Qwen · Qwen3 235B A22B Instruct 2507 · 59.6
22 · moonshotai · Kimi K2 0711 · 59.1
23 · DeepSeek · R1 · 56.9
24 · DeepSeek · DeepSeek V3 0324 · 55.1
25 · xAI · Grok 3 · 53.3
26 · xAI · Grok 3 Beta · 53.3
27 · OpenAI · GPT-4.1 · 52.4
28 · Anthropic · Claude 3.5 Sonnet · 51.6
29 · xAI · Grok 3 Mini · 49.3
30 · xAI · Grok 3 Mini Beta · 49.3
31 · DeepSeek · DeepSeek V3 · 48.4
32 · Google DeepMind · Gemini 2.5 Flash · 47.1
33 · OpenAI · GPT-4.5 · 44.9
34 · OpenAI · gpt-oss-120b · 41.8
35 · OpenAI · gpt-oss-120b (free) · 41.8
36 · Alibaba Qwen · Qwen3 32B · 40.0
37 · Google DeepMind · Gemini 2.0 Flash · 38.2
38 · Google DeepMind · Gemini 2.0 Pro · 35.6
39 · OpenAI · o1-mini · 32.9
40 · OpenAI · GPT-4.1 Mini · 32.4
41 · Anthropic · Claude 3.5 Haiku · 28.0
42 · OpenAI · GPT-4o (2024-08-06) · 23.1
43 · OpenAI · GPT-4o (2024-11-20) · 23.1
44 · Alibaba Qwen · Qwen2.5-Max · 21.8
45 · Alibaba Qwen · QwQ 32B · 20.9
46 · Google DeepMind · Gemini 2.0 Flash Thinking (Jan 2025) · 18.2
47 · Alibaba Qwen · Qwen2.5 Coder 32B Instruct · 16.4
48 · Meta · Llama 4 Maverick · 15.6
49 · OpenAI · GPT-4.1 Nano · 8.9
50 · Google DeepMind · Gemma 3 27B · 4.9
51 · Google DeepMind · Gemma 3 27B (free) · 4.9
52 · OpenAI · GPT-4o-mini · 3.6
53 · OpenAI · GPT-4o-mini (2024-07-18) · 3.6
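
The summary figures on this page can be re-derived from the scores listed above. The sketch below recomputes the median, the top-5 spread, and the per-bucket counts. It assumes the "Top-5 spread" σ is the population standard deviation of the five highest scores; the page does not state its formula, so treat that as a guess. The variable names are illustrative.

```python
from statistics import median, pstdev

# Scores copied from the leaderboard above, in rank order (53 entries).
scores = [
    88.0, 88.0, 84.9, 83.1, 83.1, 81.3, 79.6, 76.9, 74.2, 74.2,
    72.0, 72.0, 72.0, 71.4, 64.9, 61.7, 61.3, 60.4, 60.4, 59.6,
    59.6, 59.1, 56.9, 55.1, 53.3, 53.3, 52.4, 51.6, 49.3, 49.3,
    48.4, 47.1, 44.9, 41.8, 41.8, 40.0, 38.2, 35.6, 32.9, 32.4,
    28.0, 23.1, 23.1, 21.8, 20.9, 18.2, 16.4, 15.6, 8.9, 4.9,
    4.9, 3.6, 3.6,
]

# Median of all tested models.
median_score = median(scores)

# Top-5 spread, ASSUMED to be the population standard deviation
# of the five highest scores (the page does not state the formula).
top5_sigma = pstdev(sorted(scores, reverse=True)[:5])

# Histogram with ten buckets of width 10, lower bound inclusive.
# A score of exactly 100 would be placed in the last bucket.
bucket_counts = [0] * 10
for s in scores:
    bucket_counts[min(int(s // 10), 9)] += 1

print(f"median = {median_score}")
print(f"top-5 sigma = {top5_sigma:.1f}")
for i, n in enumerate(bucket_counts):
    print(f"{i * 10}-{i * 10 + 10}: {n}")
```

Under that assumption the script reproduces the page's median of 52.4 and σ of 2.2, which suggests (but does not confirm) that the spread is a population standard deviation.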

Pulled from the Aider polyglot dataset · updated daily

What does Aider polyglot measure?

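Aider polyglot is a code benchmark in the BenchGecko catalog. It measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework. 53 AI models have been tested on it, with scores ranging from 3.6 to 88.0 out of 100.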

Which model leads on Aider polyglot?

GPT-5 from OpenAI leads Aider polyglot with a score of 88.0, tied with GPT-5 Chat. The median score across 53 tested models is 52.4.

Is Aider polyglot saturated?

No · the top score is 88.0 out of 100 (88%). There is still meaningful room for improvement on Aider polyglot.

Does Aider polyglot predict performance on other benchmarks?

Yes · Aider polyglot scores have a Pearson correlation of 0.96 with OpenCompass · MMLU-Pro across 8 shared models. Models that do well on Aider polyglot tend to do well on OpenCompass · MMLU-Pro.
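
For readers who want to check a correlation like this themselves, here is a minimal sketch of the Pearson coefficient over models shared by two benchmarks. It assumes the 0.96 figure is a Pearson r, as the "Pearson r" heading on this page suggests. The paired scores below are hypothetical placeholders, not the 8 shared models' actual results, and `pearson_r` is an illustrative helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired scores for 8 shared models, for illustration only.
benchmark_a = [88.0, 83.1, 72.0, 61.3, 55.1, 48.4, 23.1, 3.6]
benchmark_b = [86.0, 84.5, 80.1, 76.0, 75.9, 70.2, 62.0, 50.3]

r = pearson_r(benchmark_a, benchmark_b)
print(f"r = {r:.2f} across {len(benchmark_a)} shared models")
```

With only 8 paired points, a single outlier can move r noticeably, so the sample size is worth reporting next to the coefficient.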

How often is Aider polyglot data refreshed?

BenchGecko pulls updates daily. New model scores on Aider polyglot appear as soon as they are published by Epoch AI or the model provider.
