Chatbot Arena Elo · Coding
The Frontier
Best score over time · one chart, every benchmark
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with Chatbot Arena Elo · Coding
Pearson correlation across models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship.
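As a rough illustration of the statistic described above, the sketch below computes Pearson r over only the models that two benchmarks have in common. The helper name `pearson_on_shared` and the sample scores are hypothetical; they are not BenchGecko data and do not represent its actual pipeline.

```python
from math import sqrt


def pearson_on_shared(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson r over the models that appear in both score tables."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores, for illustration only.
arena = {"model-a": 1500.0, "model-b": 1400.0, "model-c": 1300.0, "model-d": 1250.0}
other = {"model-a": 70.0, "model-b": 60.0, "model-c": 50.0, "model-e": 40.0}
r = pearson_on_shared(arena, other)  # only model-a, model-b, model-c are shared
```

Models scored on just one of the two benchmarks (`model-d` and `model-e` here) are dropped before the correlation is computed, which is why the shared-model count matters when reading the result.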
Full rankings
27 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 (Fast) | 1546.2 |
| 2 | | 1542.9 |
| 3 | | 1521.0 |
| 4 | | 1465.2 |
| 5 | | 1455.7 |
| 6 | | 1441.0 |
| 7 | | 1439.2 |
| 8 | | 1437.6 |
| 9 | | 1436.4 |
| 10 | | 1433.4 |
| 11 | | 1427.7 |
| 12 | | 1403.1 |
| 13 | | 1396.3 |
| 14 | | 1386.1 |
| 15 | | 1362.3 |
| 16 | | 1353.7 |
| 17 | | 1344.0 |
| 18 | | 1338.8 |
| 19 | | 1336.5 |
| 20 | | 1326.9 |
| 21 | | 1303.3 |
| 22 | | 1285.5 |
| 23 | | 1246.5 |
| 24 | | 1238.1 |
| 25 | | 1235.4 |
| 26 | | 1202.0 |
| 27 | | 1182.2 |
Frequently asked
Pulled from the Chatbot Arena Elo · Coding dataset · updated daily
What does Chatbot Arena Elo · Coding measure?
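Chatbot Arena Elo · Coding is an Elo-style rating derived from pairwise human preference votes on coding prompts. BenchGecko lists it under knowledge benchmarks. 27 AI models have been tested on it, with scores ranging from 1182.2 to 1546.2 against a listed maximum of 1600.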
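For intuition about what a rating gap means, the conventional Elo formula maps a rating difference to an expected win rate. This is a sketch that assumes the standard base-10, 400-point Elo scale applies here; the leaderboard's actual rating fit may differ, and `expected_win_rate` is a hypothetical helper.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Highest and lowest scores quoted for this benchmark.
top, bottom = 1546.2, 1182.2
p = expected_win_rate(top, bottom)
print(f"{p:.2f}")
```

Under that assumption, the 364-point spread between the highest and lowest score corresponds to roughly an 89% expected win rate for the top model in a head-to-head comparison.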
Which model leads on Chatbot Arena Elo · Coding?
Claude Opus 4.6 (Fast) from Anthropic leads Chatbot Arena Elo · Coding with a score of 1546.2. The median score across 27 tested models is 1386.1.
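The leader and median quoted above can be checked against the full rankings. A minimal sketch, with the 27 scores transcribed from the rankings table:

```python
from statistics import median

# Scores transcribed from the full rankings table, rank 1 to rank 27.
scores = [
    1546.2, 1542.9, 1521.0, 1465.2, 1455.7, 1441.0, 1439.2, 1437.6, 1436.4,
    1433.4, 1427.7, 1403.1, 1396.3, 1386.1, 1362.3, 1353.7, 1344.0, 1338.8,
    1336.5, 1326.9, 1303.3, 1285.5, 1246.5, 1238.1, 1235.4, 1202.0, 1182.2,
]

top = max(scores)
low = min(scores)
mid = median(scores)  # with 27 entries this is the 14th-ranked score
print(len(scores), top, mid, low)
```

With an odd number of models the median is a score that actually appears in the table, here the one at rank 14.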
Is Chatbot Arena Elo · Coding saturated?
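By the catalog's measure, yes · the top model on Chatbot Arena Elo · Coding has reached 1546.2 out of a listed 1600, within 5% of that ceiling. Elo-style ratings are relative and have no fixed maximum, however, so the gap reflects the catalog's scale rather than a hard limit on the benchmark, and the saturation label should be read with that in mind.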
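The "within 5%" figure is simple arithmetic on the two numbers above. A sketch, treating 1600 as the ceiling stated on this page (an assumption carried over from the page itself, since Elo-style ratings have no inherent maximum):

```python
top_score = 1546.2
ceiling = 1600.0  # ceiling as stated on the page, not a property of Elo itself

# Fraction of the scale still unused by the top model.
headroom = (ceiling - top_score) / ceiling
print(f"{headroom:.1%}")
```

The remaining headroom comes out to about 3.4% of the stated scale, which is consistent with the 5% threshold quoted in the answer.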
Does Chatbot Arena Elo · Coding predict performance on other benchmarks?
Yes · Chatbot Arena Elo · Coding scores have a Pearson correlation of 0.95 with SWE-bench Verified across 10 shared models. Models that do well on Chatbot Arena Elo · Coding tend to do well on SWE-bench Verified.
How often is Chatbot Arena Elo · Coding data refreshed?
BenchGecko pulls updates daily. New model scores on Chatbot Arena Elo · Coding appear as soon as they are published by Epoch AI or the model provider.