LiveBench · Coding
The Frontier
Best score over time · one chart, every benchmark
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with LiveBench · Coding
Pearson correlation computed across the models scored on both benchmarks. Values closer to 1 indicate a stronger positive relationship.
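As a rough illustration of how such a figure could be computed, the sketch below restricts two score tables to the models they share and then applies the standard Pearson formula. The function names (`pearson_r`, `correlate_shared`) and the sample model names and scores are hypothetical and chosen only for the example; this is not BenchGecko's actual pipeline.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlate_shared(scores_a, scores_b):
    """Correlate two {model: score} tables over the models present in both.

    Returns (r, number_of_shared_models).
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    r = pearson_r([scores_a[m] for m in shared],
                  [scores_b[m] for m in shared])
    return r, len(shared)


# Made-up scores for illustration only; not taken from any leaderboard.
bench_a = {"model-a": 80.0, "model-b": 70.0, "model-c": 60.0, "model-d": 55.0}
bench_b = {"model-a": 1400.0, "model-b": 1350.0, "model-c": 1290.0, "model-e": 1200.0}

r, n_shared = correlate_shared(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Only models present in both tables contribute, which is why the number of shared models is worth reporting alongside the coefficient.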
Full rankings
29 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GPT-5.2-Codex | 83.6 |
| 2 | GPT-5.1-Codex-Max | 81.4 |
| 3 | Qwen3.6 Plus | 78.2 |
| 4 | GPT-5 Mini | 76.1 |
| 5 | DeepSeek V3.2 | 75.7 |
| 6 | | 75.4 |
| 7 | | 74.7 |
| 8 | | 73.9 |
| 9 | | 73.6 |
| 10 | | 73.2 |
| 11 | | 73.1 |
| 12 | | 71.8 |
| 13 | | 71.0 |
| 14 | | 70.7 |
| 15 | | 69.9 |
| 16 | | 69.6 |
| 17 | | 69.0 |
| 18 | | 68.8 |
| 19 | | 68.2 |
| 20 | | 67.4 |
| 21 | | 67.4 |
| 22 | | 66.8 |
| 23 | | 64.2 |
| 24 | | 61.9 |
| 25 | | 60.7 |
| 26 | | 60.3 |
| 27 | | 60.2 |
| 28 | | 54.9 |
| 29 | | 54.1 |
Frequently asked
Pulled from the LiveBench · Coding dataset · updated daily
What does LiveBench · Coding measure?
LiveBench · Coding is a knowledge benchmark in the BenchGecko catalog. 29 AI models have been tested on it. Scores range from 54.1 to 83.6 out of 100.
Which model leads on LiveBench · Coding?
GPT-5.2-Codex from OpenAI leads LiveBench · Coding with a score of 83.6. The median score across 29 tested models is 69.9.
Is LiveBench · Coding saturated?
No · the top score is 83.6 out of 100 (84%). There is still meaningful room for improvement on LiveBench · Coding.
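The summary figures quoted in these answers (model count, score range, median, and the top score as a share of the maximum) can be re-derived from the score column of the rankings table. The sketch below is one way to do that; the `summarize` helper and the `max_score` parameter are names introduced here for illustration, and the 100-point maximum is taken from the "out of 100" wording above.

```python
from statistics import median

# Score column of the rankings table, in rank order.
SCORES = [
    83.6, 81.4, 78.2, 76.1, 75.7, 75.4, 74.7, 73.9, 73.6, 73.2,
    73.1, 71.8, 71.0, 70.7, 69.9, 69.6, 69.0, 68.8, 68.2, 67.4,
    67.4, 66.8, 64.2, 61.9, 60.7, 60.3, 60.2, 54.9, 54.1,
]


def summarize(scores, max_score=100.0):
    """Return count, range, median and top score as a rounded percentage."""
    top = max(scores)
    return {
        "count": len(scores),
        "lowest": min(scores),
        "top": top,
        "median": median(scores),
        # Share of the maximum possible score, rounded to a whole percent.
        "top_pct": round(100 * top / max_score),
    }


summary = summarize(SCORES)
print(summary)
```

With an odd number of models, the median is simply the middle row of the sorted table (row 15 of 29).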
Does LiveBench · Coding predict performance on other benchmarks?
Yes · LiveBench · Coding scores have a Pearson correlation of 0.74 with Chatbot Arena Elo · Overall across the 15 models scored on both. Models that do well on LiveBench · Coding tend to do well on Chatbot Arena Elo · Overall.
How often is LiveBench · Coding data refreshed?
BenchGecko pulls updates daily. New model scores on LiveBench · Coding appear as soon as they are published by Epoch AI or the model provider.
Top on LiveBench · Coding
GPT-5.2-Codex · 83.6
GPT-5.1-Codex-Max · 81.4
Qwen3.6 Plus · 78.2
GPT-5 Mini · 76.1
DeepSeek V3.2 · 75.7

More knowledge benchmarks
Same category · related evaluations