LiveBench · Agentic Coding
[Chart: The Frontier · best score over time]
[Chart: Distribution · where models cluster]
Correlated benchmarks
Pearson r · original research
Benchmarks that track with LiveBench · Agentic Coding
Pearson correlation computed across the models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship between the two sets of scores.
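
As a rough illustration of the calculation described above, the sketch below computes Pearson r over only the models that appear on both benchmarks. The function names (`pearson_r`, `shared_model_correlation`), the model names, and the scores are all hypothetical placeholders, not BenchGecko's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def shared_model_correlation(scores_a, scores_b):
    """Correlate two benchmarks using only models scored on both.

    Each argument maps model name -> score. Returns (r, shared model count).
    """
    shared = sorted(set(scores_a) & set(scores_b))
    r = pearson_r([scores_a[m] for m in shared], [scores_b[m] for m in shared])
    return r, len(shared)


# Hypothetical scores, for illustration only.
bench_a = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 40.0}
bench_b = {"model-b": 5.0, "model-c": 7.0, "model-d": 9.0, "model-e": 50.0}

r, n = shared_model_correlation(bench_a, bench_b)
print(f"r = {r:.2f} across {n} shared models")
```

Reporting the shared model count next to r matters because a coefficient computed from only a handful of models can shift noticeably when one more model is added.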
Full rankings
29 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GPT-5.1-Codex-Max | 56.7 |
| 2 | GLM 5 | 55.0 |
| 3 | GLM 5.1 | 55.0 |
| 4 | Qwen3.6 Plus | 55.0 |
| 5 | GPT-5.1-Codex | 53.3 |
| 6 | | 51.7 |
| 7 | | 51.7 |
| 8 | | 50.0 |
| 9 | | 46.7 |
| 10 | | 43.3 |
| 11 | | 41.7 |
| 12 | | 40.0 |
| 13 | | 40.0 |
| 14 | | 38.3 |
| 15 | | 36.7 |
| 16 | | 35.0 |
| 17 | | 35.0 |
| 18 | | 30.0 |
| 19 | | 28.3 |
| 20 | | 27.1 |
| 21 | | 23.0 |
| 22 | | 17.0 |
| 23 | | 16.7 |
| 24 | | 13.3 |
| 25 | | 10.0 |
| 26 | | 8.3 |
| 27 | | 6.7 |
| 28 | | 3.3 |
| 29 | | 3.3 |
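
The summary figures quoted in the FAQ (score range, median, and the top score as a share of the 100-point scale) can be recomputed from the 29 scores in the table. The snippet below is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from statistics import median

# The 29 scores from the rankings table, in rank order.
scores = [
    56.7, 55.0, 55.0, 55.0, 53.3, 51.7, 51.7, 50.0, 46.7, 43.3,
    41.7, 40.0, 40.0, 38.3, 36.7, 35.0, 35.0, 30.0, 28.3, 27.1,
    23.0, 17.0, 16.7, 13.3, 10.0, 8.3, 6.7, 3.3, 3.3,
]

MAX_SCORE = 100  # the benchmark is scored out of 100

top = max(scores)
low = min(scores)
mid = median(scores)  # with 29 values this is the 15th-ranked score
top_pct = round(top / MAX_SCORE * 100)  # top score as a whole-number percentage
headroom = MAX_SCORE - top  # points left before the benchmark is maxed out

print(f"models: {len(scores)}")
print(f"range: {low} to {top}")
print(f"median: {mid}")
print(f"top score: {top_pct}% of maximum, headroom {headroom:.1f} points")
```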
Frequently asked
Pulled from the LiveBench · Agentic Coding dataset · updated daily
What does LiveBench · Agentic Coding measure?
LiveBench · Agentic Coding is a knowledge benchmark in the BenchGecko catalog. 29 AI models have been tested on it. Scores range from 3.3 to 56.7 out of 100.
Which model leads on LiveBench · Agentic Coding?
GPT-5.1-Codex-Max from OpenAI leads LiveBench · Agentic Coding with a score of 56.7. The median score across 29 tested models is 36.7.
Is LiveBench · Agentic Coding saturated?
No · the top score is 56.7 out of 100 (57%). There is still meaningful room for improvement on LiveBench · Agentic Coding.
Does LiveBench · Agentic Coding predict performance on other benchmarks?
Yes · across the 6 models scored on both benchmarks, LiveBench · Agentic Coding scores have a Pearson correlation of 0.93 with ARC-AGI-2 scores. Within that small shared set, models that do well on LiveBench · Agentic Coding tend to do well on ARC-AGI-2.
How often is LiveBench · Agentic Coding data refreshed?
BenchGecko pulls updates daily. New model scores on LiveBench · Agentic Coding appear as soon as they are published by Epoch AI or the model provider.
Top on LiveBench · Agentic Coding
GPT-5.1-Codex-Max · 56.7
GLM 5 · 55.0
GLM 5.1 · 55.0
Qwen3.6 Plus · 55.0
GPT-5.1-Codex · 53.3