Benchmark · Knowledge

LiveBench · Agentic Coding

Updated 2026-04-07
Models tested: 29
Top score: 56.7 (GPT-5.1-Codex-Max)
Median: 36.7 (min 3.3)
Top-5 spread: σ 1.1 (settled)

Best score over time · one chart, every benchmark

[Chart: LiveBench · Agentic Coding · 29 models · frontier running max. Y axis: score (0 to 100). X axis: release date (Jul 25 to Apr 26). Source: benchgecko.ai/benchmark/livebench-agentic-coding]
Frontier on LiveBench · Agentic Coding rose from 13.3 to 56.7 in 5 months · +43.3 points · latest leader GPT-5.1-Codex-Max from OpenAI.
Pink dots mark frontier records · 7 total.
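
A "frontier running max" chart can be read as follows: sort results by release date and keep each result that beats every earlier score. The sketch below illustrates that idea only. The function name, the tuple layout, and all model names and scores are hypothetical, and treating a tie as not setting a new record is an assumption rather than BenchGecko's documented rule.

```python
from datetime import date

def frontier_records(results):
    """Return the (release_date, model, score) entries that set a new running max.

    `results` is an iterable of (release_date, model, score) tuples.
    Assumption: a score must be strictly higher than the previous best to count.
    """
    records = []
    best = float("-inf")
    for release_date, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((release_date, model, score))
    return records

# Hypothetical data for illustration only; not real benchmark results.
results = [
    (date(2025, 7, 1), "model-a", 10.0),
    (date(2025, 9, 1), "model-b", 8.0),    # below the running max, not a record
    (date(2025, 11, 1), "model-c", 25.0),
    (date(2026, 2, 1), "model-d", 25.0),   # ties the running max, not a record
    (date(2026, 4, 1), "model-e", 40.0),
]

records = frontier_records(results)
# Frontier gain = latest record score minus first record score.
gain = records[-1][2] - records[0][2]
```

The gain reported in the summary line corresponds to the difference between the last and first frontier records, computed here as `gain`.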

Where models cluster

Score distribution · models per score bucket (0 to 100)
0–10: 4
10–20: 4
20–30: 3
30–40: 5
40–50: 5
50–60: 8
60–70 through 90–100: 0
Median · 36.7
Source: benchgecko.ai
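
A score distribution of this kind can be produced by counting models per fixed-width bucket. The following is a minimal sketch, not BenchGecko's actual code: the function name and the sample scores are hypothetical, and the edge handling (half-open buckets, with a score of exactly 100 placed in the last bucket) is an assumption, since the page does not state its convention.

```python
import statistics

def bucket_counts(scores, width=10, top=100):
    """Count scores per bucket [0, width), [width, 2*width), ... up to `top`.

    Assumption: buckets are half-open, and a score equal to `top`
    is counted in the last bucket.
    """
    counts = [0] * (top // width)
    for s in scores:
        idx = min(int(s // width), len(counts) - 1)
        counts[idx] += 1
    return counts

# Hypothetical scores for illustration only; not real benchmark results.
scores = [5.0, 15.0, 35.0, 45.0, 55.0, 100.0]

counts = bucket_counts(scores)
median = statistics.median(scores)

for i, n in enumerate(counts):
    print(f"{i * 10}-{(i + 1) * 10}: {n}")
print("median:", median)
```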

Pearson r · original research

Correlation analysis

Benchmarks that track with LiveBench · Agentic Coding

Pearson correlation computed across the models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship between the two sets of scores.
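
To make the calculation concrete, here is a minimal sketch of a Pearson correlation restricted to the models that appear on both benchmarks. The function name, the dict-based layout, and every score are hypothetical, and this is the textbook formula rather than a confirmed description of BenchGecko's pipeline.

```python
import math

def pearson_on_shared(scores_a, scores_b):
    """Pearson r over models present in both score dicts.

    Returns (r, n), where n is the number of shared models.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")

    xs = [scores_a[m] for m in shared]
    ys = [scores_b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")

    return cov / math.sqrt(var_x * var_y), n

# Hypothetical scores for illustration only; not real benchmark results.
bench_a = {"m1": 10.0, "m2": 20.0, "m3": 30.0, "m4": 40.0}
bench_b = {"m2": 2.0, "m3": 4.0, "m4": 6.0, "m5": 9.0}

r, n = pearson_on_shared(bench_a, bench_b)
print(f"r = {r:.2f} across {n} shared models")
```

Returning `n` alongside `r` keeps the sample size visible, which matters because a correlation over a handful of shared models can shift substantially when one model is added or removed.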

29 models tested · sorted by score

Pulled from the LiveBench · Agentic Coding dataset · updated daily

What does LiveBench · Agentic Coding measure?

LiveBench · Agentic Coding is listed as a knowledge benchmark in the BenchGecko catalog. It has been run on 29 AI models, with scores ranging from 3.3 to 56.7 out of 100.

Which model leads on LiveBench · Agentic Coding?

GPT-5.1-Codex-Max from OpenAI leads LiveBench · Agentic Coding with a score of 56.7. The median score across 29 tested models is 36.7.

Is LiveBench · Agentic Coding saturated?

No · the top score is 56.7 out of 100 (57%). There is still meaningful room for improvement on LiveBench · Agentic Coding.

Does LiveBench · Agentic Coding predict performance on other benchmarks?

Yes · LiveBench · Agentic Coding scores have a Pearson correlation of 0.93 with ARC-AGI-2 scores across 6 shared models. Models that do well on LiveBench · Agentic Coding tend to do well on ARC-AGI-2, although 6 models is a small sample, so the estimate is imprecise.

How often is LiveBench · Agentic Coding data refreshed?

BenchGecko pulls updates daily. New model scores on LiveBench · Agentic Coding appear as soon as they are published by Epoch AI or the model provider.
