LiveBench · Language
The Frontier
Best score over time · one chart, every benchmark
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with LiveBench · Language
Pearson correlation computed across models scored on both benchmarks. Values closer to 1 mean scores on one benchmark more strongly predict scores on the other.
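As a rough illustration of how such a coefficient could be computed, the sketch below restricts two score tables to the models they share and then applies the standard Pearson formula. The function names, model names, and scores are hypothetical placeholders, not BenchGecko's actual pipeline or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


def correlate_benchmarks(scores_a, scores_b):
    """Correlate two {model: score} tables over the models present in both."""
    shared = sorted(set(scores_a) & set(scores_b))
    r = pearson_r([scores_a[m] for m in shared], [scores_b[m] for m in shared])
    return r, len(shared)


# Hypothetical scores for illustration only (not real benchmark results).
language = {"model_a": 70.0, "model_b": 60.0, "model_c": 50.0, "model_d": 40.0}
maths = {"model_a": 80.0, "model_b": 65.0, "model_c": 55.0, "model_e": 30.0}

r, n = correlate_benchmarks(language, maths)
print(f"r = {r:.2f} across {n} shared models")
```

Only models present in both tables contribute, which is why the number of shared models is reported alongside the coefficient.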
Full rankings
29 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GLM 5 | 77.5 |
| 2 | GPT-5.1-Codex-Max | 75.4 |
| 3 | Qwen3.6 Plus | 75.0 |
| 4 | GPT-5.2-Codex | 73.7 |
| 5 | GLM 5.1 | 71.8 |
| 6 | | 71.3 |
| 7 | | 69.5 |
| 8 | | 69.5 |
| 9 | | 69.2 |
| 10 | | 69.1 |
| 11 | | 66.8 |
| 12 | | 66.5 |
| 13 | | 66.3 |
| 14 | | 66.1 |
| 15 | | 65.6 |
| 16 | | 65.2 |
| 17 | | 64.2 |
| 18 | | 63.0 |
| 19 | | 62.3 |
| 20 | | 59.0 |
| 21 | | 56.3 |
| 22 | | 55.1 |
| 23 | | 49.7 |
| 24 | | 48.6 |
| 25 | | 47.7 |
| 26 | | 45.7 |
| 27 | | 41.8 |
| 28 | | 30.0 |
| 29 | | 28.7 |
Frequently asked
Pulled from the LiveBench · Language dataset · updated daily
What does LiveBench · Language measure?
LiveBench · Language is a knowledge benchmark in the BenchGecko catalog. So far, 29 AI models have been tested on it, with scores ranging from 28.7 to 77.5 out of 100.
Which model leads on LiveBench · Language?
GLM 5 from z-ai leads LiveBench · Language with a score of 77.5. The median score across 29 tested models is 65.6.
Is LiveBench · Language saturated?
No · the top score is 77.5 out of 100, so there is still meaningful room for improvement on LiveBench · Language.
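The summary figures quoted in these answers (29 models, a range of 28.7 to 77.5, a median of 65.6) can be rechecked from the Score column of the rankings table. A minimal sketch, assuming a maximum score of 100 as stated above; the variable names are illustrative:

```python
from statistics import median

# Scores copied from the full rankings table, highest to lowest.
scores = [
    77.5, 75.4, 75.0, 73.7, 71.8, 71.3, 69.5, 69.5, 69.2, 69.1,
    66.8, 66.5, 66.3, 66.1, 65.6, 65.2, 64.2, 63.0, 62.3, 59.0,
    56.3, 55.1, 49.7, 48.6, 47.7, 45.7, 41.8, 30.0, 28.7,
]

MAX_SCORE = 100.0  # assumption: scores are reported out of 100

n_models = len(scores)
lowest, highest = min(scores), max(scores)
mid = median(scores)                # middle value of the sorted scores
top_share = highest / MAX_SCORE     # top score as a fraction of the maximum
headroom = MAX_SCORE - highest      # points left before the benchmark is maxed out

print(f"{n_models} models, range {lowest} to {highest}, median {mid}")
print(f"top score is {top_share:.1%} of the maximum, headroom {headroom} points")
```

With an odd number of models, the median is simply the middle entry of the sorted list, here the 15th of 29.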
Does LiveBench · Language predict performance on other benchmarks?
Yes · LiveBench · Language scores have a Pearson correlation of 0.88 with LiveBench · Mathematics scores across 29 shared models, so models that do well on one tend to do well on the other.
How often is LiveBench · Language data refreshed?
BenchGecko pulls updates daily. New model scores on LiveBench · Language appear as soon as they are published by Epoch AI or the model provider.
Top on LiveBench · Language
GLM 5 · 77.5
GPT-5.1-Codex-Max · 75.4
Qwen3.6 Plus · 75.0
GPT-5.2-Codex · 73.7
GLM 5.1 · 71.8
More knowledge benchmarks
Same category · related evaluations