LiveBench · If
The Frontier
Best score over time · one chart, every benchmark
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with LiveBench · If
Pearson correlation across models scored on both benchmarks. The closer the value is to 1, the more strongly a model's score on one benchmark predicts its score on the other.
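As a rough illustration of how such a figure can be computed, the sketch below restricts two score tables to the models present in both and applies the standard Pearson formula. The function names and the `model-a` style entries are hypothetical placeholders, not BenchGecko's code or real results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def correlate_benchmarks(scores_a, scores_b):
    """Correlate two {model: score} mappings over the models present in both."""
    shared = sorted(set(scores_a) & set(scores_b))
    r = pearson_r([scores_a[m] for m in shared], [scores_b[m] for m in shared])
    return r, len(shared)


# Hypothetical placeholder data, not real benchmark results.
bench_a = {"model-a": 60.0, "model-b": 45.0, "model-c": 30.0, "model-d": 20.0}
bench_b = {"model-a": 70.0, "model-b": 50.0, "model-c": 42.0, "model-e": 10.0}

r, n = correlate_benchmarks(bench_a, bench_b)
print(f"r = {r:.2f} across {n} shared models")
```

Models scored on only one of the two benchmarks (`model-d` and `model-e` here) drop out before the coefficient is computed, which is why the number of shared models is reported alongside r.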
Full rankings
29 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GLM 5.1 (z-ai) | 68.5 |
| 2 | | 67.6 |
| 3 | | 67.1 |
| 4 | | 66.5 |
| 5 | | 64.2 |
| 6 | | 63.4 |
| 7 | | 62.0 |
| 8 | | 61.1 |
| 9 | | 59.0 |
| 10 | | 58.3 |
| 11 | | 57.2 |
| 12 | | 55.3 |
| 13 | | 52.0 |
| 14 | | 50.3 |
| 15 | | 43.2 |
| 16 | | 41.5 |
| 17 | | 40.6 |
| 18 | | 35.7 |
| 19 | | 28.4 |
| 20 | | 27.2 |
| 21 | | 26.2 |
| 22 | | 23.1 |
| 23 | | 21.7 |
| 24 | | 19.3 |
| 25 | | 19.2 |
| 26 | | 18.9 |
| 27 | | 17.1 |
| 28 | | 16.5 |
| 29 | | 13.5 |
Frequently asked
Pulled from the LiveBench · If dataset · updated daily
What does LiveBench · If measure?
LiveBench · If is a knowledge benchmark in the BenchGecko catalog. 29 AI models have been tested on it. Scores range from 13.5 to 68.5 out of 100.
Which model leads on LiveBench · If?
GLM 5.1 from z-ai leads LiveBench · If with a score of 68.5. The median score across 29 tested models is 43.2.
Is LiveBench · If saturated?
No · the top score is 68.5 out of 100 (68.5%). There is still meaningful room for improvement on LiveBench · If.
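The summary figures quoted in these answers (model count, score range, median, and remaining headroom) can be recomputed from the rankings table. The snippet below is a minimal sketch using only the scores listed on this page; the variable names are illustrative.

```python
from statistics import median

# Scores copied from the "Full rankings" table, in rank order.
scores = [
    68.5, 67.6, 67.1, 66.5, 64.2, 63.4, 62.0, 61.1, 59.0, 58.3,
    57.2, 55.3, 52.0, 50.3, 43.2, 41.5, 40.6, 35.7, 28.4, 27.2,
    26.2, 23.1, 21.7, 19.3, 19.2, 18.9, 17.1, 16.5, 13.5,
]

MAX_SCORE = 100  # scores are reported out of 100

summary = {
    "models": len(scores),
    "top": max(scores),
    "lowest": min(scores),
    "median": median(scores),  # middle value of the 29 scores
    "headroom": MAX_SCORE - max(scores),  # points left before saturation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With 29 models the median is simply the 15th-ranked score, so it matches a row of the table rather than an average of two rows.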
Does LiveBench · If predict performance on other benchmarks?
Yes · LiveBench · If scores have a Pearson correlation of 0.83 with LiveBench · Overall across 29 shared models. Models that do well on LiveBench · If tend to do well on LiveBench · Overall.
How often is LiveBench · If data refreshed?
BenchGecko pulls updates daily. New model scores on LiveBench · If appear as soon as they are published by Epoch AI or the model provider.