Fiction.LiveBench
Fiction.LiveBench · a continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning, which helps prevent data contamination.
The Frontier
Best score over time · one chart, every benchmark
Full rankings
41 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GPT-5 | 97.2 |
| 2 | o3 Pro | 97.2 |
| 3 | Grok 4 | 94.4 |
| 4 | Grok 4 Fast | 94.4 |
| 5 | Gemini 2.5 Pro | 91.7 |
| 6 | | 88.9 |
| 7 | | 86.1 |
| 8 | | 83.3 |
| 9 | | 83.3 |
| 10 | | 83.3 |
| 11 | | 77.8 |
| 12 | | 75.0 |
| 13 | | 69.4 |
| 14 | | 69.4 |
| 15 | | 67.7 |
| 16 | | 66.7 |
| 17 | | 66.7 |
| 18 | | 66.7 |
| 19 | | 63.9 |
| 20 | | 63.9 |
| 21 | | 61.1 |
| 22 | | 61.1 |
| 23 | | 61.1 |
| 24 | | 58.3 |
| 25 | | 52.9 |
| 26 | | 52.8 |
| 27 | | 52.8 |
| 28 | | 50.0 |
| 29 | | 50.0 |
| 30 | | 47.2 |
| 31 | | 46.9 |
| 32 | | 46.2 |
| 33 | | 44.4 |
| 34 | | 44.4 |
| 35 | | 44.4 |
| 36 | | 41.7 |
| 37 | | 36.0 |
| 38 | | 33.3 |
| 39 | | 33.3 |
| 40 | | 33.3 |
| 41 | | 25.0 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with Fiction.LiveBench
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
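As a rough illustration of what "Pearson correlation across models scored on both benchmarks" means, the sketch below keeps only the models present in both score sets and computes r over those pairs. The function names, model names, and scores are hypothetical placeholders, not BenchGecko's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def correlate_shared(bench_a, bench_b):
    """Correlate two {model: score} dicts over the models present in both."""
    shared = sorted(set(bench_a) & set(bench_b))
    r = pearson_r([bench_a[m] for m in shared], [bench_b[m] for m in shared])
    return r, len(shared)


# Hypothetical placeholder scores, for illustration only.
fiction = {"model_a": 97.2, "model_b": 83.3, "model_c": 61.1,
           "model_d": 44.4, "model_e": 25.0}
other = {"model_a": 60.0, "model_b": 52.0, "model_c": 40.0,
         "model_d": 31.0, "model_f": 20.0}

r, n = correlate_shared(fiction, other)
print(f"r = {r:.2f} over {n} shared models")
```

Reporting the number of shared models next to r matters because a coefficient computed over a handful of models can shift a lot when one model is added or removed.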
Frequently asked
About Fiction.LiveBench
What does Fiction.LiveBench measure?
Fiction.LiveBench is a continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning. Drawing on new material helps prevent data contamination. 41 AI models have been tested on it, with scores ranging from 25.0 to 97.2 out of 100.
Which model leads on Fiction.LiveBench?
GPT-5 from OpenAI leads Fiction.LiveBench with a score of 97.2, a score that o3 Pro matches. The median score across the 41 tested models is 61.1.
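The summary figures quoted here can be checked against the scores in the rankings table. This is a minimal sketch using Python's standard library; the `scores` list is copied from the table, and the variable names are illustrative.

```python
from statistics import median

# Scores copied from the rankings table, ranks 1 to 41.
scores = [
    97.2, 97.2, 94.4, 94.4, 91.7, 88.9, 86.1, 83.3, 83.3, 83.3,
    77.8, 75.0, 69.4, 69.4, 67.7, 66.7, 66.7, 66.7, 63.9, 63.9,
    61.1, 61.1, 61.1, 58.3, 52.9, 52.8, 52.8, 50.0, 50.0, 47.2,
    46.9, 46.2, 44.4, 44.4, 44.4, 41.7, 36.0, 33.3, 33.3, 33.3,
    25.0,
]

top = max(scores)
tied_at_top = scores.count(top)  # how many models share the best score

print(len(scores))      # 41
print(median(scores))   # 61.1
print(min(scores), top) # 25.0 97.2
print(tied_at_top)      # 2
```

With an odd number of models, the median is simply the 21st score in the sorted list.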
Is Fiction.LiveBench saturated?
Nearly · the top score on Fiction.LiveBench is 97.2 out of 100, within 5% of the maximum possible score. The benchmark is approaching saturation and may be replaced by a harder successor.
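One way to read the "within 5% of the ceiling" criterion is as a threshold relative to the maximum score. The sketch below assumes that reading; `headroom` and `is_near_ceiling` are hypothetical helper names, and the exact rule BenchGecko applies is not stated in the source.

```python
def headroom(top_score, max_score=100.0):
    """Points remaining between the best score and the maximum."""
    return round(max_score - top_score, 1)


def is_near_ceiling(top_score, max_score=100.0, tolerance=0.05):
    """True when the best score is within `tolerance` (as a fraction of
    max_score) of the maximum. Assumed interpretation of 'within 5%'."""
    return top_score >= max_score * (1 - tolerance)


print(headroom(97.2))         # 2.8
print(is_near_ceiling(97.2))  # True
print(is_near_ceiling(94.4))  # False
```

Under this reading, the current top score of 97.2 passes the check, while the third-place score of 94.4 would not.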
Does Fiction.LiveBench predict performance on other benchmarks?
Yes · Fiction.LiveBench scores have a Pearson correlation of 0.96 with Artificial Analysis · Coding Index across 5 shared models. Models that do well on Fiction.LiveBench tend to do well on Artificial Analysis · Coding Index, though 5 models is a small sample.
How often is Fiction.LiveBench data refreshed?
BenchGecko pulls updates daily. New model scores on Fiction.LiveBench appear as soon as they are published by Epoch AI or the model provider.
- Category: Knowledge
- Max score: 100
- Models: 41
- Updated: 2026-01-27
Top on Fiction.LiveBench
- GPT-5 · 97.2
- o3 Pro · 97.2
- Grok 4 · 94.4
- Grok 4 Fast · 94.4
- Gemini 2.5 Pro · 91.7
More knowledge benchmarks
Same category · related evaluations