Which model leads on Fiction.LiveBench?

GPT-5 from OpenAI leads Fiction.LiveBench with a score of 97.2, a score it shares with o3 Pro (also from OpenAI); GPT-5 is listed first in the rankings. The median score across the 41 tested models is 61.1.

Is Fiction.LiveBench saturated?

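Yes · the top model on Fiction.LiveBench has reached 97.2 out of 100, which is 2.8 points short of the theoretical ceiling and inside the 5% margin used here as the saturation threshold. The benchmark is approaching saturation at the top of the table and may be replaced by a harder successor.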
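
The "within 5% of the ceiling" rule can be written as a one-line check. This is a minimal sketch, assuming the ceiling is 100 and the margin is 5% as stated above; the function name `is_near_ceiling` is illustrative and is not part of any BenchGecko API.

```python
def is_near_ceiling(top_score, ceiling=100.0, margin=0.05):
    """Return True when top_score lies within margin * ceiling of the ceiling."""
    return (ceiling - top_score) <= margin * ceiling


print(is_near_ceiling(97.2))  # True: 2.8 points below 100, inside the 5-point margin
print(is_near_ceiling(94.4))  # False: 5.6 points below 100
```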

Does Fiction.LiveBench predict performance on other benchmarks?

Yes, on the available overlap · Fiction.LiveBench scores correlate 0.96 with Artificial Analysis · Coding Index across the 5 models scored on both. Models that do well on Fiction.LiveBench tend to do well on Artificial Analysis · Coding Index, though 5 shared models is a small sample.
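
The page does not say which correlation coefficient is reported. As a sketch, assuming it is the Pearson coefficient computed over the paired scores of the shared models, the calculation would look like the following. The function name `pearson` is illustrative, and the per-model Coding Index scores are not listed on this page, so no real inputs are shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)
```

With only 5 paired points, a single model moving a few places can shift the coefficient noticeably, which is why the sample size is worth reading alongside the 0.96.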

How often is Fiction.LiveBench data refreshed?

BenchGecko pulls updates daily. New model scores on Fiction.LiveBench appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Knowledge · Settled

Fiction.LiveBench

Name: Fiction.LiveBench Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

Fiction.LiveBench is a continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning. Drawing on newly published material is intended to prevent data contamination.

Updated 2026-01-27

Models tested: 41
Top score: 97.2 (GPT-5)
Median: 61.1 (min 25.0)
Top-5 spread: σ 2.1 (Competitive)
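
The top-5 spread can be checked against the five highest scores in the full rankings. The page does not state which standard deviation is used; the sketch below assumes the population form, which rounds to the published 2.1, and shows the sample form for comparison.

```python
from statistics import pstdev, stdev

# Five highest scores from the rankings: GPT-5, o3 Pro, Grok 4, Grok 4 Fast, Gemini 2.5 Pro.
top5 = [97.2, 97.2, 94.4, 94.4, 91.7]

population_sigma = pstdev(top5)  # divides by n
sample_sigma = stdev(top5)       # divides by n - 1

print(round(population_sigma, 1))  # 2.1, matches the published figure
print(round(sample_sigma, 1))      # 2.3
```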

The Frontier

Best score over time · one chart, every benchmark

Frontier on Fiction.LiveBench rose from 33.3 to 97.2 in 6 months · +63.9 points · latest leader o3 Pro from OpenAI.

Frontier records · 4 total
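
A frontier record is read here as a result that beats every earlier score. The sketch below assumes that definition and a hypothetical input of `(date, model, score)` tuples; the dated results behind the chart are not included on this page, so no real data is shown and the function name `frontier_records` is illustrative.

```python
def frontier_records(results):
    """Return the results that set a new best score, in chronological order.

    results: iterable of (date, model, score) tuples with sortable dates.
    """
    records = []
    best = float("-inf")
    for date, model, score in sorted(results, key=lambda r: r[0]):
        # Strict comparison: matching the current best does not set a new record.
        if score > best:
            best = score
            records.append((date, model, score))
    return records
```

Under this strict comparison, a later model that only ties the best score is not counted as a new record. That would be one way for o3 Pro to remain the latest frontier leader while sharing the top score of 97.2 with GPT-5, though the page does not confirm how ties are handled.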

Full rankings

41 models tested · sorted by score

#	Model	Provider	Score	Price
1	GPT-5	OpenAI	97.2	$1.25
2	o3 Pro	OpenAI	97.2	$20.00
3	Grok 4	xAI	94.4	$3.00
4	Grok 4 Fast	xAI	94.4	$0.20
5	Gemini 2.5 Pro	Google DeepMind	91.7	$1.25
6	o3	OpenAI	88.9	$2.00
7	Kimi K2.5	moonshotai	86.1	$0.44
8	Claude 3.7 Sonnet	Anthropic	83.3	$3.00
9	DeepSeek V3.2 Exp	DeepSeek	83.3	$0.27
10	o1	OpenAI	83.3	$15.00
11	o4 Mini	OpenAI	77.8	$1.10
12	Qwen3 235B A22B Thinking 2507	Alibaba Qwen	75.0	$0.15
13	GPT-5 Mini	OpenAI	69.4	$0.25
14	R1	DeepSeek	69.4	$0.70
15	Qwen3 235B A22B	Alibaba Qwen	67.7	$0.46
16	Grok 3 Mini	xAI	66.7	$0.30
17	Qwen2.5-Max	Alibaba Qwen	66.7	n/a
18	Qwen3 Max	Alibaba Qwen	66.7	$0.78
19	GPT-4.1	OpenAI	63.9	$2.00
20	GPT-4.5	OpenAI	63.9	n/a
21	Claude Opus 4	Anthropic	61.1	$15.00
22	Gemini 2.0 Flash	Google DeepMind	61.1	$0.10
23	Kimi K2 0711	moonshotai	61.1	$0.57
24	Grok 3	xAI	58.3	$3.00
25	Qwen3 235B A22B Instruct 2507	Alibaba Qwen	52.9	$0.07
26	DeepSeek V3.1	DeepSeek	52.8	$0.15
27	Gemini 2.0 Flash Thinking (Jan 2025)	Google DeepMind	52.8	n/a
28	DeepSeek V3	DeepSeek	50.0	$0.32
29	o3 Mini	OpenAI	50.0	$1.10
30	Gemini 2.5 Flash	Google DeepMind	47.2	$0.30
31	Claude Sonnet 4	Anthropic	46.9	$3.00
32	Llama 4 Maverick	Meta	46.2	$0.15
33	GPT-4.1 Mini	OpenAI	44.4	$0.40
34	GPT-5 Nano	OpenAI	44.4	$0.05
35	gpt-oss-120b	OpenAI	44.4	$0.04
36	Gemini 2.0 Pro	Google DeepMind	41.7	n/a
37	Llama 4 Scout	Meta	36.0	$0.08
38	Gemma 3 27B	Google DeepMind	33.3	$0.08
39	Gemma 3 27B (free)	Google DeepMind	33.3	$0.00
40	Llama 3.3 70B Instruct (free)	Meta	33.3	$0.00
41	GPT-4.1 Nano	OpenAI	25.0	$0.10
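
The headline figures quoted on this page (41 models, top score 97.2, median 61.1, minimum 25.0) can be re-derived from the Score column. A minimal sketch, with the scores transcribed by hand from the table above in rank order; the variable names are illustrative.

```python
from statistics import median

# Score column of the rankings table, rank 1 to rank 41.
scores = [
    97.2, 97.2, 94.4, 94.4, 91.7, 88.9, 86.1, 83.3, 83.3, 83.3,
    77.8, 75.0, 69.4, 69.4, 67.7, 66.7, 66.7, 66.7, 63.9, 63.9,
    61.1, 61.1, 61.1, 58.3, 52.9, 52.8, 52.8, 50.0, 50.0, 47.2,
    46.9, 46.2, 44.4, 44.4, 44.4, 41.7, 36.0, 33.3, 33.3, 33.3,
    25.0,
]

print(len(scores))     # 41
print(max(scores))     # 97.2
print(median(scores))  # 61.1 (the 21st of 41 values)
print(min(scores))     # 25.0
```

Because the list has an odd number of entries, the median is simply the middle row of the table (rank 21, Claude Opus 4), with no averaging involved.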