Which model leads on SimpleBench?

Gemini 3.1 Pro Preview from Google DeepMind leads SimpleBench with a score of 75.5. The median score across 52 tested models is 28.1.

Is SimpleBench saturated?

No. The top score is 75.5 out of 100, which leaves 24.5 points of headroom, so there is still meaningful room for improvement on SimpleBench.

Does SimpleBench predict performance on other benchmarks?

Yes, at least for Chatbot Arena. SimpleBench scores have a correlation of 0.95 with Chatbot Arena Elo (Overall) across the 25 models that appear on both leaderboards, so models that do well on SimpleBench tend to do well on Chatbot Arena too.
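As a rough illustration of how a figure like this can be computed, the sketch below takes two score tables, keeps only the models present in both, and computes a Pearson correlation over that shared set. This is an assumption about the method: the page does not say whether it uses Pearson, Spearman, or something else. The function name `pearson_on_shared` and the model names and values are placeholders, not real benchmark data.

```python
from math import sqrt


def pearson_on_shared(a: dict[str, float], b: dict[str, float]) -> tuple[float, int]:
    """Return (Pearson r, number of shared models) for two score tables."""
    # Only models that appear in both tables can be compared.
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")

    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")

    return cov / sqrt(var_x * var_y), n


# Placeholder values for illustration only, NOT real benchmark data.
bench_a = {"model_a": 10.0, "model_b": 30.0, "model_c": 50.0, "model_d": 70.0}
bench_b = {"model_b": 1200.0, "model_c": 1300.0, "model_d": 1400.0, "model_e": 1500.0}

r, n_shared = pearson_on_shared(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

The size of the shared set matters as much as the coefficient: a correlation over 25 models says nothing about the models that appear on only one of the two leaderboards.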

How often is SimpleBench data refreshed?

BenchGecko pulls updates daily. New model scores on SimpleBench appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Reasoning · Competitive

SimpleBench

Name: SimpleBench Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

SimpleBench tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.

Updated 2026-03-05

Models tested

52

Top score

75.5

Gemini 3.1 Pro Preview

Median

28.1

min 1.4

Top-5 spread

σ 7.5

wide open
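The spread figure can be approximated from the top five scores in the rankings. The sketch below assumes σ means the standard deviation of those five scores; the page does not state the formula, so both the population and the sample version are shown. On the displayed one-decimal scores the population version comes out at about 7.45, close to the 7.5 shown; the small gap may come from rounding of the inputs or from a different formula.

```python
from statistics import pstdev, stdev

# Top five scores as displayed in the rankings (one decimal place).
top5 = [75.5, 71.7, 68.9, 61.1, 54.9]

# Population standard deviation: divides by n.
population_sigma = pstdev(top5)

# Sample standard deviation: divides by n - 1, so it is slightly larger.
sample_sigma = stdev(top5)

print(f"population sigma = {population_sigma:.2f}")
print(f"sample sigma     = {sample_sigma:.2f}")
```

A larger spread means the leading models are further apart from each other, which fits the "wide open" label.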

The Frontier

Best score over time · one chart, every benchmark

Frontier on SimpleBench rose from 28.1 to 75.5 in 14 months · +47.4 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.

Pink dots = frontier records · 7 total

Full rankings

52 models tested · sorted by score

#	Model	Provider	Score	Price
1	Gemini 3.1 Pro Preview	Google DeepMind	75.5	$2.00
2	Gemini 3 Pro	Google DeepMind	71.7	n/a
3	GPT-5.4 Pro	OpenAI	68.9	$30.00
4	Claude Opus 4.6	Anthropic	61.1	$5.00
5	Gemini 2.5 Pro	Google DeepMind	54.9	$1.25
6	Claude Opus 4.5	Anthropic	54.4	$5.00
7	GPT-5 Pro	OpenAI	53.9	$15.00
8	Gemini 3 Flash Preview	Google DeepMind	53.3	$0.50
9	Grok 4	xAI	52.6	$3.00
10	Claude Opus 4.1	Anthropic	52.0	$15.00
11	Claude Opus 4	Anthropic	50.6	$15.00
12	GPT-5.2 Pro	OpenAI	48.9	$21.00
13	GPT-5	OpenAI	48.0	$1.25
14	Claude Sonnet 4.5	Anthropic	45.2	$3.00
15	GLM 5	z-ai	43.8	$0.60
16	GPT-5.1	OpenAI	43.8	$1.25
17	o3	OpenAI	43.7	$2.00
18	GLM 4.7	z-ai	37.2	$0.38
19	Kimi K2.5	moonshotai	36.2	$0.44
20	Claude 3.7 Sonnet	Anthropic	35.7	$3.00
21	GPT-5.2	OpenAI	35.0	$1.75
22	Claude Sonnet 4	Anthropic	34.6	$3.00
23	o1-preview	OpenAI	30.0	n/a
24	Gemini 2.5 Flash	Google DeepMind	29.4	$0.30
25	R1 0528	DeepSeek	29.0	$0.50
26	o1	OpenAI	28.1	$15.00
27	DeepSeek V3.1	DeepSeek	28.0	$0.15
28	o4 Mini	OpenAI	26.4	$1.10
29	Grok 3	xAI	23.3	$3.00
30	GPT-4.5	OpenAI	21.4	n/a
31	Gemini 2.0 Flash	Google DeepMind	17.3	$0.10
32	Qwen3 235B A22B	Alibaba Qwen	17.2	$0.46
33	R1	DeepSeek	17.1	$0.70
34	Gemini 2.0 Flash Thinking (Jan 2025)	Google DeepMind	16.8	n/a
35	Llama 4 Maverick	Meta	13.2	$0.15
36	Claude 3.5 Sonnet	Anthropic	13.0	n/a
37	Gemini 1.5 Pro (Feb 2024)	Google DeepMind	12.5	n/a
38	GPT-4.1	OpenAI	12.4	$2.00
39	Kimi K2 0711	moonshotai	11.6	$0.57
40	GPT-4 Turbo	OpenAI	10.1	$10.00
41	Claude 3 Opus	Anthropic	8.2	n/a
42	Llama 3.1 405B	Meta	7.6	n/a
43	o3 Mini	OpenAI	7.4	$1.10
44	Grok-2 (Dec 2024)	xAI	7.2	n/a
45	Mistral Large	Mistral AI	7.0	$2.00
46	Mistral Large 2407	Mistral AI	7.0	$2.00
47	gpt-oss-120b	OpenAI	6.5	$0.04
48	Llama 3.3 70B Instruct (free)	Meta	3.9	$0.00
49	DeepSeek V3	DeepSeek	2.7	$0.32
50	o1-mini	OpenAI	1.7	n/a
51	GPT-4o (2024-08-06)	OpenAI	1.4	$2.50
52	GPT-4o (2024-11-20)	OpenAI	1.4	$2.50
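The headline statistics on this page can be cross-checked against the rankings. The sketch below copies the 52 scores from the table and recomputes the count, top score, minimum, median, and remaining headroom, assuming the 0 to 100 scale stated in the FAQ. With an even number of rows the median is the mean of the 26th and 27th scores (28.1 and 28.0), so the 28.1 shown on the page is presumably a rounded value.

```python
from statistics import median

# Scores copied from the 52-row rankings table, highest first.
scores = [
    75.5, 71.7, 68.9, 61.1, 54.9, 54.4, 53.9, 53.3, 52.6, 52.0,
    50.6, 48.9, 48.0, 45.2, 43.8, 43.8, 43.7, 37.2, 36.2, 35.7,
    35.0, 34.6, 30.0, 29.4, 29.0, 28.1, 28.0, 26.4, 23.3, 21.4,
    17.3, 17.2, 17.1, 16.8, 13.2, 13.0, 12.5, 12.4, 11.6, 10.1,
    8.2, 7.6, 7.4, 7.2, 7.0, 7.0, 6.5, 3.9, 2.7, 1.7,
    1.4, 1.4,
]

count = len(scores)
top = max(scores)
low = min(scores)
mid = median(scores)   # mean of the two middle values when count is even
headroom = 100 - top   # assumes the 0 to 100 scale stated on the page

print(f"models tested: {count}")
print(f"top score:     {top}")
print(f"minimum:       {low}")
print(f"median:        {mid:.2f}")
print(f"headroom:      {headroom:.1f} points")
```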