Which model leads on Chess Puzzles?

GPT-5.4 Pro from OpenAI leads Chess Puzzles with a score of 58.6. The median score across 24 tested models is 20.0.

Is Chess Puzzles saturated?

No · the top score is 58.6 out of 100 (59%). There is still meaningful room for improvement on Chess Puzzles.

Does Chess Puzzles predict performance on other benchmarks?

Yes · for at least one other benchmark. Chess Puzzles scores correlate 0.84 with VPCT scores across the 9 models tested on both, so models that do well on Chess Puzzles tend to do well on VPCT.
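
The page does not say how the 0.84 figure is computed. The sketch below assumes a Pearson correlation restricted to the models that appear on both leaderboards. The function name `shared_correlation` and the toy inputs are illustrative only; they are not BenchGecko's implementation and not real scores.

```python
from math import sqrt

def shared_correlation(a, b):
    """Pearson correlation over the models present in both score dicts.

    Returns (r, n) where n is the number of shared models.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y), len(shared)

# Toy inputs (made up for illustration; not real benchmark scores).
bench_a = {"model-1": 10.0, "model-2": 20.0, "model-3": 30.0, "model-4": 5.0}
bench_b = {"model-1": 1.0, "model-2": 2.0, "model-3": 3.0, "model-5": 9.0}

r, n = shared_correlation(bench_a, bench_b)
print(f"r = {r:.2f} across {n} shared models")
```

With only 9 shared models, a coefficient like this carries wide uncertainty, so it is better read as a tendency than as a forecast.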

How often is Chess Puzzles data refreshed?

BenchGecko pulls updates daily. New model scores on Chess Puzzles appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Reasoning · Competitive

Chess Puzzles

Name: Chess Puzzles Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.

Updated 2026-03-05

Models tested: 24
Top score: 58.6 (GPT-5.4 Pro)
Median: 20.0 (min 4.0)
Top-5 spread: σ 7.4 (wide open)

The Frontier

Best score over time · one chart, every benchmark

The frontier on Chess Puzzles rose from 17.0 to 58.6 in 13 months (+41.6 points). The latest leader is GPT-5.4 Pro from OpenAI.

Chart: best score over time, with 7 frontier records marked.
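
Reading "frontier record" as a score that beats every earlier result on the benchmark, the records could be extracted from dated results roughly as follows. This is a minimal sketch under that assumption; `frontier_records` and the sample rows are hypothetical and do not reproduce the real Chess Puzzles history.

```python
from datetime import date

def frontier_records(results):
    """Return the results that set a new best score, in date order.

    `results` is an iterable of (release_date, model_name, score) tuples.
    """
    best = float("-inf")
    records = []
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:  # strictly better than every earlier score
            best = score
            records.append((released, model, score))
    return records

# Hypothetical rows for illustration only (not real release dates or scores).
sample = [
    (date(2025, 1, 10), "model-a", 15.0),
    (date(2025, 3, 2), "model-b", 12.0),   # below the frontier, skipped
    (date(2025, 6, 20), "model-c", 30.0),
    (date(2025, 9, 5), "model-d", 30.0),   # ties do not count as records
    (date(2025, 12, 1), "model-e", 42.5),
]

records = frontier_records(sample)
gain = records[-1][2] - records[0][2]  # last record minus first record
print(len(records), "records, frontier gain:", gain)
```

The gain between the first and last record is the same kind of figure as the +41.6 points quoted above (58.6 minus 17.0).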

Full rankings

24 models tested · sorted by score

#	Model	Provider	Score	Price
1	GPT-5.4 Pro	OpenAI	58.6	$30.00
2	Gemini 3.1 Pro Preview	Google DeepMind	55.0	$2.00
3	GPT-5.2	OpenAI	49.0	$1.75
4	GPT-5.4	OpenAI	44.0	$2.50
5	Gemini 3 Flash Preview	Google DeepMind	38.0	$0.50
6	GPT-5	OpenAI	37.0	$1.25
7	GPT-5.1	OpenAI	32.0	$1.25
8	Gemini 3 Pro	Google DeepMind	31.0	—
9	Grok 4	xAI	28.0	$3.00
10	o4 Mini	OpenAI	26.0	$1.10
11	Gemini 2.5 Pro	Google DeepMind	20.0	$1.25
12	gpt-oss-120b	OpenAI	20.0	$0.04
13	Kimi K2 Thinking	moonshotai	20.0	$0.60
14	Claude Opus 4.6	Anthropic	17.0	$5.00
15	o3 Mini	OpenAI	17.0	$1.10
16	DeepSeek V3.2	DeepSeek	14.0	$0.25
17	Claude Sonnet 4.6	Anthropic	13.0	$3.00
18	Claude Opus 4.5	Anthropic	12.0	$5.00
19	Claude Sonnet 4.5	Anthropic	12.0	$3.00
20	Kimi K2.5	moonshotai	12.0	$0.44
21	Qwen3 235B A22B Thinking 2507	Alibaba Qwen	12.0	$0.15
22	GLM 5	z-ai	10.0	$0.60
23	GLM 4.7	z-ai	6.0	$0.38
24	Qwen3 Max	Alibaba Qwen	4.0	$0.78
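
The headline figures on this page can be reproduced from the rankings table. The sketch below assumes the median is the ordinary midpoint median and that the top-5 spread is the population standard deviation of the five highest scores. The page does not state which estimator it uses; the population form is the one that yields 7.4, while the sample form would give about 8.3. The `summarize` helper is an illustrative name, not part of any BenchGecko API.

```python
from statistics import median, pstdev

# Scores copied from the rankings table above, in rank order.
scores = [58.6, 55.0, 49.0, 44.0, 38.0, 37.0, 32.0, 31.0, 28.0, 26.0,
          20.0, 20.0, 20.0, 17.0, 17.0, 14.0, 13.0, 12.0, 12.0, 12.0,
          12.0, 10.0, 6.0, 4.0]

def summarize(values, max_score=100.0, top_n=5):
    """Return the summary statistics shown near the top of the page."""
    ranked = sorted(values, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "median": median(ranked),
        "min": ranked[-1],
        # Assumption: spread = population standard deviation of the top N.
        "top_spread": round(pstdev(ranked[:top_n]), 1),
        # Share of the maximum score already reached by the leader.
        "saturation_pct": round(100 * ranked[0] / max_score),
    }

summary = summarize(scores)
print(summary)
```

With 24 models, the median is the mean of the 12th and 13th ranked scores, both of which are 20.0.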