Benchmark · Reasoning · Competitive

SimpleBench

SimpleBench tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.

Updated: 2026-03-05
Models tested: 52
Top score: 75.5 (Gemini 3.1 Pro Preview)
Median: 28.1 (min 1.4)
Top-5 spread: σ 7.5 (wide open)
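
The summary figures above can be approximately reproduced from the scores in the leaderboard table. The sketch below is a minimal illustration, not the site's own method: the function name `summarize` is hypothetical, and it assumes the top-5 spread is the population standard deviation of the five highest scores, which the page does not state.

```python
from statistics import median, pstdev

# Scores copied from the leaderboard table, in ranked order (52 entries).
scores = [
    75.5, 71.7, 68.9, 61.1, 54.9, 54.4, 53.9, 53.3, 52.6, 52.0,
    50.6, 48.9, 48.0, 45.2, 43.8, 43.8, 43.7, 37.2, 36.2, 35.7,
    35.0, 34.6, 30.0, 29.4, 29.0, 28.1, 28.0, 26.4, 23.3, 21.4,
    17.3, 17.2, 17.1, 16.8, 13.2, 13.0, 12.5, 12.4, 11.6, 10.1,
    8.2, 7.6, 7.4, 7.2, 7.0, 7.0, 6.5, 3.9, 2.7, 1.7,
    1.4, 1.4,
]


def summarize(scores):
    """Return headline statistics for a list of benchmark scores.

    Assumption: "top-5 spread" is taken to be the population standard
    deviation of the five highest scores. The page does not define it.
    """
    ranked = sorted(scores, reverse=True)
    return {
        "n": len(ranked),
        "top": ranked[0],
        "min": ranked[-1],
        "median": median(ranked),
        "top5_spread": pstdev(ranked[:5]),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With an even number of models, the median is the mean of the 26th and 27th scores (28.1 and 28.0), and the spread computed this way lands close to the reported σ 7.5. Small differences are plausible if the site computes from unrounded scores or uses a different formula.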

Best score over time · one chart, every benchmark

[Chart: SimpleBench frontier running max, 41 models. Y-axis: score (0 to 100). X-axis: release date (Jul 24 to Mar 26). Source: benchgecko.ai/benchmark/simplebench]
The SimpleBench frontier rose from 28.1 to 75.5 in 14 months (+47.4 points). The latest leader is Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots = frontier records · 7 total
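
A "frontier running max" of this kind can be sketched as follows: walk the models in release order and keep each one that beats the best score seen so far. This is an illustrative sketch only. The function name `frontier_records`, the strict "greater than" rule for ties, and the sample data are all assumptions; the release dates and model names below are hypothetical and are not taken from the leaderboard.

```python
from datetime import date


def frontier_records(releases):
    """Return the releases that set a new best score, in release order.

    Each release is a (release_date, model_name, score) tuple.
    Assumption: a tie with the current best does not count as a record.
    """
    records = []
    best = float("-inf")
    for released, model, score in sorted(releases):
        if score > best:
            best = score
            records.append((released, model, score))
    return records


# Hypothetical data for illustration only; not real leaderboard entries.
releases = [
    (date(2025, 1, 10), "model-a", 30.0),
    (date(2025, 3, 2), "model-b", 25.0),
    (date(2025, 6, 15), "model-c", 42.5),
    (date(2025, 9, 1), "model-d", 42.5),
    (date(2025, 12, 20), "model-e", 60.0),
]

records = frontier_records(releases)
# Gain of the frontier: last record score minus first record score.
gain = records[-1][2] - records[0][2]
print([model for _, model, _ in records], gain)
```

The same subtraction applied to the page's own figures gives the reported gain: 75.5 − 28.1 = 47.4 points.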

52 models tested · sorted by score

| # | Organization | Model | Score |
|---|---|---|---|
| 1 | Google DeepMind | Gemini 3.1 Pro Preview | 75.5 |
| 2 | Google DeepMind | Gemini 3 Pro | 71.7 |
| 3 | OpenAI | GPT-5.4 Pro | 68.9 |
| 4 | Anthropic | Claude Opus 4.6 | 61.1 |
| 5 | Google DeepMind | Gemini 2.5 Pro | 54.9 |
| 6 | Anthropic | Claude Opus 4.5 | 54.4 |
| 7 | OpenAI | GPT-5 Pro | 53.9 |
| 8 | Google DeepMind | Gemini 3 Flash Preview | 53.3 |
| 9 | xAI | Grok 4 | 52.6 |
| 10 | Anthropic | Claude Opus 4.1 | 52.0 |
| 11 | Anthropic | Claude Opus 4 | 50.6 |
| 12 | OpenAI | GPT-5.2 Pro | 48.9 |
| 13 | OpenAI | GPT-5 | 48.0 |
| 14 | Anthropic | Claude Sonnet 4.5 | 45.2 |
| 15 | z-ai | GLM 5 | 43.8 |
| 16 | OpenAI | GPT-5.1 | 43.8 |
| 17 | OpenAI | o3 | 43.7 |
| 18 | z-ai | GLM 4.7 | 37.2 |
| 19 | moonshotai | Kimi K2.5 | 36.2 |
| 20 | Anthropic | Claude 3.7 Sonnet | 35.7 |
| 21 | OpenAI | GPT-5.2 | 35.0 |
| 22 | Anthropic | Claude Sonnet 4 | 34.6 |
| 23 | OpenAI | o1-preview | 30.0 |
| 24 | Google DeepMind | Gemini 2.5 Flash | 29.4 |
| 25 | DeepSeek | R1 0528 | 29.0 |
| 26 | OpenAI | o1 | 28.1 |
| 27 | DeepSeek | DeepSeek V3.1 | 28.0 |
| 28 | OpenAI | o4 Mini | 26.4 |
| 29 | xAI | Grok 3 | 23.3 |
| 30 | OpenAI | GPT-4.5 | 21.4 |
| 31 | Google DeepMind | Gemini 2.0 Flash | 17.3 |
| 32 | Alibaba Qwen | Qwen3 235B A22B | 17.2 |
| 33 | DeepSeek | R1 | 17.1 |
| 34 | Google DeepMind | Gemini 2.0 Flash Thinking (Jan 2025) | 16.8 |
| 35 | Meta | Llama 4 Maverick | 13.2 |
| 36 | Anthropic | Claude 3.5 Sonnet | 13.0 |
| 37 | Google DeepMind | Gemini 1.5 Pro (Feb 2024) | 12.5 |
| 38 | OpenAI | GPT-4.1 | 12.4 |
| 39 | moonshotai | Kimi K2 0711 | 11.6 |
| 40 | OpenAI | GPT-4 Turbo | 10.1 |
| 41 | Anthropic | Claude 3 Opus | 8.2 |
| 42 | Meta | Llama 3.1 405B | 7.6 |
| 43 | OpenAI | o3 Mini | 7.4 |
| 44 | xAI | Grok-2 (Dec 2024) | 7.2 |
| 45 | Mistral AI | Mistral Large | 7.0 |
| 46 | Mistral AI | Mistral Large 2407 | 7.0 |
| 47 | OpenAI | gpt-oss-120b | 6.5 |
| 48 | Meta | Llama 3.3 70B Instruct (free) | 3.9 |
| 49 | DeepSeek | DeepSeek V3 | 2.7 |
| 50 | OpenAI | o1-mini | 1.7 |
| 51 | OpenAI | GPT-4o (2024-08-06) | 1.4 |
| 52 | OpenAI | GPT-4o (2024-11-20) | 1.4 |

Details

Category: Reasoning
Max score: 100
Models: 52
Updated: 2026-03-05

Same category · related evaluations