Which model leads on SimpleQA Verified?

Gemini 3.1 Pro Preview from Google DeepMind leads SimpleQA Verified with a score of 77.3. The median score across 32 tested models is 36.8.

Is SimpleQA Verified saturated?

No · the top score is 77.3 out of 100, which leaves 22.7 points of headroom on SimpleQA Verified.

Does SimpleQA Verified predict performance on other benchmarks?

Yes · SimpleQA Verified scores correlate 0.90 with Balrog across the 6 models tested on both. Models that do well on SimpleQA Verified tend to do well on Balrog, though 6 shared models is a small sample.
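A cross-benchmark correlation of this kind can be sketched as follows. The page does not say which coefficient BenchGecko uses, so Pearson's r is an assumption here, and `pearson_on_shared` and its dictionary inputs (model name to score) are hypothetical names chosen for illustration.

```python
from math import sqrt


def pearson_on_shared(scores_a: dict[str, float],
                      scores_b: dict[str, float]) -> tuple[float, int]:
    """Pearson r over the models present in both score tables.

    Returns (r, number_of_shared_models). Assumes Pearson's r; the
    page does not state which coefficient is actually used.
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")

    xs = [scores_a[m] for m in shared]
    ys = [scores_b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")

    return cov / sqrt(var_x * var_y), n
```

Returning the shared-model count alongside r keeps the sample size visible, which matters when only a handful of models appear on both benchmarks.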

How often is SimpleQA Verified data refreshed?

BenchGecko pulls updates daily, so new model scores on SimpleQA Verified appear within a day of being published by Epoch AI or the model provider.

Benchmark · Knowledge · Competitive

SimpleQA Verified

Name: SimpleQA Verified Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.

Updated 2026-03-05

Models tested

32

Top score

77.3

Gemini 3.1 Pro Preview

Median

36.8

min 5.9

Top-5 spread

σ 4.2

Competitive
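The σ 4.2 figure can be reproduced from the five highest scores in the rankings. This is a minimal check, assuming σ means the population standard deviation; the page does not state the formula, nor the threshold behind the "Competitive" label.

```python
from statistics import pstdev

# Five highest SimpleQA Verified scores from the full rankings.
top_five = [77.3, 72.9, 67.5, 67.4, 66.3]

# Population standard deviation (divides by n, not n - 1).
spread = pstdev(top_five)
print(f"top-5 spread: {spread:.1f}")  # prints "top-5 spread: 4.2"
```

The sample standard deviation (`statistics.stdev`) of the same five scores comes out near 4.7, so the population form is the one consistent with the published value.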

The Frontier

Best score over time

Frontier on SimpleQA Verified rose from 6.7 to 77.3 in 16 months · +70.6 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
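A frontier of this kind is a running maximum over dated results. The sketch below assumes each result is a `(date, model, score)` tuple; the page does not list result dates, so that layout and the function names are hypothetical.

```python
from datetime import date

Result = tuple[date, str, float]


def frontier_records(results: list[Result]) -> list[Result]:
    """Return the results that set a new best score, oldest first."""
    records: list[Result] = []
    best = float("-inf")
    for when, model, score in sorted(results, key=lambda r: r[0]):
        # Only a strictly higher score counts as a new record.
        if score > best:
            best = score
            records.append((when, model, score))
    return records


def frontier_gain(records: list[Result]) -> float:
    """Points gained between the first and the latest frontier record."""
    if not records:
        raise ValueError("no frontier records")
    return records[-1][2] - records[0][2]
```

Applied to the figures quoted above, the gain is 77.3 - 6.7 = 70.6 points.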

Frontier records: 6 total.

Full rankings

32 models tested · sorted by score

#	Model	Provider	Score	Price
1	Gemini 3.1 Pro Preview	Google DeepMind	77.3	$2.00
2	Gemini 3 Pro	Google DeepMind	72.9	—
3	Qwen3 Max	Alibaba Qwen	67.5	$0.78
4	Gemini 3 Flash Preview	Google DeepMind	67.4	$0.50
5	Muse Spark	Unknown	66.3	—
6	Gemini 2.5 Pro	Google DeepMind	56.0	$1.25
7	o3	OpenAI	53.0	$2.00
8	GPT-5	OpenAI	50.6	$1.25
9	Qwen3 235B A22B Thinking 2507	Alibaba Qwen	50.1	$0.15
10	GPT-5.1	OpenAI	48.9	$1.25
11	Grok 4	xAI	47.9	$3.00
12	GPT-5.4 Pro	OpenAI	47.8	$30.00
13	Claude Opus 4.6	Anthropic	46.5	$5.00
14	GPT-5.4	OpenAI	44.8	$2.50
15	Claude Opus 4.5	Anthropic	41.8	$5.00
16	GPT-5.2	OpenAI	38.9	$1.75
17	Claude Opus 4.1	Anthropic	34.8	$15.00
18	Kimi K2.5	moonshotai	33.9	$0.44
19	Kimi K2 Thinking	moonshotai	31.6	$0.60
20	GLM 4.7	z-ai	31.5	$0.38
21	Claude Sonnet 4.6	Anthropic	29.0	$3.00
22	DeepSeek V3.2	DeepSeek	27.5	$0.25
23	R1	DeepSeek	27.4	$0.70
24	R1 0528	DeepSeek	27.4	$0.50
25	o4 Mini	OpenAI	23.9	$1.10
26	Claude Sonnet 4.5	Anthropic	23.6	$3.00
27	Grok 3 Mini	xAI	21.1	$0.30
28	GPT-5 Mini	OpenAI	21.0	$0.25
29	gpt-oss-120b	OpenAI	13.9	$0.04
30	GPT-5 Nano	OpenAI	12.2	$0.05
31	Claude 3.5 Haiku	Anthropic	6.7	$0.80
32	Claude Haiku 4.5	Anthropic	5.9	$1.00
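The headline statistics can be checked against the rankings above. This is a small sketch with the 32 scores copied by hand from the table; the variable names are illustrative only.

```python
from statistics import median

# Scores from the full rankings, in rank order (1 through 32).
scores = [
    77.3, 72.9, 67.5, 67.4, 66.3, 56.0, 53.0, 50.6,
    50.1, 48.9, 47.9, 47.8, 46.5, 44.8, 41.8, 38.9,
    34.8, 33.9, 31.6, 31.5, 29.0, 27.5, 27.4, 27.4,
    23.9, 23.6, 21.1, 21.0, 13.9, 12.2, 6.7, 5.9,
]

models_tested = len(scores)
top_score = max(scores)
lowest_score = min(scores)

# With an even count, the median is the mean of the 16th and 17th
# scores: (38.9 + 34.8) / 2, about 36.85.
median_score = median(scores)

print(models_tested, top_score, lowest_score)
print(f"median is approximately {median_score:.2f}")
```

The computed median of about 36.85 sits between ranks 16 and 17, and the page reports it as 36.8.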