SimpleQA Verified
SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.
Full rankings
32 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 77.3 |
| 2 | Gemini 3 Pro | 72.9 |
| 3 | Qwen3 Max | 67.5 |
| 4 | Gemini 3 Flash Preview | 67.4 |
| 5 | Muse Spark | 66.3 |
| 6 | | 56.0 |
| 7 | | 53.0 |
| 8 | | 50.6 |
| 9 | | 50.1 |
| 10 | | 48.9 |
| 11 | | 47.9 |
| 12 | | 47.8 |
| 13 | | 46.5 |
| 14 | | 44.8 |
| 15 | | 41.8 |
| 16 | | 38.9 |
| 17 | | 34.8 |
| 18 | | 33.9 |
| 19 | | 31.6 |
| 20 | | 31.5 |
| 21 | | 29.0 |
| 22 | | 27.5 |
| 23 | | 27.4 |
| 24 | | 27.4 |
| 25 | | 23.9 |
| 26 | | 23.6 |
| 27 | | 21.1 |
| 28 | | 21.0 |
| 29 | | 13.9 |
| 30 | | 12.2 |
| 31 | | 6.7 |
| 32 | | 5.9 |
Correlated benchmarks
Pearson r · original research
Benchmarks that track with SimpleQA Verified
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
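As a rough illustration of how such a figure can be computed, the sketch below restricts two score tables to the models present in both and then applies the standard Pearson formula. The function names, model names, and scores here are placeholders invented for the example, not BenchGecko's actual code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def shared_model_correlation(scores_a, scores_b):
    """Correlate two benchmarks using only models scored on both.

    Each argument maps a model name to its score on one benchmark.
    Returns (r, number_of_shared_models).
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    r = pearson_r(
        [scores_a[m] for m in shared],
        [scores_b[m] for m in shared],
    )
    return r, len(shared)


# Placeholder data, invented for illustration only.
bench_a = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 40.0}
bench_b = {"model-b": 25.0, "model-c": 35.0, "model-d": 45.0, "model-e": 50.0}

r, n_shared = shared_model_correlation(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Reporting the number of shared models next to r matters because a coefficient computed from only a handful of models is far less stable than one computed from dozens.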
Frequently asked
About SimpleQA Verified
What does SimpleQA Verified measure?
SimpleQA Verified is a benchmark of short factual questions with verified answers. It measures factual accuracy and the tendency to hallucinate or give incorrect information. So far, 32 AI models have been tested on it, with scores ranging from 5.9 to 77.3 out of 100.
Which model leads on SimpleQA Verified?
Gemini 3.1 Pro Preview from Google DeepMind leads SimpleQA Verified with a score of 77.3. The median score across 32 tested models is 36.8.
Is SimpleQA Verified saturated?
No · the top score is 77.3 out of 100 (77%). There is still meaningful room for improvement on SimpleQA Verified.
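The summary figures quoted in these answers (range, median, and remaining headroom) can be re-derived from the 32 scores in the full rankings. The sketch below does so with Python's standard library; the scores are copied from the ranking table, while the variable names are illustrative.

```python
from statistics import median

MAX_SCORE = 100

# The 32 scores from the full rankings, highest first.
scores = [
    77.3, 72.9, 67.5, 67.4, 66.3, 56.0, 53.0, 50.6,
    50.1, 48.9, 47.9, 47.8, 46.5, 44.8, 41.8, 38.9,
    34.8, 33.9, 31.6, 31.5, 29.0, 27.5, 27.4, 27.4,
    23.9, 23.6, 21.1, 21.0, 13.9, 12.2, 6.7, 5.9,
]

top = max(scores)
bottom = min(scores)
# With an even number of models, the median is the mean of the two middle scores.
mid = median(scores)
headroom = MAX_SCORE - top

print(f"models tested: {len(scores)}")
print(f"range: {bottom} to {top}")
print(f"median: {mid:.2f}")
print(f"headroom to max score: {headroom:.1f} points")
```

With 32 models, the median falls midway between the 16th and 17th scores (38.9 and 34.8), which gives 36.85 and is consistent with the quoted 36.8.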
Does SimpleQA Verified predict performance on other benchmarks?
Yes · SimpleQA Verified scores have a Pearson correlation of 0.90 with Balrog across the 6 models scored on both benchmarks. Models that do well on SimpleQA Verified tend to do well on Balrog.
How often is SimpleQA Verified data refreshed?
BenchGecko pulls updates daily. New model scores on SimpleQA Verified appear as soon as they are published by Epoch AI or the model provider.
- Category: Knowledge
- Max score: 100
- Models: 32
- Updated: 2026-03-05
Top on SimpleQA Verified
- Gemini 3.1 Pro Preview · 77.3
- Gemini 3 Pro · 72.9
- Qwen3 Max · 67.5
- Gemini 3 Flash Preview · 67.4
- Muse Spark · 66.3