SimpleQA Verified
SimpleQA Verified: short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.
Models Tested: 36
Top Score: 77.3
Average Score: 36.4
Rankings
| # | Model | Score |
|---|---|---|
| 1 | | 77.3 |
| 2 | | 72.9 |
| 3 | | 67.5 |
| 4 | | 67.4 |
| 5 | | 56.0 |
| 6 | | 53.0 |
| 7 | | 50.6 |
| 8 | | 50.6 |
| 9 | | 50.1 |
| 10 | | 48.9 |
| 11 | | 48.9 |
| 12 | | 47.9 |
| 13 | | 46.5 |
| 14 | | 44.8 |
| 15 | | 41.8 |
| 16 | | 38.9 |
| 17 | | 38.9 |
| 18 | | 34.8 |
| 19 | Kimi K2.5 (moonshotai) | 33.9 |
| 20 | Kimi K2 Thinking (moonshotai) | 31.6 |
| 21 | GLM 4.7 (z-ai) | 31.5 |
| 22 | | 29.0 |
| 23 | | 27.5 |
| 24 | | 27.4 |
| 25 | | 23.9 |
| 26 | | 23.6 |
| 27 | | 21.1 |
| 28 | | 21.1 |
| 29 | | 21.0 |
| 30 | | 13.9 |
| 31 | | 13.9 |
| 32 | | 13.9 |
| 33 | | 13.9 |
| 34 | | 12.2 |
| 35 | | 6.7 |
| 36 | | 5.9 |
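
The headline figures (models tested, top score, average score) can be checked against the Rankings table. The sketch below is only an illustration: it assumes the average is a plain arithmetic mean of the listed scores rounded to one decimal, which the page does not state, and the `summarize` helper is a hypothetical name introduced here.

```python
# Scores copied from the Rankings table, in rank order (ranks 1 to 36).
scores = [
    77.3, 72.9, 67.5, 67.4, 56.0, 53.0, 50.6, 50.6, 50.1,
    48.9, 48.9, 47.9, 46.5, 44.8, 41.8, 38.9, 38.9, 34.8,
    33.9, 31.6, 31.5, 29.0, 27.5, 27.4, 23.9, 23.6, 21.1,
    21.1, 21.0, 13.9, 13.9, 13.9, 13.9, 12.2, 6.7, 5.9,
]


def summarize(values):
    """Return (count, top, average); assumes an unweighted mean, rounded to 1 decimal."""
    count = len(values)
    top = max(values)
    average = round(sum(values) / count, 1)
    return count, top, average


count, top, average = summarize(scores)
print(f"Models Tested: {count}")    # Models Tested: 36
print(f"Top Score: {top}")          # Top Score: 77.3
print(f"Average Score: {average}")  # Average Score: 36.4
```

Under that assumption the computed values agree with the summary shown at the top of the page.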