SimpleBench

SimpleBench tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.

Models Tested: 61
Top Score: 75.5
Average Score: 27.7

Rank  Organization     Score
1     Google DeepMind  75.5
2     Google           71.7
3     OpenAI           68.9
4     Anthropic        61.1
5     Google DeepMind  54.9
6     Anthropic        54.4
7     OpenAI           53.9
8     Google DeepMind  53.3
9     xAI              52.6
10    Anthropic        52.0
11    Anthropic        50.6
12    OpenAI           48.0
13    OpenAI           48.0
14    Anthropic        45.2
15    z-ai             43.8
16    z-ai             43.8
17    OpenAI           43.8
18    OpenAI           43.8
19    OpenAI           43.7
20    z-ai             37.2
21    moonshotai       36.2
22    Anthropic        35.7
23    Anthropic        35.7
24    OpenAI           35.0
25    OpenAI           35.0
26    Anthropic        34.6
27    OpenAI           30.0
28    DeepSeek         29.0
29    OpenAI           28.1
30    DeepSeek         28.0
31    OpenAI           26.4
32    xAI              23.3
33    xAI              23.3
34    OpenAI           21.4
35    Google DeepMind  17.3
36    Alibaba Qwen     17.2
37    DeepSeek         17.1
38    Google           16.8
39    Meta             13.2
40    Anthropic        13.0
41    Google           12.5
42    OpenAI           12.4
43    moonshotai       11.6
44    OpenAI           10.1
45    Anthropic        8.2
46    Meta             7.6
47    OpenAI           7.4
48    xAI              7.2
49    Mistral AI       7.0
50    OpenAI           6.5
51    OpenAI           6.5
52    OpenAI           6.5
53    OpenAI           6.5
54    Meta             3.9
55    Meta             3.9
56    DeepSeek         2.7
57    OpenAI           1.7
58    OpenAI           1.4
59    OpenAI           1.4
60    OpenAI           1.4
61    OpenAI           1.4
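
The three summary figures (Models Tested, Top Score, Average Score) can be cross-checked against the listed scores. The following is a minimal Python sketch, assuming the average is an unweighted arithmetic mean rounded to one decimal place; the page does not state how it is computed. The `scores` list is transcribed from the leaderboard rows, and the `summarize` helper is a hypothetical name introduced for illustration.

```python
from statistics import mean

# Scores transcribed from the leaderboard, in rank order (1 to 61).
scores = [
    75.5, 71.7, 68.9, 61.1, 54.9, 54.4, 53.9, 53.3, 52.6, 52.0,
    50.6, 48.0, 48.0, 45.2, 43.8, 43.8, 43.8, 43.8, 43.7, 37.2,
    36.2, 35.7, 35.7, 35.0, 35.0, 34.6, 30.0, 29.0, 28.1, 28.0,
    26.4, 23.3, 23.3, 21.4, 17.3, 17.2, 17.1, 16.8, 13.2, 13.0,
    12.5, 12.4, 11.6, 10.1, 8.2, 7.6, 7.4, 7.2, 7.0, 6.5,
    6.5, 6.5, 6.5, 3.9, 3.9, 2.7, 1.7, 1.4, 1.4, 1.4,
    1.4,
]


def summarize(values):
    """Return the count, maximum, and mean (rounded to 1 decimal) of the scores.

    Assumption: the page's "Average Score" is a plain unweighted mean.
    """
    return {
        "models_tested": len(values),
        "top_score": max(values),
        "average_score": round(mean(values), 1),
    }


summary = summarize(scores)
print(summary)
# {'models_tested': 61, 'top_score': 75.5, 'average_score': 27.7}
```

Under that assumption the computed values agree with the figures shown at the top of the page.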