
GPQA diamond

Graduate-Level Google-Proof QA (Diamond set): expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.

Models Tested: 115
Top Score: 92.1
Average Score: 46.1
Rank  Provider         Score
1     Google DeepMind  92.1
2     OpenAI           91.1
3     Google           90.2
4     OpenAI           88.5
5     OpenAI           88.5
6     Anthropic        87.4
7     z-ai             83.8
8     z-ai             83.8
9     OpenAI           83.5
10    OpenAI           83.5
11    moonshotai       83.5
12    Anthropic        83.2
13    xAI              82.7
14    OpenAI           81.6
15    OpenAI           81.6
16    Anthropic        81.4
17    Google DeepMind  80.4
18    moonshotai       79.0
19    DeepSeek         77.9
20    z-ai             77.8
21    Google DeepMind  77.6
22    Anthropic        76.4
23    OpenAI           75.8
24    Alibaba Qwen     73.4
25    Anthropic        73.0
26    Anthropic        73.0
27    OpenAI           72.8
28    Anthropic        72.3
29    Anthropic        69.7
30    OpenAI           69.4
31    OpenAI           69.0
32    DeepSeek         68.4
33    xAI              68.3
34    Anthropic        68.3
35    xAI              68.3
36    OpenAI           67.7
37    OpenAI           67.7
38    OpenAI           67.7
39    OpenAI           67.7
40    xAI              67.7
41    xAI              67.7
42    OpenAI           66.7
43    Alibaba Qwen     63.5
44    DeepSeek         62.3
45    Anthropic        61.6
46    Alibaba Qwen     60.9
47    OpenAI           59.3
48    OpenAI           58.3
49    Meta             56.0
50    OpenAI           55.9
51    OpenAI           54.5
52    Google           54.2
53    OpenAI           49.8
54    Mistral AI       46.0
55    Google           43.0
56    Google           42.8
57    DeepSeek         42.0
58    Alibaba          41.5
59    Microsoft        41.4
60    Anthropic        38.7
61    xAI              38.4
62    Meta             35.8
63    Mistral AI       35.1
64    Meta             34.5
65    OpenAI           33.8
66    OpenAI           32.3
67    OpenAI           32.3
68    OpenAI           32.3
69    OpenAI           32.3
70    Alibaba Qwen     32.2
71    Mistral AI       32.0
72    OpenAI           31.9
73    Google DeepMind  31.8
74    Google DeepMind  31.8
75    Google DeepMind  31.8
76    Google DeepMind  31.8
77    Google DeepMind  31.8
78    Google DeepMind  31.8
79    OpenAI           30.5
80    Meta             29.9
81    Meta             29.9
82    Google           29.8
83    Anthropic        29.6
84    OpenAI           28.8
85    Google           27.8
86    Meta             25.6
87    Meta             21.4
88    Alibaba          21.0
89    Anthropic        20.8
90    Meta             20.8
91    Google           20.5
92    Mistral AI       18.4
93    Anthropic        17.5
94    OpenAI           17.0
95    OpenAI           17.0
96    Google DeepMind  15.3
97    Anthropic        15.1
98    Anthropic        12.9
99    Mistral AI       12.1
100   Google           11.9
101   Anthropic        10.6
102   OpenAI           7.5
103   OpenAI           7.5
104   OpenAI           7.5
105   OpenAI           7.5
106   Mistral AI       7.5
107   Mistral AI       6.5
108   Microsoft        3.5
109   Google DeepMind  3.3
110   OpenAI           2.9
111   OpenAI           2.9
112   OpenAI           2.9
113   Meta             1.8
114   Meta             1.4
115   Meta             1.3