
MMLU

Massive Multitask Language Understanding: 57 subjects spanning STEM, the humanities, the social sciences, and more. It is a standard benchmark for measuring broad knowledge.

Models Tested: 92
Top Score: 84.1
Average Score: 55.4

Rank  Provider         Score
1     OpenAI           84.1
2     DeepSeek         82.9
3     Google           82.5
4     Anthropic        82.0
5     Meta             81.7
6     Meta             81.7
7     Alibaba Qwen     80.4
8     Microsoft        79.7
9     Anthropic        79.5
10    Meta             79.3
11    OpenAI           79.1
12    OpenAI           79.1
13    OpenAI           79.1
14    OpenAI           79.1
15    Google           76.9
16    OpenAI           76.5
17    OpenAI           76.5
18    OpenAI           76.5
19    OpenAI           76.5
20    Alibaba          76.5
21    OpenAI           75.7
22    OpenAI           75.7
23    OpenAI           75.1
24    Meta             73.7
25    Meta             73.5
26    Mistral AI       73.3
27    Google DeepMind  72.9
28    Meta             72.4
29    Alibaba Qwen     72.1
30    Anthropic        71.3
31    DeepSeek         71.2
32    Microsoft        70.7
33    Google           70.5
34    Mistral AI       70.4
35    unknown          68.4
36    Anthropic        67.9
37    Google DeepMind  67.6
38    Microsoft        67.6
39    Anthropic        65.7
40    Google           65.2
41    Anthropic        65.1
42    Anthropic        64.7
43    Anthropic        64.5
44    Google DeepMind  62.8
45    Mistral AI       60.8
46    TII              60.8
47    Google           60.0
48    Meta             59.9
49    Meta             58.4
50    Mistral AI       58.4
51    Microsoft        58.4
52    unknown          58.1
53    Alibaba Qwen     57.3
54    OpenAI           56.4
55    OpenAI           56.4
56    OpenAI           56.4
57    Alibaba          55.1
58    Google           54.8
59    unknown          52.1
60    unknown          52.0
61    Meta             51.2
62    Meta             50.1
63    Mistral          50.0
64    unknown          45.6
65    Meta             44.9
66    unknown          44.9
67    Microsoft        44.5
68    TII              44.5
69    TII              42.5
70    Meta             41.5
71    Meta             40.8
72    unknown          38.9
73    Alibaba          38.1
74    unknown          33.2
75    unknown          30.5
76    Meta             30.3
77    Meta             27.7
78    Alibaba          26.7
79    unknown          23.1
80    Google           23.1
81    DeepSeek         19.2
82    unknown          18.4
83    Microsoft        16.8
84    unknown          15.5
85    DeepSeek         15.2
86    unknown          15.1
87    Meta             14.1
88    TII              13.3
89    unknown          7.7
90    unknown          1.6
91    OpenAI           1.6
92    DeepSeek         1.1
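
As a sanity check on the summary figures (Models Tested, Top Score, Average Score), this minimal Python sketch recomputes them from the 92 leaderboard scores, transcribed in rank order. It assumes that Average Score is a simple unweighted mean of the listed scores, rounded to one decimal place. The page does not say how the average is computed, and the helper name `summarize` is hypothetical.

```python
from statistics import mean

# Scores transcribed from the leaderboard, in rank order (ranks 1-92).
SCORES = [
    84.1, 82.9, 82.5, 82.0, 81.7, 81.7, 80.4, 79.7, 79.5, 79.3,
    79.1, 79.1, 79.1, 79.1, 76.9, 76.5, 76.5, 76.5, 76.5, 76.5,
    75.7, 75.7, 75.1, 73.7, 73.5, 73.3, 72.9, 72.4, 72.1, 71.3,
    71.2, 70.7, 70.5, 70.4, 68.4, 67.9, 67.6, 67.6, 65.7, 65.2,
    65.1, 64.7, 64.5, 62.8, 60.8, 60.8, 60.0, 59.9, 58.4, 58.4,
    58.4, 58.1, 57.3, 56.4, 56.4, 56.4, 55.1, 54.8, 52.1, 52.0,
    51.2, 50.1, 50.0, 45.6, 44.9, 44.9, 44.5, 44.5, 42.5, 41.5,
    40.8, 38.9, 38.1, 33.2, 30.5, 30.3, 27.7, 26.7, 23.1, 23.1,
    19.2, 18.4, 16.8, 15.5, 15.2, 15.1, 14.1, 13.3, 7.7, 1.6,
    1.6, 1.1,
]


def summarize(scores):
    """Hypothetical helper: return (model count, top score, mean rounded to 1 dp).

    Assumes the page's Average Score is an unweighted mean of all listed scores.
    """
    return len(scores), max(scores), round(mean(scores), 1)


count, top, avg = summarize(SCORES)
print(f"Models Tested: {count}")  # Models Tested: 92
print(f"Top Score: {top}")        # Top Score: 84.1
print(f"Average Score: {avg}")    # Average Score: 55.4
```

The scores sum to 5094.6, and 5094.6 / 92 ≈ 55.38, which rounds to the 55.4 shown on the page. This is consistent with an unweighted mean, although it does not prove that the site computes the figure this way.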