FrontierMath-2025-02-28-Private

FrontierMath (Feb 2025): original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.

Models Tested: 60
Top Score: 50.0
Average Score: 14.1
Rank  Developer         Score
1     OpenAI            50.0
2     OpenAI            47.6
3     Anthropic         40.7
4     OpenAI            40.7
5     OpenAI            40.7
6     Google            37.6
7     Google DeepMind   36.9
8     Google DeepMind   35.6
9     OpenAI            32.4
10    OpenAI            32.4
11    Anthropic         32.4
12    OpenAI            31.0
13    OpenAI            31.0
14    moonshotai        27.9
15    OpenAI            27.2
16    OpenAI            24.8
17    DeepSeek          22.1
18    moonshotai        21.4
19    Anthropic         20.7
20    xAI               19.7
21    OpenAI            18.7
22    z-ai              16.4
23    z-ai              16.4
24    Anthropic         15.2
25    Google DeepMind   14.1
26    OpenAI            12.4
27    OpenAI            9.3
28    Alibaba Qwen      8.5
29    OpenAI            8.3
30    Anthropic         7.2
31    Anthropic         5.9
32    xAI               5.9
33    xAI               5.9
34    OpenAI            5.5
35    Anthropic         4.5
36    OpenAI            4.5
37    Anthropic         4.1
38    Anthropic         4.1
39    Anthropic        4.1
40    z-ai              3.8
41    xAI               3.8
42    xAI               3.8
43    z-ai              2.4
44    DeepSeek          1.7
45    OpenAI            1.7
46    OpenAI            1.0
47    Anthropic         1.0
48    Alibaba           1.0
49    Meta              0.7
50    xAI               0.7
51    Mistral AI        0.3
52    OpenAI            0.3
53    Mistral AI        0.3
54    Anthropic         0.3
55    OpenAI            0.3
56    OpenAI            0.3
57    OpenAI            0.3
58    OpenAI            0.3
59    Meta              0.1
60    Google            0.1
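As a quick consistency check, the three summary statistics can be recomputed from the table. A minimal Python sketch, with the scores transcribed from the rows above:

```python
# Scores transcribed from the 60 leaderboard rows above.
scores = [
    50.0, 47.6, 40.7, 40.7, 40.7, 37.6, 36.9, 35.6, 32.4, 32.4,
    32.4, 31.0, 31.0, 27.9, 27.2, 24.8, 22.1, 21.4, 20.7, 19.7,
    18.7, 16.4, 16.4, 15.2, 14.1, 12.4, 9.3, 8.5, 8.3, 7.2,
    5.9, 5.9, 5.9, 5.5, 4.5, 4.5, 4.1, 4.1, 4.1, 3.8,
    3.8, 3.8, 2.4, 1.7, 1.7, 1.0, 1.0, 1.0, 0.7, 0.7,
    0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.1, 0.1,
]

print(len(scores))                          # Models Tested: 60
print(max(scores))                          # Top Score: 50.0
print(round(sum(scores) / len(scores), 1))  # Average Score: 14.1
```

The mean of the 60 scores is 14.13, which rounds to the 14.1 shown in the summary, so the stats are an unweighted average over all listed models.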