Benchmark · Math · Settled

OTIS Mock AIME 2024-2025

OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination (AIME) problems testing advanced problem-solving skills.

Updated 2026-03-05
- Models tested: 86
- Top score: 96.1 (GPT-5.2)
- Median: 34.9 (min 0.5)
- Top-5 spread: σ 1.2
- Status: Settled
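The "Top-5 spread" stat is consistent with σ being the population standard deviation of the five best scores. A minimal sketch, assuming that definition (score values taken from the leaderboard below):

```python
from statistics import pstdev

# Sketch: reproducing "Top-5 spread · σ 1.2", assuming σ is the
# population standard deviation of the five highest scores.
top5 = [96.1, 95.6, 95.3, 94.4, 92.8]  # GPT-5.2 ... Gemini 3 Flash Preview

sigma = pstdev(top5)
print(f"σ = {sigma:.1f}")  # → σ = 1.2
```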

Best score over time · one chart, every benchmark

[Chart: OTIS Mock AIME 2024-2025 · frontier running max over 61 models · score (0-100) vs. release date, Jun 2024 – Mar 2026]
Frontier on OTIS Mock AIME 2024-2025 rose from 9.6 to 96.1 in 17 months · +86.5 points · latest leader GPT-5.2 from OpenAI.
Pink dots = frontier records · 9 total
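The "frontier running max" line above marks a record whenever a newly released model beats every earlier score. A minimal sketch of that computation; the release dates here are illustrative placeholders (the leaderboard does not list them), while model names and scores come from the table below:

```python
from datetime import date

# Sketch of the frontier running max: a model sets a frontier record
# (a "pink dot") when its score exceeds all earlier releases.
# Dates are hypothetical; scores match the leaderboard.
releases = [
    (date(2024, 6, 1), "Llama 3.1 405B", 9.6),
    (date(2024, 12, 1), "o1", 73.3),
    (date(2025, 4, 1), "o3", 83.9),
    (date(2025, 8, 1), "GPT-5", 91.4),
    (date(2025, 11, 1), "GPT-5.2", 96.1),
]

frontier = []
best = float("-inf")
for day, model, score in sorted(releases):
    if score > best:  # new record beats every earlier release
        best = score
        frontier.append((day, model, score))

print(frontier[-1][1])  # latest frontier leader
```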

86 models tested · sorted by score

| # | Org | Model | Score |
|---|-----|-------|-------|
| 1 | OpenAI | GPT-5.2 | 96.1 |
| 2 | Google DeepMind | Gemini 3.1 Pro Preview | 95.6 |
| 3 | OpenAI | GPT-5.4 | 95.3 |
| 4 | Anthropic | Claude Opus 4.6 | 94.4 |
| 5 | Google DeepMind | Gemini 3 Flash Preview | 92.8 |
| 6 | moonshotai | Kimi K2.5 | 92.2 |
| 7 | Google DeepMind | Gemini 3 Pro | 91.4 |
| 8 | OpenAI | GPT-5 | 91.4 |
| 9 | — | Muse Spark | 88.9 |
| 10 | OpenAI | gpt-oss-120b | 88.9 |
| 11 | OpenAI | GPT-5.1 | 88.6 |
| 12 | DeepSeek | DeepSeek V3.2 | 87.8 |
| 13 | OpenAI | GPT-5 Mini | 86.7 |
| 14 | Alibaba Qwen | Qwen3 235B A22B Thinking 2507 | 86.7 |
| 15 | Anthropic | Claude Opus 4.5 | 86.1 |
| 16 | Anthropic | Claude Sonnet 4.6 | 85.8 |
| 17 | Google DeepMind | Gemini 2.5 Pro | 84.7 |
| 18 | xAI | Grok 4 | 84.0 |
| 19 | OpenAI | o3 | 83.9 |
| 20 | z-ai | GLM 4.7 | 83.3 |
| 21 | moonshotai | Kimi K2 Thinking | 83.0 |
| 22 | OpenAI | o4 Mini | 81.7 |
| 23 | OpenAI | GPT-5 Nano | 81.1 |
| 24 | z-ai | GLM 5 | 80.0 |
| 25 | Anthropic | Claude Sonnet 4.5 | 77.8 |
| 26 | xAI | Grok 3 Mini | 77.8 |
| 27 | OpenAI | o3 Mini | 76.9 |
| 28 | OpenAI | o1 | 73.3 |
| 29 | Alibaba Qwen | Qwen3 Max | 73.3 |
| 30 | Google DeepMind | Gemini 2.5 Flash | 73.0 |
| 31 | Anthropic | Claude Sonnet 4 | 71.1 |
| 32 | Anthropic | Claude Opus 4.1 | 68.9 |
| 33 | Anthropic | Claude Haiku 4.5 | 66.6 |
| 34 | DeepSeek | R1 0528 | 66.4 |
| 35 | Anthropic | Claude Opus 4 | 64.4 |
| 36 | Anthropic | Claude 3.7 Sonnet | 57.7 |
| 37 | Google DeepMind | Gemini 2.0 Flash Thinking (Jan 2025) | 57.7 |
| 38 | xAI | Grok 3 | 55.5 |
| 39 | DeepSeek | R1 | 53.3 |
| 40 | OpenAI | o1-mini | 46.9 |
| 41 | OpenAI | GPT-4.1 Mini | 44.7 |
| 42 | OpenAI | GPT-4.1 | 38.3 |
| 43 | OpenAI | GPT-4.5 | 37.7 |
| 44 | Mistral AI | Mistral Medium 3 | 32.1 |
| 45 | Google DeepMind | Gemini 2.0 Flash | 31.0 |
| 46 | OpenAI | o1-preview | 31.0 |
| 47 | — | Magistral Small 1.1 | 29.9 |
| 48 | OpenAI | GPT-4.1 Nano | 28.8 |
| 49 | Meta | Llama 4 Maverick | 20.5 |
| 50 | Google DeepMind | Gemma 3 27B | 19.6 |
| 51 | Google DeepMind | Gemma 3 27B (free) | 19.6 |
| 52 | Alibaba Qwen | Qwen2.5-Max | 16.0 |
| 53 | DeepSeek | DeepSeek V3 | 15.8 |
| 54 | Microsoft | Phi 4 | 13.7 |
| 55 | xAI | Grok-2 (Dec 2024) | 11.4 |
| 56 | Meta | Llama 3.1 405B | 9.6 |
| 57 | Mistral AI | Mistral Large 2407 | 8.4 |
| 58 | Alibaba Qwen | Qwen2.5 72B Instruct | 8.0 |
| 59 | Meta | Llama 4 Scout | 7.7 |
| 60 | Mistral AI | Mistral Large 2411 | 7.7 |
| 61 | OpenAI | GPT-4o-mini | 6.8 |
| 62 | OpenAI | GPT-4o-mini (2024-07-18) | 6.8 |
| 63 | Google DeepMind | Gemini 1.5 Pro (Feb 2024) | 6.7 |
| 64 | Anthropic | Claude 3.5 Sonnet | 6.4 |
| 65 | OpenAI | GPT-4o (2024-08-06) | 6.3 |
| 66 | OpenAI | GPT-4o (2024-11-20) | 6.3 |
| 67 | OpenAI | GPT-4o (2024-05-13) | 6.2 |
| 68 | Meta | Llama 3.3 70B Instruct (free) | 5.0 |
| 69 | Anthropic | Claude 3 Opus | 4.6 |
| 70 | Anthropic | Claude 3.5 Haiku | 4.2 |
| 71 | Meta | Llama 3 70B Instruct | 4.2 |
| 72 | Google DeepMind | Gemini 1.5 Flash (May 2024) | 3.8 |
| 73 | Meta | Llama 3.1 70B Instruct | 3.5 |
| 74 | Meta | Llama 3.2 90B | 2.5 |
| 75 | Anthropic | Claude 2 | 2.4 |
| 76 | Anthropic | Claude 3 Sonnet | 2.4 |
| 77 | Meta | Llama 3.1 8B Instruct | 2.4 |
| 78 | Anthropic | Claude 2.1 | 1.9 |
| 79 | Mistral AI | Mistral Large | 1.9 |
| 80 | Anthropic | Claude 3 Haiku | 1.7 |
| 81 | Google DeepMind | Gemma 2 27B | 1.3 |
| 82 | Google DeepMind | Gemini 1.0 Pro | 1.0 |
| 83 | OpenAI | GPT-4 Turbo | 1.0 |
| 84 | Meta | Llama 3 8B Instruct | 0.7 |
| 85 | Google DeepMind | Gemma 2 9B | 0.5 |
| 86 | OpenAI | GPT-4 (older v0314) | 0.5 |
