Best AI Models for Reasoning

AI models ranked by reasoning benchmarks. Compare GPQA Diamond, ARC-AGI, BBH, and other reasoning tests across all providers.

Models: 133
Providers: 15
Open Source: 61
Median price per 1M input tokens: $0.66
| # | Model | Provider | Open | Avg | ARC-AGI | ARC-AGI-2 | BBH | GPQA Diamond | HLE | SimpleBench | $/1M input tokens | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | 🇺🇸 Anthropic | | 81.8 | - | - | - | 94.5 | 56.8 | - | N/A | 1.0M |
| 2 | DeepSeek-V2 (MoE-236B, May 2024) | 🇨🇳 DeepSeek | Open | 76.5 | - | - | 71.7 | - | - | - | N/A | 0K |
| 3 | phi-3-small 7.4B | 🇺🇸 Microsoft | Open | 67.4 | - | - | 72.1 | - | - | - | N/A | 0K |
| 4 | GPT-5.4 Pro | 🇺🇸 OpenAI | | 66.7 | 94.5 | 83.3 | - | 92.8 | - | 68.9 | $30.00 | 1.1M |
| 5 | o3 Pro | 🇺🇸 OpenAI | | 61.2 | 59.3 | 4.9 | - | - | - | - | $20.00 | 200K |
| 6 | phi-3-mini 3.8B | 🇺🇸 Microsoft | Open | 61.0 | - | - | 62.3 | - | - | - | N/A | 0K |
| 7 | Qwen-14B | 🇨🇳 Alibaba Qwen | Open | 60.7 | - | - | 40.0 | - | - | - | N/A | 0K |
| 8 | Gemini 3.1 Pro Preview | 🇺🇸 Google DeepMind | | 60.6 | 98.0 | 77.1 | - | 92.1 | - | 75.5 | $2.00 | 1.0M |
| 9 | Gemini 3 Pro | 🇺🇸 Google DeepMind | | 60.5 | 75.0 | 31.1 | - | 90.2 | 34.4 | 71.7 | N/A | 0K |
| 10 | DeepSeek V3 | 🇨🇳 DeepSeek | Open | 59.0 | - | - | 83.3 | 42.0 | - | 2.7 | $0.32 | 164K |
| 11 | GPT-5.4 | 🇺🇸 OpenAI | | 59.0 | 93.7 | 74.0 | - | 91.1 | - | - | $2.50 | 1.1M |
| 12 | Muse Spark | Unknown | | 59.0 | - | - | - | 86.4 | - | - | N/A | 0K |
| 13 | phi-3-medium 14B | 🇺🇸 Microsoft | Open | 58.6 | - | - | 75.2 | 3.5 | - | - | N/A | 0K |
| 14 | Qwen3 Max | 🇨🇳 Alibaba Qwen | Open | 58.3 | - | - | - | 63.5 | - | - | $0.78 | 262K |
| 15 | Falcon 2 11B | TII | Open | 58.0 | - | - | - | - | - | - | N/A | 0K |
| 16 | R1 0528 | 🇨🇳 DeepSeek | Open | 57.9 | 21.2 | 1.1 | - | 68.4 | - | 29.0 | $0.50 | 164K |
| 17 | Mixtral 8x7B Instruct | 🇫🇷 Mistral AI | Open | 57.8 | - | - | - | 7.5 | - | - | $0.54 | 33K |
| 18 | GLM 5 | 🇨🇳 z-ai | Open | 57.6 | 44.7 | 4.9 | - | 83.8 | - | 43.8 | $0.72 | 80K |
| 19 | Claude Opus 4.6 | 🇺🇸 Anthropic | | 57.5 | 94.0 | 69.2 | - | 87.4 | 31.1 | 61.1 | $5.00 | 1.0M |
| 20 | o1 | 🇺🇸 OpenAI | | 56.4 | 30.7 | - | - | 69.0 | - | 28.1 | $15.00 | 200K |
| 21 | Qwen3 235B A22B | 🇨🇳 Alibaba Qwen | Open | 56.4 | - | - | - | 60.9 | - | 17.2 | $0.46 | 131K |
| 22 | Gemini 2.5 Pro | 🇺🇸 Google DeepMind | | 56.2 | 41.0 | 4.9 | - | 80.4 | 17.7 | 54.9 | $1.25 | 1.0M |
| 23 | GPT-5.2 Pro | 🇺🇸 OpenAI | | 56.2 | 90.5 | 54.2 | - | - | - | 48.9 | $21.00 | 400K |
| 24 | Kimi K2 0711 | 🇨🇳 moonshotai | Open | 56.2 | - | - | - | - | - | 11.6 | $0.57 | 131K |
| 25 | GPT-5 Mini | 🇺🇸 OpenAI | | 56.0 | 54.3 | 4.4 | - | 66.7 | 15.4 | - | $0.25 | 400K |
| 26 | Qwen3 235B A22B Thinking 2507 | 🇨🇳 Alibaba Qwen | Open | 55.9 | - | - | - | 73.4 | - | - | $0.15 | 131K |
| 27 | o3 | 🇺🇸 OpenAI | | 55.2 | 60.8 | 6.5 | - | 75.8 | 16.3 | 43.7 | $2.00 | 200K |
| 28 | MiniMax M2.5 | 🇨🇳 minimax | Open | 55.1 | 63.7 | 4.9 | - | - | - | - | $0.12 | 197K |
| 29 | GPT-4 (older v0314) | 🇺🇸 OpenAI | | 55.0 | - | - | - | 14.3 | - | - | $30.00 | 8K |
| 30 | Grok 4 | 🇺🇸 xAI | | 54.8 | 66.7 | 16.0 | - | 82.7 | - | 52.6 | $3.00 | 256K |
| 31 | GPT-5 | 🇺🇸 OpenAI | | 54.4 | 65.7 | 9.9 | - | 81.6 | 21.6 | 48.0 | $1.25 | 400K |
| 32 | GPT-5.2 | 🇺🇸 OpenAI | | 54.0 | 86.2 | 52.9 | - | 88.5 | 24.2 | 35.0 | $1.75 | 400K |
| 33 | Gemini 2.0 Pro | 🇺🇸 Google DeepMind | | 53.7 | - | - | - | 54.2 | - | - | N/A | 0K |
| 34 | Nemotron-4 15B | Unknown | | 53.4 | - | - | 44.9 | - | - | - | N/A | 0K |
| 35 | Kimi K2 Thinking | 🇨🇳 moonshotai | Open | 53.3 | - | - | - | 79.0 | - | - | $0.60 | 262K |
| 36 | o4 Mini | 🇺🇸 OpenAI | | 53.2 | 58.7 | 6.1 | - | 72.8 | 13.9 | 26.4 | $1.10 | 200K |
| 37 | Qwen2.5 72B Instruct | 🇨🇳 Alibaba Qwen | Open | 53.2 | - | - | 73.1 | 32.2 | - | - | $0.12 | 33K |
| 38 | Qwen2.5 Coder 32B Instruct | 🇨🇳 Alibaba Qwen | Open | 53.1 | - | - | - | - | - | - | $0.66 | 33K |
| 39 | DeepSeek V3.2 | 🇨🇳 DeepSeek | Open | 53.0 | 57.0 | 4.0 | - | 77.9 | - | - | $0.26 | 164K |
| 40 | Kimi K2.5 | 🇨🇳 moonshotai | Open | 52.0 | 65.3 | 11.8 | - | 83.5 | 20.6 | 36.2 | $0.38 | 262K |
| 41 | DeepSeek V3.1 | 🇨🇳 DeepSeek | Open | 51.1 | - | - | - | - | - | 28.0 | $0.15 | 33K |
| 42 | GPT-4o (2024-05-13) | 🇺🇸 OpenAI | | 51.1 | - | - | - | 31.9 | - | - | $5.00 | 128K |
| 43 | GPT-4 Turbo | 🇺🇸 OpenAI | | 51.0 | - | - | 66.8 | 7.5 | - | 10.1 | $10.00 | 128K |
| 44 | GLM 4.7 | 🇨🇳 z-ai | Open | 50.5 | - | - | - | 77.8 | - | 37.2 | $0.39 | 203K |
| 45 | Grok 4 Fast | 🇺🇸 xAI | | 50.4 | 48.5 | 5.3 | - | - | - | - | $0.20 | 2.0M |
| 46 | GPT-5.1 | 🇺🇸 OpenAI | | 49.6 | 72.8 | 17.6 | - | 83.5 | 19.8 | 43.8 | $1.25 | 400K |
| 47 | Gemini 3 Flash Preview | 🇺🇸 Google DeepMind | | 49.1 | 21.5 | 33.6 | - | 77.6 | - | 53.3 | $0.50 | 1.0M |
| 48 | Qwen3 235B A22B Instruct 2507 | 🇨🇳 Alibaba Qwen | Open | 48.5 | 11.0 | 1.3 | - | - | - | - | $0.07 | 262K |
| 49 | Gemini 2.0 Flash | 🇺🇸 Google DeepMind | | 48.0 | - | 1.3 | - | 52.2 | - | 17.3 | $0.10 | 1.0M |
| 50 | Stable Beluga 2 | Unknown | | 47.8 | - | - | 59.1 | - | - | - | N/A | 0K |
| 51 | Claude 3.7 Sonnet | 🇺🇸 Anthropic | | 47.7 | 28.6 | 0.9 | - | 73.0 | 3.4 | 35.7 | $3.00 | 200K |
| 52 | Claude Sonnet 4.6 | 🇺🇸 Anthropic | | 47.6 | 86.5 | 60.4 | - | 83.2 | - | - | $3.00 | 1.0M |
| 53 | Gemini 1.5 Flash (May 2024) | 🇺🇸 Google DeepMind | | 47.4 | - | - | - | 20.5 | - | - | N/A | 0K |
| 54 | gpt-oss-120b | 🇺🇸 OpenAI | Open | 46.9 | - | - | - | 67.7 | - | 6.5 | $0.04 | 131K |
| 55 | Grok 3 Mini | 🇺🇸 xAI | | 46.6 | 16.5 | 0.4 | - | 68.3 | - | - | $0.30 | 131K |
| 56 | GPT-3.5 Turbo (older v0613) | 🇺🇸 OpenAI | | 45.8 | - | - | 48.8 | 2.9 | - | - | $1.00 | 4K |
| 57 | Mistral Large 2411 | 🇫🇷 Mistral AI | Open | 45.8 | - | - | - | 35.1 | - | - | $2.00 | 131K |
| 58 | Claude Opus 4.5 | 🇺🇸 Anthropic | | 45.4 | 80.0 | 37.6 | - | 81.4 | 21.4 | 54.4 | $5.00 | 200K |
| 59 | GPT-5 Nano | 🇺🇸 OpenAI | | 45.3 | 20.7 | 2.6 | - | 59.3 | - | - | $0.05 | 400K |
| 60 | R1 | 🇨🇳 DeepSeek | Open | 45.1 | 15.8 | 1.3 | - | 62.3 | - | 17.1 | $0.70 | 64K |
| 61 | Claude Sonnet 4 | 🇺🇸 Anthropic | | 44.6 | 40.0 | 5.9 | - | 72.3 | 3.1 | 34.6 | $3.00 | 1.0M |
| 62 | GPT-4.1 Mini | 🇺🇸 OpenAI | | 44.5 | 3.5 | 0.1 | - | 54.5 | - | - | $0.40 | 1.0M |
| 63 | Falcon-180B | TII | Open | 44.4 | - | - | 16.1 | - | - | - | N/A | 0K |
| 64 | Qwen2.5 Coder 7B Instruct | 🇨🇳 Alibaba Qwen | Open | 44.4 | - | - | - | - | - | - | $0.03 | 33K |
| 65 | GPT-4.1 | 🇺🇸 OpenAI | | 43.3 | 5.5 | 0.4 | - | 55.9 | 0.6 | 12.4 | $2.00 | 1.0M |
| 66 | GPT-5 Pro | 🇺🇸 OpenAI | | 43.3 | 70.2 | 18.3 | - | - | 28.2 | 53.9 | $15.00 | 400K |
| 67 | GPT-4o-mini (2024-07-18) | 🇺🇸 OpenAI | | 43.2 | - | 0.1 | - | 17.0 | - | - | $0.15 | 128K |
| 68 | Phi 4 | 🇺🇸 Microsoft | Open | 43.2 | - | - | - | 41.4 | - | - | $0.07 | 16K |
| 69 | Llama 2-13B | 🇺🇸 Meta | Open | 42.5 | - | - | 44.3 | 1.8 | - | - | N/A | 0K |
| 70 | Claude 3.5 Sonnet | 🇺🇸 Anthropic | | 42.3 | - | - | - | 38.7 | - | 13.0 | N/A | 0K |
| 71 | Gemma 3 27B | 🇺🇸 Google DeepMind | Open | 42.2 | - | - | - | 31.8 | - | - | $0.08 | 131K |
| 72 | Gemma 3 27B (free) | 🇺🇸 Google DeepMind | Open | 42.2 | - | - | - | 31.8 | - | - | Free | 131K |
| 73 | Claude Sonnet 4.5 | 🇺🇸 Anthropic | | 42.1 | 63.7 | 13.6 | - | 76.4 | 9.4 | 45.2 | $3.00 | 1.0M |
| 74 | Claude Opus 4 | 🇺🇸 Anthropic | | 41.7 | 35.7 | 8.6 | - | 68.3 | 6.2 | 50.6 | $15.00 | 200K |
| 75 | Mistral 7B V0.1 | 🇫🇷 Mistral AI | Open | 41.6 | - | - | 41.5 | - | - | - | N/A | 0K |
| 76 | o1-preview | 🇺🇸 OpenAI | | 41.5 | 18.0 | - | - | 33.8 | - | 30.0 | N/A | 0K |
| 77 | Claude Opus 4.1 | 🇺🇸 Anthropic | | 41.3 | - | - | - | 69.7 | 7.1 | 52.0 | $15.00 | 200K |
| 78 | Gemini 1.5 Pro (Feb 2024) | 🇺🇸 Google DeepMind | | 41.3 | - | 0.8 | 78.7 | 27.8 | - | 12.5 | N/A | 0K |
| 79 | Qwen2-72B | 🇨🇳 Alibaba Qwen | Open | 41.3 | - | - | - | 21.0 | - | - | N/A | 0K |
| 80 | Qwen2.5-Max | 🇨🇳 Alibaba Qwen | Open | 41.0 | - | - | - | 41.5 | - | - | N/A | 0K |
| 81 | Baichuan 2-7B | Unknown | | 40.3 | - | - | 22.1 | - | - | - | N/A | 0K |
| 82 | Gemini 2.5 Flash | 🇺🇸 Google DeepMind | | 40.0 | 32.3 | 2.5 | - | - | 7.7 | 29.4 | $0.30 | 1.0M |
| 83 | Mistral Medium 3 | 🇫🇷 Mistral AI | Open | 40.0 | - | - | - | 46.0 | - | - | $0.40 | 131K |
| 84 | GPT-4o-mini | 🇺🇸 OpenAI | | 39.6 | - | 0.1 | - | 17.0 | - | - | $0.15 | 128K |
| 85 | Mistral Large 2407 | 🇫🇷 Mistral AI | Open | 39.1 | - | - | - | 32.0 | - | 7.0 | $2.00 | 131K |
| 86 | Qwen2.5 Coder 1.5B Instruct | 🇨🇳 Alibaba | Open | 38.8 | - | - | - | - | - | - | N/A | 0K |
| 87 | Grok 3 | 🇺🇸 xAI | | 38.4 | 5.5 | 0.1 | - | 67.7 | - | 23.3 | $3.00 | 131K |
| 88 | o3 Mini | 🇺🇸 OpenAI | | 38.4 | 34.5 | 3.0 | - | 69.4 | - | 7.4 | $1.10 | 200K |
| 89 | Llama 3.1 405B | 🇺🇸 Meta | Open | 38.0 | - | - | 77.2 | 34.5 | - | 7.6 | N/A | 0K |
| 90 | Llama 3.1 70B Instruct | 🇺🇸 Meta | Open | 37.8 | - | - | - | 25.6 | - | - | $0.40 | 131K |
| 91 | Gemini 2.0 Flash Thinking (Jan 2025) | 🇺🇸 Google DeepMind | | 37.7 | - | - | - | 42.8 | 1.9 | 16.8 | N/A | 0K |
| 92 | GPT-4o (2024-11-20) | 🇺🇸 OpenAI | | 37.7 | 4.5 | 0.1 | - | 32.3 | - | 1.4 | $2.50 | 128K |
| 93 | Claude 2 | 🇺🇸 Anthropic | | 37.2 | - | - | - | 12.9 | - | - | N/A | 0K |
| 94 | Claude 3.5 Haiku | 🇺🇸 Anthropic | | 37.2 | - | - | - | 17.5 | - | - | $0.80 | 200K |
| 95 | Mistral Nemo | 🇫🇷 Mistral AI | Open | 37.2 | - | - | - | 6.5 | - | - | $0.02 | 131K |
| 96 | Claude Haiku 4.5 | 🇺🇸 Anthropic | | 37.1 | 47.7 | 4.0 | - | 61.6 | - | - | $1.00 | 200K |
| 97 | Llama 3.2 90B | 🇺🇸 Meta | Open | 36.1 | - | - | - | 21.4 | - | - | N/A | 0K |
| 98 | Gemma 2 9B | 🇺🇸 Google DeepMind | Open | 36.0 | - | - | - | 3.3 | - | - | $0.03 | 8K |
| 99 | GPT-4.5 | 🇺🇸 OpenAI | | 35.9 | 10.3 | 0.8 | - | 58.3 | 0.7 | 21.4 | N/A | 0K |
| 100 | GPT-4o (2024-08-06) | 🇺🇸 OpenAI | | 35.6 | - | - | - | 32.3 | - | 1.4 | $2.50 | 128K |
| 101 | GPT-4.1 Nano | 🇺🇸 OpenAI | | 35.2 | 0.1 | 0.1 | - | 31.9 | - | - | $0.10 | 1.0M |
| 102 | LLaMA-13B | 🇺🇸 Meta | Open | 34.9 | - | - | 17.2 | - | - | - | N/A | 0K |
| 103 | o1-mini | 🇺🇸 OpenAI | | 34.9 | 14.0 | 0.8 | - | 49.8 | - | 1.7 | N/A | 0K |
| 104 | XGen-7B | Unknown | | 33.9 | - | - | - | - | - | - | N/A | 0K |
| 105 | Claude 3 Opus | 🇺🇸 Anthropic | | 33.7 | - | - | - | 29.6 | - | 8.2 | N/A | 0K |
| 106 | Grok-2 (Dec 2024) | 🇺🇸 xAI | | 33.2 | - | - | - | 38.4 | - | 7.2 | N/A | 0K |
| 107 | Gemma 2 27B | 🇺🇸 Google DeepMind | Open | 32.9 | - | - | - | 15.3 | - | - | $0.65 | 8K |
| 108 | Llama 3 70B Instruct | 🇺🇸 Meta | Open | 32.4 | - | - | - | 20.8 | - | - | $0.51 | 8K |
| 109 | MPT-30B | Unknown | | 31.7 | - | - | 17.3 | - | - | - | N/A | 0K |
| 110 | Yi 6B | Unknown | Open | 31.4 | - | - | 29.6 | - | - | - | N/A | 0K |
| 111 | Llama 3 8B Instruct | 🇺🇸 Meta | Open | 30.8 | - | - | - | 1.4 | - | - | $0.03 | 8K |
| 112 | Phi 2 | 🇺🇸 Microsoft | Open | 30.2 | - | - | 45.9 | - | - | - | N/A | 0K |
| 113 | Mistral Large | 🇫🇷 Mistral AI | Open | 30.0 | - | - | - | 18.4 | - | 7.0 | $2.00 | 128K |
| 114 | Dolly 2.0-12b | Unknown | | 29.2 | - | - | - | - | - | - | N/A | 0K |
| 115 | Gemma 2B | 🇺🇸 Google DeepMind | Open | 29.1 | - | - | 13.6 | - | - | - | N/A | 0K |
| 116 | Llama 3.3 70B Instruct (free) | 🇺🇸 Meta | Open | 29.1 | - | - | - | 29.9 | - | 3.9 | Free | 66K |
| 117 | Claude 3 Haiku | 🇺🇸 Anthropic | | 28.7 | - | - | - | 15.1 | - | - | $0.25 | 200K |
| 118 | Claude 3 Sonnet | 🇺🇸 Anthropic | | 28.3 | - | - | - | 20.8 | - | - | N/A | 0K |
| 119 | Llama 4 Maverick | 🇺🇸 Meta | Open | 28.0 | 4.4 | 0.1 | - | 56.0 | 0.9 | 13.2 | $0.15 | 1.0M |
| 120 | Llama 3.1 8B Instruct | 🇺🇸 Meta | Open | 27.4 | - | - | - | 1.3 | - | - | $0.02 | 16K |
| 121 | DeepSeek Coder 33B | 🇨🇳 DeepSeek | Open | 25.4 | - | - | - | - | - | - | N/A | 0K |
| 122 | StarCoder 2 15B | Unknown | Open | 24.3 | - | - | - | - | - | - | N/A | 0K |
| 123 | Baichuan1-7B | Unknown | | 23.7 | - | - | 10.0 | - | - | - | N/A | 0K |
| 124 | Mixtral 8x22B Instruct | 🇫🇷 Mistral AI | Open | 23.5 | - | - | - | 12.1 | - | - | $2.00 | 66K |
| 125 | Cerebras-GPT-13B | 🇺🇸 OpenAI | | 23.4 | - | - | - | - | - | - | N/A | 0K |
| 126 | Gemini 1.0 Pro | 🇺🇸 Google DeepMind | | 21.1 | - | - | - | 11.9 | - | - | N/A | 0K |
| 127 | Claude 2.1 | 🇺🇸 Anthropic | | 21.0 | - | - | - | 10.6 | - | - | N/A | 0K |
| 128 | INTELLECT-1 | Unknown | | 20.2 | - | - | 13.1 | - | - | - | N/A | 0K |
| 129 | Llama 4 Scout | 🇺🇸 Meta | Open | 18.9 | 0.5 | 0.1 | - | 35.8 | - | - | $0.08 | 328K |
| 130 | DeepSeek Coder 6.7B | 🇨🇳 DeepSeek | Open | 16.7 | - | - | - | - | - | - | N/A | 0K |
| 131 | Magistral Small 1.1 | Unknown | | 16.6 | 5.0 | 0.1 | - | 31.2 | - | - | N/A | 0K |
| 132 | Phi-1.5 | 🇺🇸 Microsoft | Open | 16.3 | - | - | - | - | - | - | N/A | 0K |
| 133 | DeepSeek Coder 1.3B | 🇨🇳 DeepSeek | Open | 3.2 | - | - | - | - | - | - | N/A | 0K |
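Scores are in % unless noted. A dash means no score is listed for that benchmark. Avg is the unweighted mean across the benchmarks a model was tested on, which can include reasoning benchmarks not shown as columns here (several models have an Avg but no displayed scores).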
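As a rough illustration of an unweighted mean across tested benchmarks, the sketch below averages only the benchmarks that have a score and skips the missing ones. The function name `unweighted_avg` and the example scores are hypothetical and not taken from the leaderboard; the site's actual Avg may also draw on benchmarks that are not displayed, so this will not necessarily reproduce the published values.

```python
from statistics import mean
from typing import Optional


def unweighted_avg(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Mean of the benchmarks that have a score; untested ones (None) are skipped."""
    tested = [s for s in scores.values() if s is not None]
    # Every tested benchmark counts equally, regardless of its size or difficulty.
    return round(mean(tested), 1) if tested else None


# Hypothetical model: tested on three of six benchmarks (made-up numbers).
example = {
    "ARC-AGI": 80.0,
    "ARC-AGI-2": None,
    "BBH": None,
    "GPQA Diamond": 90.0,
    "HLE": 40.0,
    "SimpleBench": None,
}

print(unweighted_avg(example))  # 70.0
```

One consequence of this kind of average is that a model tested on only one or two easier benchmarks can outrank a model tested on many harder ones, so the Avg column is best read alongside the per-benchmark scores.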

Models ranked by reasoning performance across GPQA Diamond, ARC-AGI, BBH, and other logic and reasoning benchmarks. These tests measure abstract thinking, pattern recognition, and multi-step inference.

Which AI model has the best reasoning?

Reasoning rankings shift as new models launch. In the leaderboard above, Claude Mythos Preview ranks first with an average of 81.8; standings are based on GPQA Diamond, ARC-AGI, and other reasoning benchmarks.

What is GPQA Diamond?

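GPQA Diamond is the hardest subset of GPQA (Graduate-Level Google-Proof Q&A), a multiple-choice benchmark of graduate-level questions in biology, physics, and chemistry. Questions are written by domain experts and designed so that they are hard to answer without genuine understanding, even with access to web search.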

How does ARC-AGI measure reasoning?

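ARC-AGI tests abstract reasoning through visual grid puzzles that require novel inference, not memorization. Each task shows a few example input and output grids, and the model must infer the underlying transformation and apply it to a new input. ARC-AGI-2 is a harder follow-up, which is why most models in the table score far lower on it than on ARC-AGI.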
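To make the idea of inferring a rule from examples concrete, here is a toy sketch in the spirit of an ARC-style task. Grids are small lists of integers (colors), and a solver tries a handful of candidate transformations against the example pairs. The candidate rules, the example grids, and the helper `infer_rule` are all made up for illustration; real ARC-AGI tasks are far more varied and cannot be solved by checking a fixed list of rules.

```python
from typing import Callable, Optional

Grid = list[list[int]]


def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    return g[::-1]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


# Hypothetical, tiny hypothesis space; a real solver has no such fixed list.
CANDIDATE_RULES: dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(train_pairs: list[tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule that reproduces every example output."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in train_pairs):
            return name
    return None


# Two example pairs that demonstrate the hidden rule (mirror left to right).
train_pairs = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0], [0, 0, 4]], [[0, 3, 3], [4, 0, 0]]),
]
test_input = [[5, 0], [0, 6]]

rule_name = infer_rule(train_pairs)
prediction = CANDIDATE_RULES[rule_name](test_input)
print(rule_name, prediction)  # flip_horizontal [[0, 5], [6, 0]]
```

In this sketch a prediction counts as correct only if the whole output grid matches exactly, which is one reason partial pattern recognition tends not to earn credit on this kind of task.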