Best AI Models for Reasoning

AI models ranked by reasoning benchmarks. Compare GPQA Diamond, ARC-AGI, BBH, and other reasoning tests across all providers.

Models: 135
Providers: 15
Open source: 61
Median price: $0.66 per 1M input tokens
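Prices on this page are quoted in dollars per one million input tokens. Below is a minimal sketch of turning that rate into the input cost of a single request. `input_cost_usd` is a hypothetical helper, and it ignores output tokens, which are billed at a separate rate not shown in this table.

```python
def input_cost_usd(prompt_tokens: int, price_per_million: float) -> float:
    """Input-side cost of one request at a $/1M-input-token price (hypothetical helper)."""
    return prompt_tokens / 1_000_000 * price_per_million

# A 20,000-token prompt at the median listed price of $0.66 per 1M input tokens
print(f"${input_cost_usd(20_000, 0.66):.4f}")  # prints $0.0132
```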
| # | Model | Provider | Open source | Avg | ARC-AGI | ARC-AGI-2 | BBH | GPQA Diamond | HLE | SimpleBench | $/1M input | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Pro | 🇺🇸 OpenAI | | 87.8 | - | - | - | 94.2 | - | - | $30.00 | 400K |
| 2 | GPT-5.5 | 🇺🇸 OpenAI | | 85.0 | 95.0 | - | - | 93.6 | - | - | $5.00 | 400K |
| 3 | Claude Mythos Preview | 🇺🇸 Anthropic | | 81.8 | - | - | - | 94.5 | 56.8 | - | N/A | 1.0M |
| 4 | DeepSeek-V2 (MoE-236B, May 2024) | 🇨🇳 DeepSeek | Open | 76.5 | - | - | 71.7 | - | - | - | N/A | N/A |
| 5 | phi-3-small 7.4B | 🇺🇸 Microsoft | Open | 67.4 | - | - | 72.1 | - | - | - | N/A | N/A |
| 6 | GPT-5.4 Pro | 🇺🇸 OpenAI | | 66.7 | 94.5 | 83.3 | - | 92.8 | - | 68.9 | $30.00 | 1.1M |
| 7 | o3 Pro | 🇺🇸 OpenAI | | 61.2 | 59.3 | 4.9 | - | - | - | - | $20.00 | 200K |
| 8 | phi-3-mini 3.8B | 🇺🇸 Microsoft | Open | 61.0 | - | - | 62.3 | - | - | - | N/A | N/A |
| 9 | Qwen-14B | 🇨🇳 Alibaba Qwen | Open | 60.7 | - | - | 40.0 | - | - | - | N/A | N/A |
| 10 | Gemini 3.1 Pro Preview | 🇺🇸 Google DeepMind | | 60.6 | 98.0 | 77.1 | - | 92.1 | - | 75.5 | $2.00 | 1.0M |
| 11 | Gemini 3 Pro | 🇺🇸 Google DeepMind | | 60.5 | 75.0 | 31.1 | - | 90.2 | 34.4 | 71.7 | N/A | N/A |
| 12 | DeepSeek V3 | 🇨🇳 DeepSeek | Open | 59.0 | - | - | 83.3 | 42.0 | - | 2.7 | $0.32 | 164K |
| 13 | GPT-5.4 | 🇺🇸 OpenAI | | 59.0 | 93.7 | 74.0 | - | 91.1 | - | - | $2.50 | 1.1M |
| 14 | Muse Spark | Unknown | | 59.0 | - | - | - | 86.4 | - | - | N/A | N/A |
| 15 | phi-3-medium 14B | 🇺🇸 Microsoft | Open | 58.6 | - | - | 75.2 | 3.5 | - | - | N/A | N/A |
| 16 | Qwen3 Max | 🇨🇳 Alibaba Qwen | Open | 58.3 | - | - | - | 63.5 | - | - | $0.78 | 262K |
| 17 | Falcon 2 11B | TII | Open | 58.0 | - | - | - | - | - | - | N/A | N/A |
| 18 | R1 0528 | 🇨🇳 DeepSeek | Open | 57.9 | 21.2 | 1.1 | - | 68.4 | - | 29.0 | $0.50 | 164K |
| 19 | Mixtral 8x7B Instruct | 🇫🇷 Mistral AI | Open | 57.8 | - | - | - | 7.5 | - | - | $0.54 | 33K |
| 20 | GLM 5 | 🇨🇳 z-ai | Open | 57.6 | 44.7 | 4.9 | - | 83.8 | - | 43.8 | $0.60 | 203K |
| 21 | Claude Opus 4.6 | 🇺🇸 Anthropic | | 57.5 | 94.0 | 69.2 | - | 87.4 | 31.1 | 61.1 | $5.00 | 1.0M |
| 22 | o1 | 🇺🇸 OpenAI | | 56.4 | 30.7 | - | - | 69.0 | - | 28.1 | $15.00 | 200K |
| 23 | Qwen3 235B A22B | 🇨🇳 Alibaba Qwen | Open | 56.4 | - | - | - | 60.9 | - | 17.2 | $0.46 | 131K |
| 24 | Gemini 2.5 Pro | 🇺🇸 Google DeepMind | | 56.2 | 41.0 | 4.9 | - | 80.4 | 17.7 | 54.9 | $1.25 | 1.0M |
| 25 | GPT-5.2 Pro | 🇺🇸 OpenAI | | 56.2 | 90.5 | 54.2 | - | - | - | 48.9 | $21.00 | 400K |
| 26 | Kimi K2 0711 | 🇨🇳 moonshotai | Open | 56.2 | - | - | - | - | - | 11.6 | $0.57 | 131K |
| 27 | GPT-5 Mini | 🇺🇸 OpenAI | | 56.0 | 54.3 | 4.4 | - | 66.7 | 15.4 | - | $0.25 | 400K |
| 28 | Qwen3 235B A22B Thinking 2507 | 🇨🇳 Alibaba Qwen | Open | 55.9 | - | - | - | 73.4 | - | - | $0.15 | 131K |
| 29 | o3 | 🇺🇸 OpenAI | | 55.2 | 60.8 | 6.5 | - | 75.8 | 16.3 | 43.7 | $2.00 | 200K |
| 30 | MiniMax M2.5 | 🇨🇳 minimax | Open | 55.1 | 63.7 | 4.9 | - | - | - | - | $0.15 | 197K |
| 31 | GPT-4 (older v0314) | 🇺🇸 OpenAI | | 55.0 | - | - | - | 14.3 | - | - | $30.00 | 8K |
| 32 | Grok 4 | 🇺🇸 xAI | | 54.8 | 66.7 | 16.0 | - | 82.7 | - | 52.6 | $3.00 | 256K |
| 33 | GPT-5 | 🇺🇸 OpenAI | | 54.4 | 65.7 | 9.9 | - | 81.6 | 21.6 | 48.0 | $1.25 | 400K |
| 34 | GPT-5.2 | 🇺🇸 OpenAI | | 54.0 | 86.2 | 52.9 | - | 88.5 | 24.2 | 35.0 | $1.75 | 400K |
| 35 | Gemini 2.0 Pro | 🇺🇸 Google DeepMind | | 53.7 | - | - | - | 54.2 | - | - | N/A | N/A |
| 36 | Nemotron-4 15B | Unknown | | 53.4 | - | - | 44.9 | - | - | - | N/A | N/A |
| 37 | Kimi K2 Thinking | 🇨🇳 moonshotai | Open | 53.3 | - | - | - | 79.0 | - | - | $0.60 | 262K |
| 38 | o4 Mini | 🇺🇸 OpenAI | | 53.2 | 58.7 | 6.1 | - | 72.8 | 13.9 | 26.4 | $1.10 | 200K |
| 39 | Qwen2.5 72B Instruct | 🇨🇳 Alibaba Qwen | Open | 53.2 | - | - | 73.1 | 32.2 | - | - | $0.36 | 33K |
| 40 | Qwen2.5 Coder 32B Instruct | 🇨🇳 Alibaba Qwen | Open | 53.1 | - | - | - | - | - | - | $0.66 | 33K |
| 41 | DeepSeek V3.2 | 🇨🇳 DeepSeek | Open | 53.0 | 57.0 | 4.0 | - | 77.9 | - | - | $0.25 | 131K |
| 42 | Kimi K2.5 | 🇨🇳 moonshotai | Open | 52.0 | 65.3 | 11.8 | - | 83.5 | 20.6 | 36.2 | $0.44 | 262K |
| 43 | DeepSeek V3.1 | 🇨🇳 DeepSeek | Open | 51.1 | - | - | - | - | - | 28.0 | $0.15 | 33K |
| 44 | GPT-4o (2024-05-13) | 🇺🇸 OpenAI | | 51.1 | - | - | - | 31.9 | - | - | $5.00 | 128K |
| 45 | GPT-4 Turbo | 🇺🇸 OpenAI | | 51.0 | - | - | 66.8 | 7.5 | - | 10.1 | $10.00 | 128K |
| 46 | GLM 4.7 | 🇨🇳 z-ai | Open | 50.5 | - | - | - | 77.8 | - | 37.2 | $0.38 | 203K |
| 47 | Grok 4 Fast | 🇺🇸 xAI | | 50.4 | 48.5 | 5.3 | - | - | - | - | $0.20 | 2.0M |
| 48 | GPT-5.1 | 🇺🇸 OpenAI | | 49.6 | 72.8 | 17.6 | - | 83.5 | 19.8 | 43.8 | $1.25 | 400K |
| 49 | Gemini 3 Flash Preview | 🇺🇸 Google DeepMind | | 49.1 | 21.5 | 33.6 | - | 77.6 | - | 53.3 | $0.50 | 1.0M |
| 50 | Qwen3 235B A22B Instruct 2507 | 🇨🇳 Alibaba Qwen | Open | 48.5 | 11.0 | 1.3 | - | - | - | - | $0.07 | 262K |
| 51 | Gemini 2.0 Flash | 🇺🇸 Google DeepMind | | 48.0 | - | 1.3 | - | 52.2 | - | 17.3 | $0.10 | 1.0M |
| 52 | Stable Beluga 2 | Unknown | | 47.8 | - | - | 59.1 | - | - | - | N/A | N/A |
| 53 | Claude 3.7 Sonnet | 🇺🇸 Anthropic | | 47.7 | 28.6 | 0.9 | - | 73.0 | 3.4 | 35.7 | $3.00 | 200K |
| 54 | Claude Sonnet 4.6 | 🇺🇸 Anthropic | | 47.6 | 86.5 | 60.4 | - | 83.2 | - | - | $3.00 | 1.0M |
| 55 | Gemini 1.5 Flash (May 2024) | 🇺🇸 Google DeepMind | | 47.4 | - | - | - | 20.5 | - | - | N/A | N/A |
| 56 | gpt-oss-120b | 🇺🇸 OpenAI | Open | 46.9 | - | - | - | 67.7 | - | 6.5 | $0.04 | 131K |
| 57 | Grok 3 Mini | 🇺🇸 xAI | | 46.6 | 16.5 | 0.4 | - | 68.3 | - | - | $0.30 | 131K |
| 58 | GPT-3.5 Turbo (older v0613) | 🇺🇸 OpenAI | | 45.8 | - | - | 48.8 | 2.9 | - | - | $1.00 | 4K |
| 59 | Mistral Large 2411 | 🇫🇷 Mistral AI | Open | 45.8 | - | - | - | 35.1 | - | - | $2.00 | 131K |
| 60 | Claude Opus 4.5 | 🇺🇸 Anthropic | | 45.4 | 80.0 | 37.6 | - | 81.4 | 21.4 | 54.4 | $5.00 | 200K |
| 61 | GPT-5 Nano | 🇺🇸 OpenAI | | 45.3 | 20.7 | 2.6 | - | 59.3 | - | - | $0.05 | 400K |
| 62 | R1 | 🇨🇳 DeepSeek | Open | 45.1 | 15.8 | 1.3 | - | 62.3 | - | 17.1 | $0.70 | 64K |
| 63 | Claude Sonnet 4 | 🇺🇸 Anthropic | | 44.6 | 40.0 | 5.9 | - | 72.3 | 3.1 | 34.6 | $3.00 | 1.0M |
| 64 | GPT-4.1 Mini | 🇺🇸 OpenAI | | 44.5 | 3.5 | 0.1 | - | 54.5 | - | - | $0.40 | 1.0M |
| 65 | Falcon-180B | TII | Open | 44.4 | - | - | 16.1 | - | - | - | N/A | N/A |
| 66 | Qwen2.5 Coder 7B Instruct | 🇨🇳 Alibaba Qwen | Open | 44.4 | - | - | - | - | - | - | $0.03 | 33K |
| 67 | GPT-4.1 | 🇺🇸 OpenAI | | 43.3 | 5.5 | 0.4 | - | 55.9 | 0.6 | 12.4 | $2.00 | 1.0M |
| 68 | GPT-5 Pro | 🇺🇸 OpenAI | | 43.3 | 70.2 | 18.3 | - | - | 28.2 | 53.9 | $15.00 | 400K |
| 69 | GPT-4o-mini (2024-07-18) | 🇺🇸 OpenAI | | 43.2 | - | 0.1 | - | 17.0 | - | - | $0.15 | 128K |
| 70 | Phi 4 | 🇺🇸 Microsoft | Open | 43.2 | - | - | - | 41.4 | - | - | $0.07 | 16K |
| 71 | Llama 2-13B | 🇺🇸 Meta | Open | 42.5 | - | - | 44.3 | 1.8 | - | - | N/A | N/A |
| 72 | Claude 3.5 Sonnet | 🇺🇸 Anthropic | | 42.3 | - | - | - | 38.7 | - | 13.0 | N/A | N/A |
| 73 | Gemma 3 27B | 🇺🇸 Google DeepMind | Open | 42.2 | - | - | - | 31.8 | - | - | $0.08 | 131K |
| 74 | Gemma 3 27B (free) | 🇺🇸 Google DeepMind | Open | 42.2 | - | - | - | 31.8 | - | - | Free | 131K |
| 75 | Claude Sonnet 4.5 | 🇺🇸 Anthropic | | 42.1 | 63.7 | 13.6 | - | 76.4 | 9.4 | 45.2 | $3.00 | 1.0M |
| 76 | Claude Opus 4 | 🇺🇸 Anthropic | | 41.7 | 35.7 | 8.6 | - | 68.3 | 6.2 | 50.6 | $15.00 | 200K |
| 77 | Mistral 7B V0.1 | 🇫🇷 Mistral AI | Open | 41.6 | - | - | 41.5 | - | - | - | N/A | N/A |
| 78 | o1-preview | 🇺🇸 OpenAI | | 41.5 | 18.0 | - | - | 33.8 | - | 30.0 | N/A | N/A |
| 79 | Claude Opus 4.1 | 🇺🇸 Anthropic | | 41.3 | - | - | - | 69.7 | 7.1 | 52.0 | $15.00 | 200K |
| 80 | Gemini 1.5 Pro (Feb 2024) | 🇺🇸 Google DeepMind | | 41.3 | - | 0.8 | 78.7 | 27.8 | - | 12.5 | N/A | N/A |
| 81 | Qwen2-72B | 🇨🇳 Alibaba Qwen | Open | 41.3 | - | - | - | 21.0 | - | - | N/A | N/A |
| 82 | Qwen2.5-Max | 🇨🇳 Alibaba Qwen | Open | 41.0 | - | - | - | 41.5 | - | - | N/A | N/A |
| 83 | Baichuan 2-7B | Unknown | | 40.3 | - | - | 22.1 | - | - | - | N/A | N/A |
| 84 | Gemini 2.5 Flash | 🇺🇸 Google DeepMind | | 40.0 | 32.3 | 2.5 | - | - | 7.7 | 29.4 | $0.30 | 1.0M |
| 85 | Mistral Medium 3 | 🇫🇷 Mistral AI | Open | 40.0 | - | - | - | 46.0 | - | - | $0.40 | 131K |
| 86 | GPT-4o-mini | 🇺🇸 OpenAI | | 39.6 | - | 0.1 | - | 17.0 | - | - | $0.15 | 128K |
| 87 | Mistral Large 2407 | 🇫🇷 Mistral AI | Open | 39.1 | - | - | - | 32.0 | - | 7.0 | $2.00 | 131K |
| 88 | Qwen2.5 Coder 1.5B Instruct | 🇨🇳 Alibaba | Open | 38.8 | - | - | - | - | - | - | N/A | N/A |
| 89 | Grok 3 | 🇺🇸 xAI | | 38.4 | 5.5 | 0.1 | - | 67.7 | - | 23.3 | $3.00 | 131K |
| 90 | o3 Mini | 🇺🇸 OpenAI | | 38.4 | 34.5 | 3.0 | - | 69.4 | - | 7.4 | $1.10 | 200K |
| 91 | Llama 3.1 405B | 🇺🇸 Meta | Open | 38.0 | - | - | 77.2 | 34.5 | - | 7.6 | N/A | N/A |
| 92 | Llama 3.1 70B Instruct | 🇺🇸 Meta | Open | 37.8 | - | - | - | 25.6 | - | - | $0.40 | 131K |
| 93 | Gemini 2.0 Flash Thinking (Jan 2025) | 🇺🇸 Google DeepMind | | 37.7 | - | - | - | 42.8 | 1.9 | 16.8 | N/A | N/A |
| 94 | GPT-4o (2024-11-20) | 🇺🇸 OpenAI | | 37.7 | 4.5 | 0.1 | - | 32.3 | - | 1.4 | $2.50 | 128K |
| 95 | Claude 2 | 🇺🇸 Anthropic | | 37.2 | - | - | - | 12.9 | - | - | N/A | N/A |
| 96 | Claude 3.5 Haiku | 🇺🇸 Anthropic | | 37.2 | - | - | - | 17.5 | - | - | $0.80 | 200K |
| 97 | Mistral Nemo | 🇫🇷 Mistral AI | Open | 37.2 | - | - | - | 6.5 | - | - | $0.02 | 131K |
| 98 | Claude Haiku 4.5 | 🇺🇸 Anthropic | | 37.1 | 47.7 | 4.0 | - | 61.6 | - | - | $1.00 | 200K |
| 99 | Llama 3.2 90B | 🇺🇸 Meta | Open | 36.1 | - | - | - | 21.4 | - | - | N/A | N/A |
| 100 | Gemma 2 9B | 🇺🇸 Google DeepMind | Open | 36.0 | - | - | - | 3.3 | - | - | $0.03 | 8K |
| 101 | GPT-4.5 | 🇺🇸 OpenAI | | 35.9 | 10.3 | 0.8 | - | 58.3 | 0.7 | 21.4 | N/A | N/A |
| 102 | GPT-4o (2024-08-06) | 🇺🇸 OpenAI | | 35.6 | - | - | - | 32.3 | - | 1.4 | $2.50 | 128K |
| 103 | GPT-4.1 Nano | 🇺🇸 OpenAI | | 35.2 | 0.1 | 0.1 | - | 31.9 | - | - | $0.10 | 1.0M |
| 104 | LLaMA-13B | 🇺🇸 Meta | Open | 34.9 | - | - | 17.2 | - | - | - | N/A | N/A |
| 105 | o1-mini | 🇺🇸 OpenAI | | 34.9 | 14.0 | 0.8 | - | 49.8 | - | 1.7 | N/A | N/A |
| 106 | XGen-7B | Unknown | | 33.9 | - | - | - | - | - | - | N/A | N/A |
| 107 | Claude 3 Opus | 🇺🇸 Anthropic | | 33.7 | - | - | - | 29.6 | - | 8.2 | N/A | N/A |
| 108 | Grok-2 (Dec 2024) | 🇺🇸 xAI | | 33.2 | - | - | - | 38.4 | - | 7.2 | N/A | N/A |
| 109 | Gemma 2 27B | 🇺🇸 Google DeepMind | Open | 32.9 | - | - | - | 15.3 | - | - | $0.65 | 8K |
| 110 | Llama 3 70B Instruct | 🇺🇸 Meta | Open | 32.4 | - | - | - | 20.8 | - | - | $0.51 | 8K |
| 111 | MPT-30B | Unknown | | 31.7 | - | - | 17.3 | - | - | - | N/A | N/A |
| 112 | Yi 6B | Unknown | Open | 31.4 | - | - | 29.6 | - | - | - | N/A | N/A |
| 113 | Llama 3 8B Instruct | 🇺🇸 Meta | Open | 30.8 | - | - | - | 1.4 | - | - | $0.03 | 8K |
| 114 | Phi 2 | 🇺🇸 Microsoft | Open | 30.2 | - | - | 45.9 | - | - | - | N/A | N/A |
| 115 | Mistral Large | 🇫🇷 Mistral AI | Open | 30.0 | - | - | - | 18.4 | - | 7.0 | $2.00 | 128K |
| 116 | Dolly 2.0-12b | Unknown | | 29.2 | - | - | - | - | - | - | N/A | N/A |
| 117 | Gemma 2B | 🇺🇸 Google DeepMind | Open | 29.1 | - | - | 13.6 | - | - | - | N/A | N/A |
| 118 | Llama 3.3 70B Instruct (free) | 🇺🇸 Meta | Open | 29.1 | - | - | - | 29.9 | - | 3.9 | Free | 66K |
| 119 | Claude 3 Haiku | 🇺🇸 Anthropic | | 28.7 | - | - | - | 15.1 | - | - | $0.25 | 200K |
| 120 | Claude 3 Sonnet | 🇺🇸 Anthropic | | 28.3 | - | - | - | 20.8 | - | - | N/A | N/A |
| 121 | Llama 4 Maverick | 🇺🇸 Meta | Open | 28.0 | 4.4 | 0.1 | - | 56.0 | 0.9 | 13.2 | $0.15 | 1.0M |
| 122 | Llama 3.1 8B Instruct | 🇺🇸 Meta | Open | 27.4 | - | - | - | 1.3 | - | - | $0.02 | 16K |
| 123 | DeepSeek Coder 33B | 🇨🇳 DeepSeek | Open | 25.4 | - | - | - | - | - | - | N/A | N/A |
| 124 | StarCoder 2 15B | Unknown | Open | 24.3 | - | - | - | - | - | - | N/A | N/A |
| 125 | Baichuan1-7B | Unknown | | 23.7 | - | - | 10.0 | - | - | - | N/A | N/A |
| 126 | Mixtral 8x22B Instruct | 🇫🇷 Mistral AI | Open | 23.5 | - | - | - | 12.1 | - | - | $2.00 | 66K |
| 127 | Cerebras-GPT-13B | 🇺🇸 OpenAI | | 23.4 | - | - | - | - | - | - | N/A | N/A |
| 128 | Gemini 1.0 Pro | 🇺🇸 Google DeepMind | | 21.1 | - | - | - | 11.9 | - | - | N/A | N/A |
| 129 | Claude 2.1 | 🇺🇸 Anthropic | | 21.0 | - | - | - | 10.6 | - | - | N/A | N/A |
| 130 | INTELLECT-1 | Unknown | | 20.2 | - | - | 13.1 | - | - | - | N/A | N/A |
| 131 | Llama 4 Scout | 🇺🇸 Meta | Open | 18.9 | 0.5 | 0.1 | - | 35.8 | - | - | $0.08 | 328K |
| 132 | DeepSeek Coder 6.7B | 🇨🇳 DeepSeek | Open | 16.7 | - | - | - | - | - | - | N/A | N/A |
| 133 | Magistral Small 1.1 | Unknown | | 16.6 | 5.0 | 0.1 | - | 31.2 | - | - | N/A | N/A |
| 134 | Phi-1.5 | 🇺🇸 Microsoft | Open | 16.3 | - | - | - | - | - | - | N/A | N/A |
| 135 | DeepSeek Coder 1.3B | 🇨🇳 DeepSeek | Open | 3.2 | - | - | - | - | - | - | N/A | N/A |
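Scores are percentages unless noted; "-" means the model has no result for that benchmark. Avg is the unweighted mean across the benchmarks each model was tested on. $/1M input is the price in dollars per one million input tokens; N/A means the value is not listed.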

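Models are ranked by Avg, their average score across GPQA Diamond, ARC-AGI, BBH, and other logic and reasoning benchmarks. These tests measure abstract thinking, pattern recognition, and multi-step inference. Each model's average covers only the benchmarks it was tested on, so models with different coverage are not directly comparable. For example, an older model with few results, such as DeepSeek-V2 (May 2024), ranks above much newer models tested on more benchmarks.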

Which AI model has the best reasoning?

In this snapshot, GPT-5.5 Pro leads with an average of 87.8, followed by GPT-5.5 (85.0) and Claude Mythos Preview (81.8). Reasoning rankings shift as new models launch, and the standings above are based on GPQA Diamond, ARC-AGI, and other reasoning benchmarks.

What is GPQA Diamond?

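GPQA (Graduate-Level Google-Proof Q&A) is a multiple-choice benchmark of graduate-level questions in biology, physics, and chemistry. Domain experts wrote the questions, and each one has four answer choices. The Diamond subset (198 questions) keeps only the questions that expert validators answered correctly and that most skilled non-experts got wrong, even with unrestricted web access.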
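Results on GPQA Diamond are typically reported as multiple-choice accuracy. Below is a minimal sketch that assumes one predicted letter per question. `accuracy_pct` is a hypothetical helper, and the eight-question answer key is made up for illustration; neither comes from the benchmark. With four options per question, random guessing lands near 25%.

```python
def accuracy_pct(predictions: list[str], answers: list[str]) -> float:
    """Percentage of questions whose predicted letter matches the key (hypothetical helper)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must be the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

CHANCE_PCT = 100.0 / 4  # four answer choices per question -> 25.0

# Hypothetical answer key and model outputs for 8 questions
key   = ["A", "C", "B", "D", "A", "B", "C", "D"]
preds = ["A", "C", "D", "D", "B", "B", "C", "A"]
score = accuracy_pct(preds, key)
print(f"accuracy {score:.1f}% vs chance {CHANCE_PCT:.1f}%")  # prints accuracy 62.5% vs chance 25.0%
```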

How does ARC-AGI measure reasoning?

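ARC-AGI (the Abstraction and Reasoning Corpus, introduced by François Chollet) tests abstract reasoning with visual grid puzzles. Each task shows a few example pairs of colored input and output grids. The solver must infer the underlying transformation and produce the exact output grid for a new test input. The tasks are designed so that each one requires inferring a new rule rather than recalling memorized knowledge. The newer ARC-AGI-2 is harder still: in the table above, o3 scores 60.8 on ARC-AGI but 6.5 on ARC-AGI-2, and GPT-5 falls from 65.7 to 9.9.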
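Below is a minimal sketch of how an ARC-style task is checked. It uses the train/test JSON layout of the public ARC task files, where grids are lists of integer rows and each value 0-9 stands for a color. The toy task, the candidate rule `flip_horizontal`, and the helper functions are all hypothetical. A real solver has to discover the rule itself, and official evaluations may allow more than one attempt per test grid, whereas this sketch scores a single attempt.

```python
from typing import Callable

Grid = list[list[int]]
Rule = Callable[[Grid], Grid]

def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

def fits_train(task: dict, rule: Rule) -> bool:
    """Keep a rule only if it reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])

def score_test(task: dict, rule: Rule) -> float:
    """Fraction of test grids predicted exactly; a single wrong cell scores 0 for that grid."""
    hits = [rule(pair["input"]) == pair["output"] for pair in task["test"]]
    return sum(hits) / len(hits)

# Hypothetical toy task in the ARC train/test layout
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[5, 0], [0, 7]], "output": [[0, 5], [7, 0]]},
    ],
}

if fits_train(toy_task, flip_horizontal):
    print(score_test(toy_task, flip_horizontal))  # prints 1.0
```

Whole-grid exact match leaves no partial credit. A rule that is almost right still scores 0 on that test input, which is one reason scores on these benchmarks can stay low.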