Best AI Models for Coding

AI models ranked by coding benchmarks. Compare HumanEval+, SWE-bench Verified, Aider Polyglot, and more across all providers.

85 models · 11 providers · 28 open source · $0.70 median price per 1M input tokens
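The $0.70 median input price above can be reproduced from the table under one assumption: the median is taken only over the 71 models with a numeric price, and the N/A and Free entries are skipped. The Python sketch below copies the prices in rank order and shows both readings. It returns $0.70 with Free skipped and $0.66 with Free counted as $0. The function name `median_input_price` and the `free_as_zero` flag are illustrative. They are not part of any published API.

```python
import statistics

# Input prices ($ per 1M tokens) copied from the table's price column, in rank order 1-85.
# "N/A" = no listed price; "Free" = free variant.
RAW_PRICES = """
5.00 1.25 N/A 1.25 1.10 3.00 Free 0.30 20.00 2.00
N/A 1.10 0.32 2.50 0.08 0.50 0.60 5.00 15.00 0.46
1.25 0.57 0.25 2.00 0.20 0.15 3.00 1.25 1.75 N/A
0.60 0.27 1.10 0.66 0.25 1.75 0.44 1.25 0.39 0.38
0.20 1.25 0.50 0.07 0.10 3.00 3.00 0.04 0.30 5.00
0.05 0.70 3.00 0.40 2.00 0.15 N/A 0.08 Free 3.00
15.00 N/A 15.00 N/A N/A 0.30 0.15 3.00 1.10 N/A
N/A 2.50 0.80 1.00 N/A 2.50 0.10 N/A N/A 0.51
0.25 0.15 2.00 0.08 0.15
""".split()


def median_input_price(raw, free_as_zero=False):
    """Median of numeric prices (assumed rule): N/A entries are always skipped;
    Free entries are skipped unless free_as_zero is True, in which case they count as 0."""
    values = []
    for p in raw:
        if p == "N/A":
            continue
        if p == "Free":
            if free_as_zero:
                values.append(0.0)
            continue
        values.append(float(p))
    return statistics.median(values)


print(len(RAW_PRICES))                                     # 85
print(median_input_price(RAW_PRICES))                      # 0.7
print(median_input_price(RAW_PRICES, free_as_zero=True))   # 0.66
```

Only the first reading matches the reported $0.70. This suggests, but does not confirm, that unpriced and free entries are left out of the summary figure.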

| # | Model | Provider | Open source | Avg | Aider Polyglot | GSO-Bench | SWE-bench (1) | SWE-bench (2) | Terminal-bench | $/1M input tokens | Context window |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | 🇺🇸 OpenAI | | 85.0 | - | - | - | - | 82.7 | $5.00 | 400K |
| 2 | GPT-5 Chat | 🇺🇸 OpenAI | | 81.9 | 88.0 | - | - | - | - | $1.25 | 128K |
| 3 | Claude Mythos Preview | 🇺🇸 Anthropic | | 81.8 | - | - | 93.9 | - | 82.0 | N/A | 1.0M |
| 4 | Gemini 2.5 Pro Preview 05-06 | 🇺🇸 Google DeepMind | | 76.9 | 76.9 | - | - | - | - | $1.25 | 1.0M |
| 5 | o4 Mini High | 🇺🇸 OpenAI | | 72.0 | 72.0 | - | - | - | - | $1.10 | 200K |
| 6 | Grok 3 Beta | 🇺🇸 xAI | | 69.5 | 53.3 | - | - | - | - | $3.00 | 131K |
| 7 | gpt-oss-120b (free) | 🇺🇸 OpenAI | Yes | 68.7 | 41.8 | - | - | - | - | Free | 131K |
| 8 | Grok 3 Mini Beta | 🇺🇸 xAI | | 64.8 | 49.3 | - | - | - | - | $0.30 | 131K |
| 9 | o3 Pro | 🇺🇸 OpenAI | | 61.2 | 84.9 | - | - | - | - | $20.00 | 200K |
| 10 | Gemini 3.1 Pro Preview | 🇺🇸 Google DeepMind | | 60.6 | - | - | 75.6 | - | 78.4 | $2.00 | 1.0M |
| 11 | Gemini 3 Pro | 🇺🇸 Google DeepMind | | 60.5 | - | 18.6 | 72.9 | - | 69.4 | N/A | 0K |
| 12 | o3 Mini High | 🇺🇸 OpenAI | | 60.4 | 60.4 | - | - | - | - | $1.10 | 200K |
| 13 | DeepSeek V3 | 🇨🇳 DeepSeek | Yes | 59.0 | 48.4 | - | - | - | - | $0.32 | 164K |
| 14 | GPT-5.4 | 🇺🇸 OpenAI | | 59.0 | - | - | 76.9 | - | - | $2.50 | 1.1M |
| 15 | Qwen3 32B | 🇨🇳 Alibaba Qwen | Yes | 58.2 | 40.0 | - | - | - | - | $0.08 | 41K |
| 16 | R1 0528 | 🇨🇳 DeepSeek | Yes | 57.9 | 71.4 | - | - | - | - | $0.50 | 164K |
| 17 | GLM 5 | 🇨🇳 z-ai | Yes | 57.6 | - | - | 72.1 | - | 52.4 | $0.60 | 203K |
| 18 | Claude Opus 4.6 | 🇺🇸 Anthropic | | 57.5 | - | 33.3 | 78.7 | - | 74.7 | $5.00 | 1.0M |
| 19 | o1 | 🇺🇸 OpenAI | | 56.4 | 61.7 | - | - | - | - | $15.00 | 200K |
| 20 | Qwen3 235B A22B | 🇨🇳 Alibaba Qwen | Yes | 56.4 | 59.6 | - | - | - | - | $0.46 | 131K |
| 21 | Gemini 2.5 Pro | 🇺🇸 Google DeepMind | | 56.2 | 83.1 | 3.9 | 57.6 | - | 32.6 | $1.25 | 1.0M |
| 22 | Kimi K2 0711 | 🇨🇳 moonshotai | Yes | 56.2 | 59.1 | 4.9 | - | - | 27.8 | $0.57 | 131K |
| 23 | GPT-5 Mini | 🇺🇸 OpenAI | | 56.0 | - | - | 64.7 | 59.8 | 34.8 | $0.25 | 400K |
| 24 | o3 | 🇺🇸 OpenAI | | 55.2 | 81.3 | 8.8 | 62.3 | 58.4 | - | $2.00 | 200K |
| 25 | DeepSeek V3 0324 | 🇨🇳 DeepSeek | Yes | 55.1 | 55.1 | - | - | - | - | $0.20 | 164K |
| 26 | MiniMax M2.5 | 🇨🇳 minimax | Yes | 55.1 | - | - | - | - | 42.2 | $0.15 | 197K |
| 27 | Grok 4 | 🇺🇸 xAI | | 54.8 | 79.6 | - | - | - | 27.2 | $3.00 | 256K |
| 28 | GPT-5 | 🇺🇸 OpenAI | | 54.4 | 88.0 | 6.9 | 73.5 | 65.0 | 49.6 | $1.25 | 400K |
| 29 | GPT-5.2 | 🇺🇸 OpenAI | | 54.0 | - | 27.4 | 73.8 | 71.8 | 64.9 | $1.75 | 400K |
| 30 | Gemini 2.0 Pro | 🇺🇸 Google DeepMind | | 53.7 | 35.6 | - | - | - | - | N/A | 0K |
| 31 | Kimi K2 Thinking | 🇨🇳 moonshotai | Yes | 53.3 | - | - | - | 63.4 | 35.7 | $0.60 | 262K |
| 32 | DeepSeek V3.2 Exp | 🇨🇳 DeepSeek | Yes | 53.2 | 74.2 | - | - | - | - | $0.27 | 164K |
| 33 | o4 Mini | 🇺🇸 OpenAI | | 53.2 | 72.0 | 3.6 | - | 45.0 | - | $1.10 | 200K |
| 34 | Qwen2.5 Coder 32B Instruct | 🇨🇳 Alibaba Qwen | Yes | 53.1 | 16.4 | - | - | - | - | $0.66 | 33K |
| 35 | DeepSeek V3.2 | 🇨🇳 DeepSeek | Yes | 53.0 | 74.2 | - | - | - | 39.6 | $0.25 | 131K |
| 36 | GPT-5.3-Codex | 🇺🇸 OpenAI | | 52.2 | - | - | 74.8 | - | 77.3 | $1.75 | 400K |
| 37 | Kimi K2.5 | 🇨🇳 moonshotai | Yes | 52.0 | - | - | 73.8 | - | 43.2 | $0.44 | 262K |
| 38 | Gemini 2.5 Pro Preview 06-05 | 🇺🇸 Google DeepMind | | 50.9 | 83.1 | - | - | - | - | $1.25 | 1.0M |
| 39 | GLM 4.6 | 🇨🇳 z-ai | Yes | 50.8 | - | - | - | - | 24.5 | $0.39 | 205K |
| 40 | GLM 4.7 | 🇨🇳 z-ai | Yes | 50.5 | - | - | - | - | 33.4 | $0.38 | 203K |
| 41 | Grok 4 Fast | 🇺🇸 xAI | | 50.4 | - | - | - | - | - | $0.20 | 2.0M |
| 42 | GPT-5.1 | 🇺🇸 OpenAI | | 49.6 | - | 13.7 | 68.0 | 66.0 | 47.6 | $1.25 | 400K |
| 43 | Gemini 3 Flash Preview | 🇺🇸 Google DeepMind | | 49.1 | - | 9.8 | 75.4 | - | 64.3 | $0.50 | 1.0M |
| 44 | Qwen3 235B A22B Instruct 2507 | 🇨🇳 Alibaba Qwen | Yes | 48.5 | 59.6 | - | - | - | - | $0.07 | 262K |
| 45 | Gemini 2.0 Flash | 🇺🇸 Google DeepMind | | 48.0 | 38.2 | - | - | - | - | $0.10 | 1.0M |
| 46 | Claude 3.7 Sonnet | 🇺🇸 Anthropic | | 47.7 | 64.9 | 3.8 | 61.0 | 52.8 | - | $3.00 | 200K |
| 47 | Claude Sonnet 4.6 | 🇺🇸 Anthropic | | 47.6 | - | - | 75.2 | - | - | $3.00 | 1.0M |
| 48 | gpt-oss-120b | 🇺🇸 OpenAI | Yes | 46.9 | 41.8 | - | - | 26.0 | 18.7 | $0.04 | 131K |
| 49 | Grok 3 Mini | 🇺🇸 xAI | | 46.6 | 49.3 | - | - | - | - | $0.30 | 131K |
| 50 | Claude Opus 4.5 | 🇺🇸 Anthropic | | 45.4 | - | 26.5 | 76.7 | 74.4 | 63.1 | $5.00 | 200K |
| 51 | GPT-5 Nano | 🇺🇸 OpenAI | | 45.3 | - | - | - | 34.8 | 11.5 | $0.05 | 400K |
| 52 | R1 | 🇨🇳 DeepSeek | Yes | 45.1 | 56.9 | - | - | - | - | $0.70 | 64K |
| 53 | Claude Sonnet 4 | 🇺🇸 Anthropic | | 44.6 | 61.3 | 4.9 | - | 64.9 | - | $3.00 | 1.0M |
| 54 | GPT-4.1 Mini | 🇺🇸 OpenAI | | 44.5 | 32.4 | - | - | 23.9 | - | $0.40 | 1.0M |
| 55 | GPT-4.1 | 🇺🇸 OpenAI | | 43.3 | 52.4 | - | 48.5 | 39.6 | - | $2.00 | 1.0M |
| 56 | GPT-4o-mini (2024-07-18) | 🇺🇸 OpenAI | | 43.2 | 3.6 | - | - | - | - | $0.15 | 128K |
| 57 | Claude 3.5 Sonnet | 🇺🇸 Anthropic | | 42.3 | 51.6 | 4.6 | - | - | - | N/A | 0K |
| 58 | Gemma 3 27B | 🇺🇸 Google DeepMind | Yes | 42.2 | 4.9 | - | - | - | - | $0.08 | 131K |
| 59 | Gemma 3 27B (free) | 🇺🇸 Google DeepMind | Yes | 42.2 | 4.9 | - | - | - | - | Free | 131K |
| 60 | Claude Sonnet 4.5 | 🇺🇸 Anthropic | | 42.1 | - | 14.7 | 71.3 | 70.6 | 46.5 | $3.00 | 1.0M |
| 61 | Claude Opus 4 | 🇺🇸 Anthropic | | 41.7 | 72.0 | 6.9 | 70.7 | 67.6 | - | $15.00 | 200K |
| 62 | o1-preview | 🇺🇸 OpenAI | | 41.5 | - | - | - | - | - | N/A | 0K |
| 63 | Claude Opus 4.1 | 🇺🇸 Anthropic | | 41.3 | - | - | 73.3 | - | 38.0 | $15.00 | 200K |
| 64 | Gemini 1.5 Pro (Feb 2024) | 🇺🇸 Google DeepMind | | 41.3 | - | - | - | - | - | N/A | 0K |
| 65 | Qwen2.5-Max | 🇨🇳 Alibaba Qwen | Yes | 41.0 | 21.8 | - | - | - | - | N/A | 0K |
| 66 | Gemini 2.5 Flash | 🇺🇸 Google DeepMind | | 40.0 | 47.1 | - | - | - | 17.1 | $0.30 | 1.0M |
| 67 | GPT-4o-mini | 🇺🇸 OpenAI | | 39.6 | 3.6 | - | - | - | - | $0.15 | 128K |
| 68 | Grok 3 | 🇺🇸 xAI | | 38.4 | 53.3 | - | - | - | - | $3.00 | 131K |
| 69 | o3 Mini | 🇺🇸 OpenAI | | 38.4 | 60.4 | 1.3 | - | - | - | $1.10 | 200K |
| 70 | Llama 3.1 405B | 🇺🇸 Meta | Yes | 38.0 | - | - | - | - | - | N/A | 0K |
| 71 | Gemini 2.0 Flash Thinking (Jan 2025) | 🇺🇸 Google DeepMind | | 37.7 | 18.2 | - | - | - | - | N/A | 0K |
| 72 | GPT-4o (2024-11-20) | 🇺🇸 OpenAI | | 37.7 | 23.1 | 0.1 | 31.0 | 21.6 | - | $2.50 | 128K |
| 73 | Claude 3.5 Haiku | 🇺🇸 Anthropic | | 37.2 | 28.0 | - | - | - | - | $0.80 | 200K |
| 74 | Claude Haiku 4.5 | 🇺🇸 Anthropic | | 37.1 | - | - | - | - | 35.5 | $1.00 | 200K |
| 75 | GPT-4.5 | 🇺🇸 OpenAI | | 35.9 | 44.9 | - | - | - | - | N/A | 0K |
| 76 | GPT-4o (2024-08-06) | 🇺🇸 OpenAI | | 35.6 | 23.1 | - | - | - | - | $2.50 | 128K |
| 77 | GPT-4.1 Nano | 🇺🇸 OpenAI | | 35.2 | 8.9 | - | - | - | - | $0.10 | 1.0M |
| 78 | o1-mini | 🇺🇸 OpenAI | | 34.9 | 32.9 | - | - | - | - | N/A | 0K |
| 79 | Claude 3 Opus | 🇺🇸 Anthropic | | 33.7 | - | - | - | - | - | N/A | 0K |
| 80 | Llama 3 70B Instruct | 🇺🇸 Meta | Yes | 32.4 | - | - | - | - | - | $0.51 | 8K |
| 81 | Claude 3 Haiku | 🇺🇸 Anthropic | | 28.7 | - | - | - | - | - | $0.25 | 200K |
| 82 | Llama 4 Maverick | 🇺🇸 Meta | Yes | 28.0 | 15.6 | - | - | 21.0 | - | $0.15 | 1.0M |
| 83 | Mixtral 8x22B Instruct | 🇫🇷 Mistral AI | Yes | 23.5 | - | - | - | - | - | $2.00 | 66K |
| 84 | Llama 4 Scout | 🇺🇸 Meta | Yes | 18.9 | - | - | - | 9.1 | - | $0.08 | 328K |
| 85 | QwQ 32B | 🇨🇳 Alibaba Qwen | Yes | 13.5 | 20.9 | - | - | - | - | $0.15 | 131K |

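Scores are in % unless noted. A dash (-) means no reported score, and N/A means no listed price. Avg is the unweighted mean of a model's scores across the benchmarks it was tested on, which may include benchmarks not shown as columns here. Because of this, averages for models tested on different benchmark sets are not strictly comparable.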
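The averaging rule above can be written as Avg = (sum of tested scores) / (number of tested benchmarks). Here is a minimal Python sketch of it, assuming "tested" means every entry that is not a dash. The scores come from the visible columns of the table. The function name `unweighted_mean` is illustrative.

```python
# Visible benchmark scores copied from the table (None = "-", no reported score).
# Column order: Aider Polyglot, GSO-Bench, SWE-bench (1), SWE-bench (2), Terminal-bench.
VISIBLE_SCORES = {
    "GPT-5": [88.0, 6.9, 73.5, 65.0, 49.6],
    "Grok 4": [79.6, None, None, None, 27.2],
}


def unweighted_mean(scores):
    """Mean over tested benchmarks only; untested (None) entries are skipped, not counted as 0."""
    tested = [s for s in scores if s is not None]
    if not tested:
        raise ValueError("model has no tested benchmarks")
    return sum(tested) / len(tested)


for model, scores in VISIBLE_SCORES.items():
    print(f"{model}: {unweighted_mean(scores):.1f}")
# GPT-5: 56.6
# Grok 4: 53.4
```

Skipping untested entries instead of counting them as 0 keeps a model from being penalized for missing results. Counting them as 0 would drop Grok 4 to 21.4. On the visible columns alone, the sketch gives 56.6 for GPT-5 and 53.4 for Grok 4, not the listed 54.4 and 54.8. This is consistent with the leaderboard's Avg also drawing on benchmarks that are not shown here.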

Models are ranked by their average score across HumanEval+, SWE-bench Verified, Aider Polyglot, and other code generation benchmarks. The tasks range from function-level code completion (HumanEval+) to resolving real GitHub issues (SWE-bench Verified) and editing code across multiple languages (Aider Polyglot).

Which AI model is best for coding?

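The top-ranked coding model changes frequently, so check the leaderboard above for the current leader. In this snapshot, GPT-5.5 ranks first with an average of 85.0, followed by GPT-5 Chat (81.9) and Claude Mythos Preview (81.8). Each average covers only the benchmarks a model was tested on, so it is worth comparing the individual benchmark columns as well.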

What benchmarks measure coding ability?

Key coding benchmarks include HumanEval+ (function completion), SWE-bench Verified (real GitHub issues), Aider Polyglot (multi-language editing), and Terminal-bench (CLI tasks).

Are open-source models competitive for coding?

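Yes. Several open-source models score competitively on coding benchmarks, especially models from DeepSeek, Alibaba Qwen, and z-ai. In this snapshot the highest-ranked open model is gpt-oss-120b (free) at 68.7. Meta's Llama models rank near the bottom, at an average of 38.0 or lower.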
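The Python sketch below checks the answer above against this snapshot. It lists the 28 entries marked open source, with their Avg scores copied from the table, and keeps the best-scoring model per provider. It relies only on the Avg column, so it carries the same caveat: different models were averaged over different benchmark sets. The names `OPEN_MODELS` and `best_per_provider` are illustrative.

```python
# (model, provider, Avg) for every row marked open source in the table.
OPEN_MODELS = [
    ("gpt-oss-120b (free)", "OpenAI", 68.7),
    ("DeepSeek V3", "DeepSeek", 59.0),
    ("Qwen3 32B", "Alibaba Qwen", 58.2),
    ("R1 0528", "DeepSeek", 57.9),
    ("GLM 5", "z-ai", 57.6),
    ("Qwen3 235B A22B", "Alibaba Qwen", 56.4),
    ("Kimi K2 0711", "moonshotai", 56.2),
    ("DeepSeek V3 0324", "DeepSeek", 55.1),
    ("MiniMax M2.5", "minimax", 55.1),
    ("Kimi K2 Thinking", "moonshotai", 53.3),
    ("DeepSeek V3.2 Exp", "DeepSeek", 53.2),
    ("Qwen2.5 Coder 32B Instruct", "Alibaba Qwen", 53.1),
    ("DeepSeek V3.2", "DeepSeek", 53.0),
    ("Kimi K2.5", "moonshotai", 52.0),
    ("GLM 4.6", "z-ai", 50.8),
    ("GLM 4.7", "z-ai", 50.5),
    ("Qwen3 235B A22B Instruct 2507", "Alibaba Qwen", 48.5),
    ("gpt-oss-120b", "OpenAI", 46.9),
    ("R1", "DeepSeek", 45.1),
    ("Gemma 3 27B", "Google DeepMind", 42.2),
    ("Gemma 3 27B (free)", "Google DeepMind", 42.2),
    ("Qwen2.5-Max", "Alibaba Qwen", 41.0),
    ("Llama 3.1 405B", "Meta", 38.0),
    ("Llama 3 70B Instruct", "Meta", 32.4),
    ("Llama 4 Maverick", "Meta", 28.0),
    ("Mixtral 8x22B Instruct", "Mistral AI", 23.5),
    ("Llama 4 Scout", "Meta", 18.9),
    ("QwQ 32B", "Alibaba Qwen", 13.5),
]


def best_per_provider(models):
    """Return {provider: (model, avg)} for each provider's top open model, highest avg first.
    On ties the earlier (higher-ranked) row is kept."""
    best = {}
    for name, provider, avg in models:
        if provider not in best or avg > best[provider][1]:
            best[provider] = (name, avg)
    return dict(sorted(best.items(), key=lambda item: item[1][1], reverse=True))


for provider, (name, avg) in best_per_provider(OPEN_MODELS).items():
    print(f"{provider:16} {name:22} {avg:.1f}")
```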