
Best AI Models for Coding

AI models ranked by coding benchmarks. Compare HumanEval+, SWE-bench Verified, Aider Polyglot, and more across all providers.

84 models · 11 providers · 28 open source · $0.72 median price per 1M input tokens
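
Prices are dollars per million input tokens, so input cost scales linearly with prompt size. A minimal sketch of the arithmetic (the function name is ours, using the $0.72 median above):

```python
def input_cost_usd(prompt_tokens: int, price_per_million_usd: float) -> float:
    """Dollar cost of a prompt at a given $/1M-input-token rate."""
    return prompt_tokens / 1_000_000 * price_per_million_usd

# A 200K-token prompt at the $0.72 median input price:
print(input_cost_usd(200_000, 0.72))  # 0.144 -> about $0.14
```
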
| # | Model | Provider | Open | Avg | Aider Polyglot | GSO-Bench | SWE-bench Verified | SWE-bench | Terminal-bench | $/1M in | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5 Chat | 🇺🇸 OpenAI | | 81.9 | 88.0 | - | - | - | - | $1.25 | 128K |
| 2 | Claude Mythos Preview | 🇺🇸 Anthropic | | 81.8 | - | - | 93.9 | - | 82.0 | N/A | 1.0M |
| 3 | Gemini 2.5 Pro Preview 05-06 | 🇺🇸 Google DeepMind | | 76.9 | 76.9 | - | - | - | - | $1.25 | 1.0M |
| 4 | o4 Mini High | 🇺🇸 OpenAI | | 72.0 | 72.0 | - | - | - | - | $1.10 | 200K |
| 5 | Grok 3 Beta | 🇺🇸 xAI | | 69.5 | 53.3 | - | - | - | - | $3.00 | 131K |
| 6 | gpt-oss-120b (free) | 🇺🇸 OpenAI | ✓ | 68.7 | 41.8 | - | - | - | - | Free | 131K |
| 7 | Grok 3 Mini Beta | 🇺🇸 xAI | | 64.8 | 49.3 | - | - | - | - | $0.30 | 131K |
| 8 | o3 Pro | 🇺🇸 OpenAI | | 61.2 | 84.9 | - | - | - | - | $20.00 | 200K |
| 9 | Gemini 3.1 Pro Preview | 🇺🇸 Google DeepMind | | 60.6 | - | - | 75.6 | - | 78.4 | $2.00 | 1.0M |
| 10 | Gemini 3 Pro | 🇺🇸 Google DeepMind | | 60.5 | - | 18.6 | 72.9 | - | 69.4 | N/A | N/A |
| 11 | o3 Mini High | 🇺🇸 OpenAI | | 60.4 | 60.4 | - | - | - | - | $1.10 | 200K |
| 12 | DeepSeek V3 | 🇨🇳 DeepSeek | ✓ | 59.0 | 48.4 | - | - | - | - | $0.32 | 164K |
| 13 | GPT-5.4 | 🇺🇸 OpenAI | | 59.0 | - | - | 76.9 | - | - | $2.50 | 1.1M |
| 14 | Qwen3 32B | 🇨🇳 Alibaba Qwen | ✓ | 58.2 | 40.0 | - | - | - | - | $0.08 | 41K |
| 15 | R1 0528 | 🇨🇳 DeepSeek | ✓ | 57.9 | 71.4 | - | - | - | - | $0.50 | 164K |
| 16 | GLM 5 | 🇨🇳 z-ai | ✓ | 57.6 | - | - | 72.1 | - | 52.4 | $0.72 | 80K |
| 17 | Claude Opus 4.6 | 🇺🇸 Anthropic | | 57.5 | - | 33.3 | 78.7 | - | 74.7 | $5.00 | 1.0M |
| 18 | o1 | 🇺🇸 OpenAI | | 56.4 | 61.7 | - | - | - | - | $15.00 | 200K |
| 19 | Qwen3 235B A22B | 🇨🇳 Alibaba Qwen | ✓ | 56.4 | 59.6 | - | - | - | - | $0.46 | 131K |
| 20 | Gemini 2.5 Pro | 🇺🇸 Google DeepMind | | 56.2 | 83.1 | 3.9 | 57.6 | - | 32.6 | $1.25 | 1.0M |
| 21 | Kimi K2 0711 | 🇨🇳 moonshotai | ✓ | 56.2 | 59.1 | 4.9 | - | - | 27.8 | $0.57 | 131K |
| 22 | GPT-5 Mini | 🇺🇸 OpenAI | | 56.0 | - | - | 64.7 | 59.8 | 34.8 | $0.25 | 400K |
| 23 | o3 | 🇺🇸 OpenAI | | 55.2 | 81.3 | 8.8 | 62.3 | 58.4 | - | $2.00 | 200K |
| 24 | DeepSeek V3 0324 | 🇨🇳 DeepSeek | ✓ | 55.1 | 55.1 | - | - | - | - | $0.20 | 164K |
| 25 | MiniMax M2.5 | 🇨🇳 minimax | ✓ | 55.1 | - | - | - | - | 42.2 | $0.12 | 197K |
| 26 | Grok 4 | 🇺🇸 xAI | | 54.8 | 79.6 | - | - | - | 27.2 | $3.00 | 256K |
| 27 | GPT-5 | 🇺🇸 OpenAI | | 54.4 | 88.0 | 6.9 | 73.5 | 65.0 | 49.6 | $1.25 | 400K |
| 28 | GPT-5.2 | 🇺🇸 OpenAI | | 54.0 | - | 27.4 | 73.8 | 71.8 | 64.9 | $1.75 | 400K |
| 29 | Gemini 2.0 Pro | 🇺🇸 Google DeepMind | | 53.7 | 35.6 | - | - | - | - | N/A | N/A |
| 30 | Kimi K2 Thinking | 🇨🇳 moonshotai | ✓ | 53.3 | - | - | - | 63.4 | 35.7 | $0.60 | 262K |
| 31 | DeepSeek V3.2 Exp | 🇨🇳 DeepSeek | ✓ | 53.2 | 74.2 | - | - | - | - | $0.27 | 164K |
| 32 | o4 Mini | 🇺🇸 OpenAI | | 53.2 | 72.0 | 3.6 | - | 45.0 | - | $1.10 | 200K |
| 33 | Qwen2.5 Coder 32B Instruct | 🇨🇳 Alibaba Qwen | ✓ | 53.1 | 16.4 | - | - | - | - | $0.66 | 33K |
| 34 | DeepSeek V3.2 | 🇨🇳 DeepSeek | ✓ | 53.0 | 74.2 | - | - | - | 39.6 | $0.26 | 164K |
| 35 | GPT-5.3-Codex | 🇺🇸 OpenAI | | 52.2 | - | - | 74.8 | - | 77.3 | $1.75 | 400K |
| 36 | Kimi K2.5 | 🇨🇳 moonshotai | ✓ | 52.0 | - | - | 73.8 | - | 43.2 | $0.38 | 262K |
| 37 | Gemini 2.5 Pro Preview 06-05 | 🇺🇸 Google DeepMind | | 50.9 | 83.1 | - | - | - | - | $1.25 | 1.0M |
| 38 | GLM 4.6 | 🇨🇳 z-ai | ✓ | 50.8 | - | - | - | - | 24.5 | $0.39 | 205K |
| 39 | GLM 4.7 | 🇨🇳 z-ai | ✓ | 50.5 | - | - | - | - | 33.4 | $0.39 | 203K |
| 40 | Grok 4 Fast | 🇺🇸 xAI | | 50.4 | - | - | - | - | - | $0.20 | 2.0M |
| 41 | GPT-5.1 | 🇺🇸 OpenAI | | 49.6 | - | 13.7 | 68.0 | 66.0 | 47.6 | $1.25 | 400K |
| 42 | Gemini 3 Flash Preview | 🇺🇸 Google DeepMind | | 49.1 | - | 9.8 | 75.4 | - | 64.3 | $0.50 | 1.0M |
| 43 | Qwen3 235B A22B Instruct 2507 | 🇨🇳 Alibaba Qwen | ✓ | 48.5 | 59.6 | - | - | - | - | $0.07 | 262K |
| 44 | Gemini 2.0 Flash | 🇺🇸 Google DeepMind | | 48.0 | 38.2 | - | - | - | - | $0.10 | 1.0M |
| 45 | Claude 3.7 Sonnet | 🇺🇸 Anthropic | | 47.7 | 64.9 | 3.8 | 61.0 | 52.8 | - | $3.00 | 200K |
| 46 | Claude Sonnet 4.6 | 🇺🇸 Anthropic | | 47.6 | - | - | 75.2 | - | - | $3.00 | 1.0M |
| 47 | gpt-oss-120b | 🇺🇸 OpenAI | ✓ | 46.9 | 41.8 | - | - | 26.0 | 18.7 | $0.04 | 131K |
| 48 | Grok 3 Mini | 🇺🇸 xAI | | 46.6 | 49.3 | - | - | - | - | $0.30 | 131K |
| 49 | Claude Opus 4.5 | 🇺🇸 Anthropic | | 45.4 | - | 26.5 | 76.7 | 74.4 | 63.1 | $5.00 | 200K |
| 50 | GPT-5 Nano | 🇺🇸 OpenAI | | 45.3 | - | - | - | 34.8 | 11.5 | $0.05 | 400K |
| 51 | R1 | 🇨🇳 DeepSeek | ✓ | 45.1 | 56.9 | - | - | - | - | $0.70 | 64K |
| 52 | Claude Sonnet 4 | 🇺🇸 Anthropic | | 44.6 | 61.3 | 4.9 | - | 64.9 | - | $3.00 | 1.0M |
| 53 | GPT-4.1 Mini | 🇺🇸 OpenAI | | 44.5 | 32.4 | - | - | 23.9 | - | $0.40 | 1.0M |
| 54 | GPT-4.1 | 🇺🇸 OpenAI | | 43.3 | 52.4 | - | 48.5 | 39.6 | - | $2.00 | 1.0M |
| 55 | GPT-4o-mini (2024-07-18) | 🇺🇸 OpenAI | | 43.2 | 3.6 | - | - | - | - | $0.15 | 128K |
| 56 | Claude 3.5 Sonnet | 🇺🇸 Anthropic | | 42.3 | 51.6 | 4.6 | - | - | - | N/A | N/A |
| 57 | Gemma 3 27B | 🇺🇸 Google DeepMind | ✓ | 42.2 | 4.9 | - | - | - | - | $0.08 | 131K |
| 58 | Gemma 3 27B (free) | 🇺🇸 Google DeepMind | ✓ | 42.2 | 4.9 | - | - | - | - | Free | 131K |
| 59 | Claude Sonnet 4.5 | 🇺🇸 Anthropic | | 42.1 | - | 14.7 | 71.3 | 70.6 | 46.5 | $3.00 | 1.0M |
| 60 | Claude Opus 4 | 🇺🇸 Anthropic | | 41.7 | 72.0 | 6.9 | 70.7 | 67.6 | - | $15.00 | 200K |
| 61 | o1-preview | 🇺🇸 OpenAI | | 41.5 | - | - | - | - | - | N/A | N/A |
| 62 | Claude Opus 4.1 | 🇺🇸 Anthropic | | 41.3 | - | - | 73.3 | - | 38.0 | $15.00 | 200K |
| 63 | Gemini 1.5 Pro (Feb 2024) | 🇺🇸 Google DeepMind | | 41.3 | - | - | - | - | - | N/A | N/A |
| 64 | Qwen2.5-Max | 🇨🇳 Alibaba Qwen | ✓ | 41.0 | 21.8 | - | - | - | - | N/A | N/A |
| 65 | Gemini 2.5 Flash | 🇺🇸 Google DeepMind | | 40.0 | 47.1 | - | - | - | 17.1 | $0.30 | 1.0M |
| 66 | GPT-4o-mini | 🇺🇸 OpenAI | | 39.6 | 3.6 | - | - | - | - | $0.15 | 128K |
| 67 | Grok 3 | 🇺🇸 xAI | | 38.4 | 53.3 | - | - | - | - | $3.00 | 131K |
| 68 | o3 Mini | 🇺🇸 OpenAI | | 38.4 | 60.4 | 1.3 | - | - | - | $1.10 | 200K |
| 69 | Llama 3.1 405B | 🇺🇸 Meta | ✓ | 38.0 | - | - | - | - | - | N/A | N/A |
| 70 | Gemini 2.0 Flash Thinking (Jan 2025) | 🇺🇸 Google DeepMind | | 37.7 | 18.2 | - | - | - | - | N/A | N/A |
| 71 | GPT-4o (2024-11-20) | 🇺🇸 OpenAI | | 37.7 | 23.1 | 0.1 | 31.0 | 21.6 | - | $2.50 | 128K |
| 72 | Claude 3.5 Haiku | 🇺🇸 Anthropic | | 37.2 | 28.0 | - | - | - | - | $0.80 | 200K |
| 73 | Claude Haiku 4.5 | 🇺🇸 Anthropic | | 37.1 | - | - | - | - | 35.5 | $1.00 | 200K |
| 74 | GPT-4.5 | 🇺🇸 OpenAI | | 35.9 | 44.9 | - | - | - | - | N/A | N/A |
| 75 | GPT-4o (2024-08-06) | 🇺🇸 OpenAI | | 35.6 | 23.1 | - | - | - | - | $2.50 | 128K |
| 76 | GPT-4.1 Nano | 🇺🇸 OpenAI | | 35.2 | 8.9 | - | - | - | - | $0.10 | 1.0M |
| 77 | o1-mini | 🇺🇸 OpenAI | | 34.9 | 32.9 | - | - | - | - | N/A | N/A |
| 78 | Claude 3 Opus | 🇺🇸 Anthropic | | 33.7 | - | - | - | - | - | N/A | N/A |
| 79 | Llama 3 70B Instruct | 🇺🇸 Meta | ✓ | 32.4 | - | - | - | - | - | $0.51 | 8K |
| 80 | Claude 3 Haiku | 🇺🇸 Anthropic | | 28.7 | - | - | - | - | - | $0.25 | 200K |
| 81 | Llama 4 Maverick | 🇺🇸 Meta | ✓ | 28.0 | 15.6 | - | - | 21.0 | - | $0.15 | 1.0M |
| 82 | Mixtral 8x22B Instruct | 🇫🇷 Mistral AI | ✓ | 23.5 | - | - | - | - | - | $2.00 | 66K |
| 83 | Llama 4 Scout | 🇺🇸 Meta | ✓ | 18.9 | - | - | - | 9.1 | - | $0.08 | 328K |
| 84 | QwQ 32B | 🇨🇳 Alibaba Qwen | ✓ | 13.5 | 20.9 | - | - | - | - | $0.15 | 131K |

Scores in % unless noted. Avg = unweighted mean across tested benchmarks. N/A = not listed.
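
The Avg column can be read as a plain mean over whichever benchmarks a model has scores for. A minimal sketch, assuming Avg also folds in benchmarks not shown as columns here (such as HumanEval+), so the visible columns alone won't always reproduce it:

```python
def avg_score(scores: dict[str, float | None]) -> float | None:
    """Unweighted mean over the benchmarks a model was actually tested on."""
    tested = [s for s in scores.values() if s is not None]
    return sum(tested) / len(tested) if tested else None

# Visible scores for GPT-5 (row 27); missing benchmarks are None.
gpt5 = {"aider_polyglot": 88.0, "gso_bench": 6.9, "swe_bench_verified": 73.5,
        "swe_bench": 65.0, "terminal_bench": 49.6}
print(round(avg_score(gpt5), 1))  # 56.6 on visible columns; the table shows
                                  # 54.4, consistent with extra untabulated
                                  # benchmarks entering the mean.
```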

Models ranked by coding performance across HumanEval+, SWE-bench Verified, Aider Polyglot, and other code generation benchmarks. Scores reflect real-world software engineering tasks including bug fixing, code completion, and multi-file editing.

Which AI model is best for coding?

The top-ranked coding model changes frequently. Check the live leaderboard above for the current leader based on HumanEval+, SWE-bench Verified, and other coding benchmarks.

What benchmarks measure coding ability?

Key coding benchmarks include HumanEval+ (function completion), SWE-bench Verified (real GitHub issues), Aider Polyglot (multi-language editing), and Terminal-bench (CLI tasks).
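
For a concrete sense of the simplest of these, here is a HumanEval+-style function-completion task. This is an illustrative item written in the benchmark's format, not an actual problem from it: the model receives the signature and docstring and must write a body that passes hidden unit tests.

```python
# Hypothetical HumanEval+-style prompt: the model sees the signature
# and docstring and must complete the body.
def running_max(nums: list[int]) -> list[int]:
    """Return a list where element i is the maximum of nums[:i + 1].

    >>> running_max([3, 1, 4, 1, 5])
    [3, 3, 4, 4, 5]
    """
    # One correct completion, which the benchmark's extra unit tests
    # (empty input, negatives, duplicates) would then exercise:
    out: list[int] = []
    best: int | None = None
    for n in nums:
        best = n if best is None or n > best else best
        out.append(best)
    return out
```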

Are open-source models competitive for coding?

Yes. Several open-source models score competitively on coding benchmarks, notably those from DeepSeek, Alibaba Qwen, moonshotai, and z-ai.