Best AI Models for Coding

AI models ranked by coding benchmarks. Compare HumanEval+, SWE-bench Verified, Aider Polyglot, and more across all providers.

Median input price: $0.72 per 1M tokens

Top 3

Claude Mythos Preview (🇺🇸 Anthropic): average 81.8 across 14 benchmarks; 1.0M context; input price N/A

Gemini 2.5 Pro Preview 05-06 (🇺🇸 Google DeepMind): average 76.9 on 1 benchmark; 1.0M context; $1.25 per 1M input tokens

Full Rankings

#	Model · Provider	Avg	Aider Polyglot	GSO-Bench	SWE-bench	SWE-bench	Terminal-bench	$/1M in	Context
1	GPT-5 Chat · 🇺🇸 OpenAI	81.9	88.0	-	-	-	-	$1.25	128K
2	Claude Mythos Preview · 🇺🇸 Anthropic	81.8	-	-	93.9	-	82.0	N/A	1.0M
3	Gemini 2.5 Pro Preview 05-06 · 🇺🇸 Google DeepMind	76.9	76.9	-	-	-	-	$1.25	1.0M
4	o4 Mini High · 🇺🇸 OpenAI	72.0	72.0	-	-	-	-	$1.10	200K
5	Grok 3 Beta · 🇺🇸 xAI	69.5	53.3	-	-	-	-	$3.00	131K
6	gpt-oss-120b (free) · 🇺🇸 OpenAI · Open	68.7	41.8	-	-	-	-	Free	131K
7	Grok 3 Mini Beta · 🇺🇸 xAI	64.8	49.3	-	-	-	-	$0.30	131K
8	o3 Pro · 🇺🇸 OpenAI	61.2	84.9	-	-	-	-	$20.00	200K
9	Gemini 3.1 Pro Preview · 🇺🇸 Google DeepMind	60.6	-	-	75.6	-	78.4	$2.00	1.0M
10	Gemini 3 Pro · 🇺🇸 Google DeepMind	60.5	-	18.6	72.9	-	69.4	N/A	0K
11	o3 Mini High · 🇺🇸 OpenAI	60.4	60.4	-	-	-	-	$1.10	200K
12	DeepSeek V3 · 🇨🇳 DeepSeek · Open	59.0	48.4	-	-	-	-	$0.32	164K
13	GPT-5.4 · 🇺🇸 OpenAI	59.0	-	-	76.9	-	-	$2.50	1.1M
14	Qwen3 32B · 🇨🇳 Alibaba Qwen · Open	58.2	40.0	-	-	-	-	$0.08	41K
15	R1 0528 · 🇨🇳 DeepSeek · Open	57.9	71.4	-	-	-	-	$0.50	164K
16	GLM 5 · 🇨🇳 z-ai · Open	57.6	-	-	72.1	-	52.4	$0.72	80K
17	Claude Opus 4.6 · 🇺🇸 Anthropic	57.5	-	33.3	78.7	-	74.7	$5.00	1.0M
18	o1 · 🇺🇸 OpenAI	56.4	61.7	-	-	-	-	$15.00	200K
19	Qwen3 235B A22B · 🇨🇳 Alibaba Qwen · Open	56.4	59.6	-	-	-	-	$0.46	131K
20	Gemini 2.5 Pro · 🇺🇸 Google DeepMind	56.2	83.1	3.9	57.6	-	32.6	$1.25	1.0M
21	Kimi K2 0711 · 🇨🇳 moonshotai · Open	56.2	59.1	4.9	-	-	27.8	$0.57	131K
22	GPT-5 Mini · 🇺🇸 OpenAI	56.0	-	-	64.7	59.8	34.8	$0.25	400K
23	o3 · 🇺🇸 OpenAI	55.2	81.3	8.8	62.3	58.4	-	$2.00	200K
24	DeepSeek V3 0324 · 🇨🇳 DeepSeek · Open	55.1	55.1	-	-	-	-	$0.20	164K
25	MiniMax M2.5 · 🇨🇳 minimax · Open	55.1	-	-	-	-	42.2	$0.12	197K
26	Grok 4 · 🇺🇸 xAI	54.8	79.6	-	-	-	27.2	$3.00	256K
27	GPT-5 · 🇺🇸 OpenAI	54.4	88.0	6.9	73.5	65.0	49.6	$1.25	400K
28	GPT-5.2 · 🇺🇸 OpenAI	54.0	-	27.4	73.8	71.8	64.9	$1.75	400K
29	Gemini 2.0 Pro · 🇺🇸 Google DeepMind	53.7	35.6	-	-	-	-	N/A	0K
30	Kimi K2 Thinking · 🇨🇳 moonshotai · Open	53.3	-	-	-	63.4	35.7	$0.60	262K
31	DeepSeek V3.2 Exp · 🇨🇳 DeepSeek · Open	53.2	74.2	-	-	-	-	$0.27	164K
32	o4 Mini · 🇺🇸 OpenAI	53.2	72.0	3.6	-	45.0	-	$1.10	200K
33	Qwen2.5 Coder 32B Instruct · 🇨🇳 Alibaba Qwen · Open	53.1	16.4	-	-	-	-	$0.66	33K
34	DeepSeek V3.2 · 🇨🇳 DeepSeek · Open	53.0	74.2	-	-	-	39.6	$0.26	164K
35	GPT-5.3-Codex · 🇺🇸 OpenAI	52.2	-	-	74.8	-	77.3	$1.75	400K
36	Kimi K2.5 · 🇨🇳 moonshotai · Open	52.0	-	-	73.8	-	43.2	$0.38	262K
37	Gemini 2.5 Pro Preview 06-05 · 🇺🇸 Google DeepMind	50.9	83.1	-	-	-	-	$1.25	1.0M
38	GLM 4.6 · 🇨🇳 z-ai · Open	50.8	-	-	-	-	24.5	$0.39	205K
39	GLM 4.7 · 🇨🇳 z-ai · Open	50.5	-	-	-	-	33.4	$0.39	203K
40	Grok 4 Fast · 🇺🇸 xAI	50.4	-	-	-	-	-	$0.20	2.0M
41	GPT-5.1 · 🇺🇸 OpenAI	49.6	-	13.7	68.0	66.0	47.6	$1.25	400K
42	Gemini 3 Flash Preview · 🇺🇸 Google DeepMind	49.1	-	9.8	75.4	-	64.3	$0.50	1.0M
43	Qwen3 235B A22B Instruct 2507 · 🇨🇳 Alibaba Qwen · Open	48.5	59.6	-	-	-	-	$0.07	262K
44	Gemini 2.0 Flash · 🇺🇸 Google DeepMind	48.0	38.2	-	-	-	-	$0.10	1.0M
45	Claude 3.7 Sonnet · 🇺🇸 Anthropic	47.7	64.9	3.8	61.0	52.8	-	$3.00	200K
46	Claude Sonnet 4.6 · 🇺🇸 Anthropic	47.6	-	-	75.2	-	-	$3.00	1.0M
47	gpt-oss-120b · 🇺🇸 OpenAI · Open	46.9	41.8	-	-	26.0	18.7	$0.04	131K
48	Grok 3 Mini · 🇺🇸 xAI	46.6	49.3	-	-	-	-	$0.30	131K
49	Claude Opus 4.5 · 🇺🇸 Anthropic	45.4	-	26.5	76.7	74.4	63.1	$5.00	200K
50	GPT-5 Nano · 🇺🇸 OpenAI	45.3	-	-	-	34.8	11.5	$0.05	400K
51	R1 · 🇨🇳 DeepSeek · Open	45.1	56.9	-	-	-	-	$0.70	64K
52	Claude Sonnet 4 · 🇺🇸 Anthropic	44.6	61.3	4.9	-	64.9	-	$3.00	1.0M
53	GPT-4.1 Mini · 🇺🇸 OpenAI	44.5	32.4	-	-	23.9	-	$0.40	1.0M
54	GPT-4.1 · 🇺🇸 OpenAI	43.3	52.4	-	48.5	39.6	-	$2.00	1.0M
55	GPT-4o-mini (2024-07-18) · 🇺🇸 OpenAI	43.2	3.6	-	-	-	-	$0.15	128K
56	Claude 3.5 Sonnet · 🇺🇸 Anthropic	42.3	51.6	4.6	-	-	-	N/A	0K
57	Gemma 3 27B · 🇺🇸 Google DeepMind · Open	42.2	4.9	-	-	-	-	$0.08	131K
58	Gemma 3 27B (free) · 🇺🇸 Google DeepMind · Open	42.2	4.9	-	-	-	-	Free	131K
59	Claude Sonnet 4.5 · 🇺🇸 Anthropic	42.1	-	14.7	71.3	70.6	46.5	$3.00	1.0M
60	Claude Opus 4 · 🇺🇸 Anthropic	41.7	72.0	6.9	70.7	67.6	-	$15.00	200K
61	o1-preview · 🇺🇸 OpenAI	41.5	-	-	-	-	-	N/A	0K
62	Claude Opus 4.1 · 🇺🇸 Anthropic	41.3	-	-	73.3	-	38.0	$15.00	200K
63	Gemini 1.5 Pro (Feb 2024) · 🇺🇸 Google DeepMind	41.3	-	-	-	-	-	N/A	0K
64	Qwen2.5-Max · 🇨🇳 Alibaba Qwen · Open	41.0	21.8	-	-	-	-	N/A	0K
65	Gemini 2.5 Flash · 🇺🇸 Google DeepMind	40.0	47.1	-	-	-	17.1	$0.30	1.0M
66	GPT-4o-mini · 🇺🇸 OpenAI	39.6	3.6	-	-	-	-	$0.15	128K
67	Grok 3 · 🇺🇸 xAI	38.4	53.3	-	-	-	-	$3.00	131K
68	o3 Mini · 🇺🇸 OpenAI	38.4	60.4	1.3	-	-	-	$1.10	200K
69	Llama 3.1 405B · 🇺🇸 Meta · Open	38.0	-	-	-	-	-	N/A	0K
70	Gemini 2.0 Flash Thinking (Jan 2025) · 🇺🇸 Google DeepMind	37.7	18.2	-	-	-	-	N/A	0K
71	GPT-4o (2024-11-20) · 🇺🇸 OpenAI	37.7	23.1	0.1	31.0	21.6	-	$2.50	128K
72	Claude 3.5 Haiku · 🇺🇸 Anthropic	37.2	28.0	-	-	-	-	$0.80	200K
73	Claude Haiku 4.5 · 🇺🇸 Anthropic	37.1	-	-	-	-	35.5	$1.00	200K
74	GPT-4.5 · 🇺🇸 OpenAI	35.9	44.9	-	-	-	-	N/A	0K
75	GPT-4o (2024-08-06) · 🇺🇸 OpenAI	35.6	23.1	-	-	-	-	$2.50	128K
76	GPT-4.1 Nano · 🇺🇸 OpenAI	35.2	8.9	-	-	-	-	$0.10	1.0M
77	o1-mini · 🇺🇸 OpenAI	34.9	32.9	-	-	-	-	N/A	0K
78	Claude 3 Opus · 🇺🇸 Anthropic	33.7	-	-	-	-	-	N/A	0K
79	Llama 3 70B Instruct · 🇺🇸 Meta · Open	32.4	-	-	-	-	-	$0.51	8K
80	Claude 3 Haiku · 🇺🇸 Anthropic	28.7	-	-	-	-	-	$0.25	200K
81	Llama 4 Maverick · 🇺🇸 Meta · Open	28.0	15.6	-	-	21.0	-	$0.15	1.0M
82	Mixtral 8x22B Instruct · 🇫🇷 Mistral AI · Open	23.5	-	-	-	-	-	$2.00	66K
83	Llama 4 Scout · 🇺🇸 Meta · Open	18.9	-	-	-	9.1	-	$0.08	328K
84	QwQ 32B · 🇨🇳 Alibaba Qwen · Open	13.5	20.9	-	-	-	-	$0.15	131K

Scores in % unless noted. Avg = unweighted mean across tested benchmarks.
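The Avg definition can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual scoring code: the function names `parse_cell` and `unweighted_mean` are hypothetical, and it assumes that a `-` cell means "not tested" and is skipped rather than counted as zero. The table displays only five benchmark columns, and the listed Avg appears to draw on more benchmarks than are shown, so recomputing from the visible columns will not always reproduce the listed value.

```python
from typing import Optional


def parse_cell(cell: str) -> Optional[float]:
    """Convert a table cell to a score; '-' (assumed: not tested) becomes None."""
    cell = cell.strip()
    return None if cell in ("", "-") else float(cell)


def unweighted_mean(cells: list[str]) -> Optional[float]:
    """Average only the benchmarks a model was tested on, each weighted equally."""
    tested = [s for s in map(parse_cell, cells) if s is not None]
    if not tested:
        return None  # no benchmark results to average
    return round(sum(tested) / len(tested), 1)


# Visible benchmark cells copied from two rows of the rankings table.
gemini_preview_0506 = ["76.9", "-", "-", "-", "-"]
gpt_5 = ["88.0", "6.9", "73.5", "65.0", "49.6"]

print(unweighted_mean(gemini_preview_0506))  # 76.9, matches the listed Avg
print(unweighted_mean(gpt_5))  # 56.6, versus a listed Avg of 54.4
```

Under this reading, a model tested on a single benchmark is averaged over that one score alone, which is how a one-benchmark entry can sit near the top of the rankings.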

About this category

Models ranked by coding performance across HumanEval+, SWE-bench Verified, Aider Polyglot, and other code generation benchmarks. Scores reflect real-world software engineering tasks including bug fixing, code completion, and multi-file editing.

Frequently asked questions

Which AI model is best for coding?

The top-ranked coding model changes frequently. Check the live leaderboard above for the current leader based on HumanEval+, SWE-bench Verified, and other coding benchmarks.

What benchmarks measure coding ability?

Key coding benchmarks include HumanEval+ (function completion), SWE-bench Verified (real GitHub issues), Aider Polyglot (multi-language editing), and Terminal-bench (CLI tasks).

Are open-source models competitive for coding?

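Yes. Several open-source models score competitively on coding benchmarks. In the rankings above, gpt-oss-120b (free) places 6th with an average of 68.7, and DeepSeek V3, Qwen3 32B, and R1 0528 all place in the top 15.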
