Coding ranking · Data updated May 6, 2026 · 12 ranked models

Best AI Models for Coding

A practical coding shortlist built from SWE-bench Verified and related software engineering benchmarks. Pricing and context window are shown as secondary signals, not as hidden ranking boosts.

Rank 1 · Limited confidence: GPT-5 Chat (OpenAI), benchmark score 81
Rank 2 · Medium confidence: Claude Opus 4.6 (Anthropic), benchmark score 78.7
Rank 3 · Limited confidence: o3 Pro (OpenAI), benchmark score 78.1

Scores are based on the visible benchmark set and available metadata.

Missing prices stay missing
| Rank | Model | Score | Evidence | Input price | Context |
|------|-------|-------|----------|-------------|---------|
| #1 | GPT-5 Chat (OpenAI) | 81 | 1 benchmark · Limited | $1.25/M | 128K |
| #2 | Claude Opus 4.6 (Anthropic) | 78.7 | 3 benchmarks · Medium | $5.00/M | 1M |
| #3 | o3 Pro (OpenAI) | 78.1 | 1 benchmark · Limited | $20/M | 200K |
| #4 | GPT-5.4 (OpenAI) | 76.9 | 1 benchmark · Limited | $2.50/M | 1.1M |
| #5 | Claude Opus 4.5 (Anthropic) | 76.7 | 3 benchmarks · Medium | $5.00/M | 200K |
| #6 | GPT-5.5 (OpenAI) | 76.1 | 1 benchmark · Limited | $5.00/M | 400K |
| #7 | Claude Sonnet 4.6 (Anthropic) | 75.2 | 1 benchmark · Limited | $3.00/M | 1M |
| #8 | GPT-5.3-Codex (OpenAI) | 74.8 | 2 benchmarks · Medium | $1.75/M | 400K |
| #9 | GPT-5.2 (OpenAI) | 73.8 | 3 benchmarks · Medium | $1.75/M | 400K |
| #10 | Kimi K2.5 (moonshotai) | 73.8 | 2 benchmarks · Medium | $0.44/M | 262K |
| #11 | GPT-5 (OpenAI) | 73.6 | 4 benchmarks · High | $1.25/M | 400K |
| #12 | Claude Opus 4.1 (Anthropic) | 73.4 | 2 benchmarks · Medium | $15/M | 200K |
Strict caveat

Coding scores reflect public benchmark tasks. They do not cover every repository, language, framework, or production workflow.

Coding scores prioritize SWE-bench Verified when available. If that score is missing, related coding benchmarks can contribute at a visible discount instead of being treated as equivalent evidence.
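As a rough illustration of that rule, the sketch below prefers a SWE-bench Verified score and otherwise averages related benchmark scores with a discount. The benchmark keys, the 0.7 discount factor, and the averaging step are assumptions for illustration, not BenchGecko's actual formula.

```python
# A minimal sketch of the weighting rule described above; constants are assumed.
PRIMARY = "swe_bench_verified"
RELATED_DISCOUNT = 0.7  # assumed penalty for non-primary coding benchmarks


def coding_score(benchmarks: dict[str, float]) -> float | None:
    """Prefer SWE-bench Verified; otherwise average related scores at a discount."""
    if PRIMARY in benchmarks:
        return benchmarks[PRIMARY]
    related = [score for name, score in benchmarks.items() if name != PRIMARY]
    if not related:
        return None  # no evidence, no ranking entry
    return RELATED_DISCOUNT * sum(related) / len(related)


print(coding_score({"swe_bench_verified": 73.6}))  # 73.6
print(coding_score({"livecodebench": 80.0}))       # 56.0 (discounted)
```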

BenchGecko ranks models from published benchmark scores and model metadata. Scores do not measure every use case, and missing data can affect rankings.

Related ranking

Reasoning models ranked from public benchmark scores across GPQA Diamond, BBH, ARC-AGI, SimpleBench, and related tests.

Related ranking

Math models ranked from public benchmark scores across GSM8K, MATH-level tests, AIME-style tasks, and FrontierMath where available.

Related ranking

Open-weight AI models ranked from available benchmark data, coverage confidence, pricing metadata, and listed license signals.

Which AI model is best for coding?

This page ranks coding models by published SWE-bench Verified results when available, with related coding benchmarks used as supporting evidence.

Does BenchGecko test these models hands-on?

BenchGecko ranks models from published benchmark scores and metadata. It does not claim hands-on testing unless a page explicitly says so.

Why does pricing matter on a coding page?

Pricing is shown because production coding agents can make many calls. The main ranking still comes from benchmark evidence.
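For a back-of-the-envelope sense of how those calls add up, the sketch below estimates the input-token cost of one agent session from a listed input price. The call count and tokens-per-call figures are made-up assumptions; output tokens, caching, and retries are ignored.

```python
# Hypothetical input-token cost for one agentic coding session.
# The call count and tokens-per-call figures are assumptions for illustration.
def session_input_cost(price_per_million: float, calls: int, tokens_per_call: int) -> float:
    """Dollar cost of input tokens for a single agent session."""
    return price_per_million * (calls * tokens_per_call) / 1_000_000


# e.g. 40 tool-calling turns, each resending roughly 30K tokens of repository context
print(f"${session_input_cost(1.25, 40, 30_000):.2f}")   # $1.50 at $1.25/M input
print(f"${session_input_cost(20.00, 40, 30_000):.2f}")  # $24.00 at $20/M input
```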