Coding ranking · Data updated May 6, 2026 · 12 ranked models

Best AI Models for Coding

A practical coding shortlist built from SWE-bench Verified and related software engineering benchmarks. Pricing and context window are shown as secondary signals, not as hidden ranking boosts.

Rank 1 · Limited confidence: GPT-5 Chat (OpenAI), benchmark score 81
Rank 2 · Medium confidence: Claude Opus 4.6 (Anthropic), benchmark score 78.7
Rank 3 · Limited confidence: o3 Pro (OpenAI), benchmark score 78.1

Scores are based on the visible benchmark set and available metadata.

Missing prices stay missing
| Rank | Model | Score | Evidence | Input price | Context |
|------|-------|-------|----------|-------------|---------|
| #1 | GPT-5 Chat (OpenAI) | 81 | 1 benchmark · Limited | $1.25/M | 128K |
| #2 | Claude Opus 4.6 (Anthropic) | 78.7 | 3 benchmarks · Medium | $5.00/M | 1M |
| #3 | o3 Pro (OpenAI) | 78.1 | 1 benchmark · Limited | $20/M | 200K |
| #4 | GPT-5.4 (OpenAI) | 76.9 | 1 benchmark · Limited | $2.50/M | 1.1M |
| #5 | Claude Opus 4.5 (Anthropic) | 76.7 | 3 benchmarks · Medium | $5.00/M | 200K |
| #6 | GPT-5.5 (OpenAI) | 76.1 | 1 benchmark · Limited | $5.00/M | 400K |
| #7 | Claude Sonnet 4.6 (Anthropic) | 75.2 | 1 benchmark · Limited | $3.00/M | 1M |
| #8 | GPT-5.3-Codex (OpenAI) | 74.8 | 2 benchmarks · Medium | $1.75/M | 400K |
| #9 | GPT-5.2 (OpenAI) | 73.8 | 3 benchmarks · Medium | $1.75/M | 400K |
| #10 | Kimi K2.5 (moonshotai) | 73.8 | 2 benchmarks · Medium | $0.44/M | 262K |
| #11 | GPT-5 (OpenAI) | 73.6 | 4 benchmarks · High | $1.25/M | 400K |
| #12 | Claude Opus 4.1 (Anthropic) | 73.4 | 2 benchmarks · Medium | $15/M | 200K |
Strict caveat

Coding scores reflect public benchmark tasks. They do not cover every repository, language, framework, or production workflow.

Coding scores prioritize SWE-bench Verified when available. If that score is missing, related coding benchmarks can contribute at a visible discount instead of being treated as equivalent evidence.
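As a rough illustration of that rule, the sketch below prefers a SWE-bench Verified score and otherwise averages related benchmark scores with a discount. The benchmark keys, the 0.7 discount factor, and the averaging step are assumptions for illustration, not BenchGecko's actual formula.

```python
# A minimal sketch of the weighting rule described above; constants are assumed.
PRIMARY = "swe_bench_verified"
RELATED_DISCOUNT = 0.7  # assumed penalty for non-primary coding benchmarks


def coding_score(benchmarks: dict[str, float]) -> float | None:
    """Prefer SWE-bench Verified; otherwise average related scores at a discount."""
    if PRIMARY in benchmarks:
        return benchmarks[PRIMARY]
    related = [score for name, score in benchmarks.items() if name != PRIMARY]
    if not related:
        return None  # no evidence, no ranking entry
    return RELATED_DISCOUNT * sum(related) / len(related)


print(coding_score({"swe_bench_verified": 73.6}))  # 73.6
print(coding_score({"livecodebench": 80.0}))       # 56.0 (discounted)
```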

BenchGecko ranks models from published benchmark scores and model metadata. Scores do not measure every use case, and missing data can affect rankings.

Related ranking

Reasoning models ranked from public benchmark scores across GPQA Diamond, BBH, ARC-AGI, SimpleBench, and related tests.

Related ranking

Math models ranked from public benchmark scores across GSM8K, MATH-level tests, AIME-style tasks, and FrontierMath where available.

Related ranking

Open-weight AI models ranked from available benchmark data, coverage confidence, pricing metadata, and listed license signals.

Which AI model is best for coding?

This page ranks coding models by published SWE-bench Verified results when available, with related coding benchmarks used as supporting evidence.

Does BenchGecko test these models hands-on?

BenchGecko ranks models from published benchmark scores and metadata. It does not claim hands-on testing unless a page explicitly says so.

Why does pricing matter on a coding page?

Pricing is shown because production coding agents can make many calls. The main ranking still comes from benchmark evidence.
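For a back-of-the-envelope sense of how those calls add up, the sketch below estimates the input-token cost of one agent session from a listed input price. The call count and tokens-per-call figures are made-up assumptions; output tokens, caching, and retries are ignored.

```python
# Hypothetical input-token cost for one agentic coding session.
# The call count and tokens-per-call figures are assumptions for illustration.
def session_input_cost(price_per_million: float, calls: int, tokens_per_call: int) -> float:
    """Dollar cost of input tokens for a single agent session."""
    return price_per_million * (calls * tokens_per_call) / 1_000_000


# e.g. 40 tool-calling turns, each resending roughly 30K tokens of repository context
print(f"${session_input_cost(1.25, 40, 30_000):.2f}")   # $1.50 at $1.25/M input
print(f"${session_input_cost(20.00, 40, 30_000):.2f}")  # $24.00 at $20/M input
```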