Best AI Models for Coding
A practical shortlist of coding models, built from SWE-bench Verified and related software-engineering benchmarks. Pricing and context window are shown as secondary signals, not hidden ranking boosts.
Top picks:
1. GPT-5 Chat (OpenAI)
2. Claude Opus 4.6 (Anthropic)
3. o3 Pro (OpenAI)
Ranked model table
Scores are based on the visible benchmark set and the metadata available for each model.
| Rank | Model | Provider | Score | Evidence | Input price | Context |
|---|---|---|---|---|---|---|
| #1 | GPT-5 Chat | OpenAI | 81 | 1 benchmark · Limited | $1.25/M | 128K |
| #2 | Claude Opus 4.6 | Anthropic | 78.7 | 3 benchmarks · Medium | $5.00/M | 1M |
| #3 | o3 Pro | OpenAI | 78.1 | 1 benchmark · Limited | $20/M | 200K |
| #4 | GPT-5.4 | OpenAI | 76.9 | 1 benchmark · Limited | $2.50/M | 1.1M |
| #5 | Claude Opus 4.5 | Anthropic | 76.7 | 3 benchmarks · Medium | $5.00/M | 200K |
| #6 | GPT-5.5 | OpenAI | 76.1 | 1 benchmark · Limited | $5.00/M | 400K |
| #7 | Claude Sonnet 4.6 | Anthropic | 75.2 | 1 benchmark · Limited | $3.00/M | 1M |
| #8 | GPT-5.3-Codex | OpenAI | 74.8 | 2 benchmarks · Medium | $1.75/M | 400K |
| #9 | GPT-5.2 | OpenAI | 73.8 | 3 benchmarks · Medium | $1.75/M | 400K |
| #10 | Kimi K2.5 | moonshotai | 73.8 | 2 benchmarks · Medium | $0.44/M | 262K |
| #11 | GPT-5 | OpenAI | 73.6 | 4 benchmarks · High | $1.25/M | 400K |
| #12 | Claude Opus 4.1 | Anthropic | 73.4 | 2 benchmarks · Medium | $15/M | 200K |
Coding scores reflect public benchmark tasks. They do not cover every repository, language, framework, or production workflow.
Coding scores prioritize SWE-bench Verified when available. If that score is missing, related coding benchmarks can contribute at a visible discount instead of being treated as equivalent evidence.
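The priority rule above can be sketched in a few lines. This is an illustrative reconstruction, not BenchGecko's actual formula: the 0.7 discount factor and the benchmark names in the example are assumptions chosen to show the shape of the logic.

```python
def coding_score(benchmarks, primary="SWE-bench Verified", discount=0.7):
    """Blend benchmark results, preferring the primary benchmark.

    `benchmarks` maps benchmark name -> score (0-100).
    If the primary benchmark is present, use it directly; otherwise,
    average the related benchmarks and apply a visible discount.
    The 0.7 discount is a placeholder, not BenchGecko's real weighting.
    """
    if primary in benchmarks:
        return benchmarks[primary]
    others = [score for name, score in benchmarks.items() if name != primary]
    if not others:
        return None  # no usable evidence
    return discount * (sum(others) / len(others))


# A model with a published SWE-bench Verified score keeps it as-is:
print(coding_score({"SWE-bench Verified": 73.6, "LiveCodeBench": 80.0}))  # 73.6
# A model with only related benchmarks contributes at a discount:
print(coding_score({"LiveCodeBench": 80.0}))  # 56.0
```

The point of the discount is that a strong score on a related benchmark is treated as weaker evidence than the same score on SWE-bench Verified, rather than as an equivalent substitute.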
BenchGecko ranks models from published benchmark scores and model metadata. Scores do not measure every use case, and missing data can affect rankings.
Related rankings
- Best AI Models for Reasoning: ranked from public benchmark scores across GPQA Diamond, BBH, ARC-AGI, SimpleBench, and related tests.
- Best AI Models for Math: ranked from public benchmark scores across GSM8K, MATH-level tests, AIME-style tasks, and FrontierMath where available.
- Best Open-weight AI Models: ranked from available benchmark data, coverage confidence, pricing metadata, and listed license signals.
Questions
Which AI model is best for coding?
This page ranks coding models by published SWE-bench Verified results when available, with related coding benchmarks used as supporting evidence.
Does BenchGecko test these models hands-on?
BenchGecko ranks models from published benchmark scores and metadata. It does not claim hands-on testing unless a page explicitly says so.
Why does pricing matter on a coding page?
Pricing is shown because production coding agents can make many calls. The main ranking still comes from benchmark evidence.
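To see why call volume matters, a rough input-token cost estimate is enough. The per-call token count and daily call volume below are hypothetical workload assumptions; only the per-million prices come from the table above.

```python
def monthly_input_cost(price_per_million, tokens_per_call, calls_per_day, days=30):
    """Estimate monthly input-token spend for a coding agent.

    Illustrative only: real agents vary widely in tokens per call,
    and output-token pricing (not modeled here) also matters.
    """
    total_tokens = tokens_per_call * calls_per_day * days
    return price_per_million * total_tokens / 1_000_000


# Hypothetical workload: 20K input tokens/call, 500 calls/day.
# GPT-5 at $1.25/M input:
print(f"${monthly_input_cost(1.25, 20_000, 500):,.2f}")   # $375.00
# o3 Pro at $20/M input, same workload:
print(f"${monthly_input_cost(20.0, 20_000, 500):,.2f}")   # $6,000.00
```

Under the same workload, the price gap between models compounds linearly with call volume, which is why it is worth showing even though it does not move the benchmark-based ranking.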