GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...
Tested on 9 benchmarks with a 52.2% average. Top scores: WeirdML (79.3%), Terminal Bench (77.3%), SWE-bench Verified (74.8%).
For comparison, MiMo-V2-Flash scores 81.7 (102% of GPT-5.3-Codex's score) at $0.09/1M input tokens, roughly 95% cheaper.
WeirdML: Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Terminal Bench: Complex terminal-based engineering tasks. Models must use command-line tools, navigate filesystems, and debug systems through shell interaction.
SWE-bench Verified: Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.
Evaluates post-training behaviors, including instruction following and the balance between safety and helpfulness.
SEAL SWE Atlas (Codebase Q&A): Tests understanding of large codebases through question answering.
Agent performance evaluation testing multi-step tool use, planning, and execution in realistic environments.
- Type: multimodal
- Context: 400K tokens (~200 books)
- Released: Feb 2026
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.018