GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic...
Tested on 8 benchmarks, with a 72.0% average. Top scores: LiveBench Reasoning (84.6%), LiveBench Mathematics (83.7%), LiveBench Coding (81.4%).
Alternative: Step 3.5 Flash scores 89.5 (98% as good) at $0.10 per 1M input tokens, which is 92% cheaper.
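The two percentages in that comparison are simple ratios. As a rough sketch, assuming both are measured relative to GPT-5.1-Codex-Max (the page does not say which score or price the comparison is based on), the baseline figures they imply can be backed out as follows. The function names are illustrative and not part of any API.

```python
def implied_baseline_score(candidate_score: float, relative_quality: float) -> float:
    """Baseline score implied by 'the candidate is relative_quality as good'."""
    return candidate_score / relative_quality


def implied_baseline_price(candidate_price: float, fraction_cheaper: float) -> float:
    """Baseline price implied by 'the candidate is fraction_cheaper cheaper'."""
    return candidate_price / (1.0 - fraction_cheaper)


# Figures quoted for Step 3.5 Flash; the baseline is ASSUMED to be this model.
score = implied_baseline_score(89.5, 0.98)
price = implied_baseline_price(0.10, 0.92)

print(f"implied baseline score: {score:.1f}")
print(f"implied baseline input price: ${price:.2f} per 1M tokens")
```

Under that assumption the quoted figures work out to a baseline score of roughly 91.3 and an input price of roughly $1.25 per 1M tokens. The published percentages are rounded, so these values are approximate.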
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
- Type: multimodal
- Context: 400K tokens (~200 books)
- Released: Dec 2025
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.013