How much does GPT-5.4 cost?

GPT-5.4 costs $2.50 per million input tokens and $15.00 per million output tokens. For a typical conversation (~2,000 tokens), that works out to roughly $0.020 per message. The exact figure depends on how those tokens split between input and output.
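The per-message figure depends on how the ~2,000 tokens divide between prompt (input) and completion (output), and the answer above does not state that split. Here is a minimal sketch using the listed prices. The token splits are assumptions for illustration, and `cost_per_message` is a hypothetical helper, not part of any SDK.

```python
# Per-million-token list prices for GPT-5.4, as quoted on this page (USD).
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00


def cost_per_message(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, billing input and output tokens separately."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Assumed splits of a ~2,000-token message; the page does not specify one.
    for inp, out in [(2000, 0), (1000, 1000), (800, 1200), (0, 2000)]:
        print(f"{inp:>5} in / {out:>5} out -> ${cost_per_message(inp, out):.4f}")
```

At these prices, a 2,000-token message costs between $0.005 (all input) and $0.030 (all output). An even split gives $0.0175, and an 800-in / 1,200-out split is one that lands on exactly $0.020.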

What benchmarks has GPT-5.4 been tested on?

GPT-5.4 has been evaluated on 16 benchmarks. Top scores: Chatbot Arena Elo, Overall (1465.8), OTIS Mock AIME 2024-2025 (95.3%), and ARC-AGI (93.7%).

Is GPT-5.4 open source?

No, GPT-5.4 is a proprietary model by OpenAI.

How does GPT-5.4 compare to Claude Opus 4.6 (Fast)?

GPT-5.4 has an average score of 83.4, narrowly ahead of Claude Opus 4.6 (Fast) at 83.3. The gap is much wider on input pricing: GPT-5.4 costs $2.50/1M input tokens versus $30.00/1M for Claude Opus 4.6 (Fast), 12× less.

GPT-5.4

by OpenAI · Released Mar 2026

Multimodal · 1M context

Average score: 83.4 · Rank #17 · Better than 93% of all models
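The "better than 93%" figure is consistent with simple rank arithmetic over the 233 models in the page's score distribution. The sketch below assumes the percentile is the share of models ranked below GPT-5.4. The page does not publish its formula, and `share_ranked_below` is a hypothetical helper.

```python
def share_ranked_below(rank: int, total_models: int) -> float:
    """Fraction of models ranked strictly below a model at `rank` (rank 1 = best).

    Assumed formula; the leaderboard does not state how it derives this percentage.
    """
    if not 1 <= rank <= total_models:
        raise ValueError("rank must be between 1 and total_models")
    return (total_models - rank) / total_models


if __name__ == "__main__":
    # Rank #17 out of the 233 models in the page's score distribution.
    print(f"Better than {share_ranked_below(17, 233):.0%} of all models")
```

Here (233 - 17) / 233 = 216 / 233 ≈ 0.927, which formats as 93%.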

Context: 1.1M tokens (~525 books)
Input price: $2.50 per 1M tokens
Output price: $15.00 per 1M tokens
Type: multimodal
License: Proprietary
Benchmarks: 16 tested


About

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...
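The 922K/128K split accounts for the headline context figure: 922K + 128K = 1,050K tokens, which rounds up to the 1.1M shown in the summary. Below is a minimal sketch of checking a request against that budget. It assumes the two numbers are independent caps on prompt and completion tokens and that "K" means 1,000. The page confirms neither assumption, and `fits_context` is a hypothetical helper, not part of any OpenAI SDK.

```python
# Limits as quoted on this page; "K" is assumed to mean 1,000 tokens.
MAX_INPUT_TOKENS = 922_000
MAX_OUTPUT_TOKENS = 128_000
TOTAL_CONTEXT = MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS  # 1,050,000, shown as "1.1M" after rounding


def fits_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request stays within both the input and output caps (assumed separate)."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens <= MAX_INPUT_TOKENS and max_output_tokens <= MAX_OUTPUT_TOKENS


if __name__ == "__main__":
    print(TOTAL_CONTEXT)                   # 1050000
    print(fits_context(900_000, 100_000))  # True
    print(fits_context(950_000, 10_000))   # False: prompt exceeds the 922K input cap
    print(fits_context(500_000, 200_000))  # False: output exceeds the 128K output cap
```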

Tested on 16 benchmarks with a 59.0% average. Top scores: Chatbot Arena Elo, Overall (1465.8), OTIS Mock AIME 2024-2025 (95.3%), and ARC-AGI (93.7%).

Looking for similar performance at lower cost?
MiMo-V2-Flash scores 81.7 (98% of GPT-5.4's average) at $0.09/1M input tokens, 96% cheaper on input.
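Both percentages follow from numbers on this page: 81.7 / 83.4 ≈ 0.98, and 1 - 0.09 / 2.50 = 0.964. Below is a minimal sketch of that arithmetic. `relative_value` is a hypothetical helper. Comparing input prices alone ignores output pricing, which this page does not give for MiMo-V2-Flash.

```python
def relative_value(score: float, input_price: float,
                   ref_score: float, ref_input_price: float) -> tuple[float, float]:
    """Return (score as a fraction of the reference, fractional input-price saving vs. the reference)."""
    return score / ref_score, 1 - input_price / ref_input_price


if __name__ == "__main__":
    # MiMo-V2-Flash vs. GPT-5.4: average scores and $/1M input tokens from this page.
    as_good, cheaper = relative_value(81.7, 0.09, 83.4, 2.50)
    print(f"{as_good:.0%} as good, {cheaper:.0%} cheaper")  # 98% as good, 96% cheaper
```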

Capabilities

coding: 67.2 (#22 globally)
reasoning: 83.8 (#4 globally)
math: 56.7 (#56 globally)
knowledge: 50.0 (#101 globally)
agentic: 35.9 (#11 globally)
speed: 102.1 (#1 globally)

Benchmark Scores

Tested on 16 benchmarks · Ranked across 7 categories

[Figure: score distribution across all 233 models, 0-100 scale, with GPT-5.4's position marked]

coding

SWE-bench Verified

Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.

76.9

WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

57.4

reasoning

ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern-recognition puzzles and was designed as a measure of general intelligence.

93.7

ARC-AGI-2

The harder sequel to ARC-AGI, with more complex abstract-reasoning patterns that test generalization beyond the training data.

74.0

math