GPT-5.4 vs Claude Mythos Preview

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Claude Mythos Preview wins on 2/2 benchmarks

Claude Mythos Preview wins both shared benchmarks and leads in knowledge and coding.

GPT-5.4

0 / 2

Claude Mythos Preview

2 / 2

Category leads

knowledge · Claude Mythos Preview

coding · Claude Mythos Preview

Hype vs Reality

Attention vs performance

GPT-5.4

#46 by perf · no signal

QUIET

Claude Mythos Preview

#4 by perf · #2 by attention

DESERVED

Best value

GPT-5.4

6.7 pts/$

$8.75/M

Claude Mythos Preview

—

no price

Vendor risk

Who is behind the model

OpenAI

$840.0B · Tier 1

Medium risk

Anthropic

$380.0B · Tier 1

Medium risk

Head to head

2 benchmarks · 2 models

GPT-5.4 · Claude Mythos Preview

GPQA Diamond

Claude Mythos Preview leads by +3.4

Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.

GPT-5.4

91.1

Claude Mythos Preview

94.5

SWE-bench Verified

Claude Mythos Preview leads by +17.0

SWE-bench Verified · 500 human-validated tasks from 12 real Python repositories (Django, Flask, scikit-learn, sympy, and others). Each task requires the model to produce a git patch that resolves a real GitHub issue and passes the test suite. The verified subset eliminates ambiguous tasks from the original SWE-bench. Claude Mythos Preview leads at 93.9%, crossing 90% for the first time in 2026. Opus 4.6 scores 80.8%. The benchmark remains the most-cited evaluation for code-generation capability.

GPT-5.4

76.9

Claude Mythos Preview

93.9

Full benchmark table

Benchmark	GPT-5.4	Claude Mythos Preview
GPQA Diamond	91.1	94.5
SWE-bench Verified	76.9	93.9
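
The lead margins and the 0 / 2 versus 2 / 2 tally follow directly from the two score pairs in the table. A minimal Python sketch of that arithmetic, using only the scores shown above; the names `SCORES`, `margin`, and `tally_wins` are illustrative and not part of the site:

```python
# Scores copied from the benchmark table: (GPT-5.4, Claude Mythos Preview).
SCORES = {
    "GPQA Diamond": (91.1, 94.5),
    "SWE-bench Verified": (76.9, 93.9),
}


def margin(gpt: float, claude: float) -> float:
    """Claude Mythos Preview's lead in points, rounded to one decimal."""
    return round(claude - gpt, 1)


def tally_wins(scores: dict[str, tuple[float, float]]) -> tuple[int, int]:
    """Count benchmarks won by each model; a tie counts for neither."""
    gpt_wins = sum(1 for g, c in scores.values() if g > c)
    claude_wins = sum(1 for g, c in scores.values() if c > g)
    return gpt_wins, claude_wins


if __name__ == "__main__":
    for name, (g, c) in SCORES.items():
        print(f"{name}: Claude Mythos Preview leads by {margin(g, c):+.1f}")
    print("Wins (GPT-5.4, Claude Mythos Preview):", tally_wins(SCORES))
```

The rounding step matters because the raw floating-point differences are not exactly 3.4 and 17.0.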

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input	Output	Context	Projected $/mo
GPT-5.4	$2.50	$15.00	1.1M tokens (~525 books)	$56.25
Claude Mythos Preview	—	—	1.0M tokens (~500 books)	—
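
The page does not state how the $56.25 projection or the $8.75/M figure under "Best value" is derived. One reading that reproduces both numbers, offered as an assumption rather than the site's documented method: the projection splits the 10M tokens 75% input and 25% output, and the blended price is the simple mean of the input and output prices. The function names and the `input_share` parameter below are hypothetical.

```python
INPUT_PRICE = 2.50    # $ per 1M input tokens, from the pricing table
OUTPUT_PRICE = 15.00  # $ per 1M output tokens, from the pricing table


def projected_monthly_cost(
    total_tokens: int,
    input_price: float = INPUT_PRICE,
    output_price: float = OUTPUT_PRICE,
    input_share: float = 0.75,  # ASSUMPTION: 3:1 input/output mix, not stated on the page
) -> float:
    """Monthly cost in dollars for a given token volume and input/output mix."""
    input_millions = total_tokens * input_share / 1_000_000
    output_millions = total_tokens * (1 - input_share) / 1_000_000
    return input_millions * input_price + output_millions * output_price


def blended_price(
    input_price: float = INPUT_PRICE,
    output_price: float = OUTPUT_PRICE,
) -> float:
    """ASSUMPTION: blended $/1M tokens taken as the unweighted mean of both prices."""
    return (input_price + output_price) / 2


if __name__ == "__main__":
    print(f"Projected: ${projected_monthly_cost(10_000_000):.2f}/mo")
    print(f"Blended:   ${blended_price():.2f}/M")
```

Under an even 50/50 split the same 10M tokens would cost $87.50, so the projection is sensitive to the assumed mix. Claude Mythos Preview has no listed price, so neither figure can be computed for it.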

People also compared

Claude Mythos Preview vs GPT-5.5
GPT-5.4 vs GPT-5.5
Claude Mythos Preview vs Claude Opus 4.6
Claude Mythos Preview vs Gemini 3.1 Pro Preview
Claude Mythos Preview vs o3 Pro
Claude Opus 4.6 vs GPT-5.4
GPT-5.4 vs o3 Pro
DeepSeek V3.2 Exp vs GPT-5.4