Claude Mythos Preview vs Claude Sonnet 4.6

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Claude Mythos Preview wins 2 of 2 shared benchmarks and leads both categories covered: knowledge and coding.

Claude Mythos Preview

2 / 2

Claude Sonnet 4.6

0 / 2

Category leads

knowledge · Claude Mythos Preview
coding · Claude Mythos Preview

Hype vs Reality

Attention vs performance

Claude Mythos Preview

#4 by performance · #2 by attention

DESERVED

Claude Sonnet 4.6

#104 by performance · #18 by attention

UNDERRATED

Best value

Claude Sonnet 4.6

Claude Mythos Preview

—

no price

Claude Sonnet 4.6

5.3 pts/$

$9.00/M

Vendor risk

Who is behind the model

Anthropic (vendor of both models)

$380.0B · Tier 1

Medium risk

Head to head

2 benchmarks · 2 models

Claude Mythos Preview · Claude Sonnet 4.6

GPQA Diamond

Claude Mythos Preview leads by +11.3

Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.

Claude Mythos Preview

94.5

Claude Sonnet 4.6

83.2

SWE-bench Verified

Claude Mythos Preview leads by +18.7

SWE-bench Verified · 500 human-validated tasks from 12 real Python repositories (Django, Flask, scikit-learn, sympy, and others). Each task requires the model to produce a git patch that resolves a real GitHub issue and passes the test suite. The Verified subset removes ambiguous tasks from the original SWE-bench. Claude Mythos Preview leads at 93.9%, crossing 90% for the first time in 2026. For reference, Claude Opus 4.6 scores 80.8%. The benchmark remains the most-cited evaluation for code-generation capability.

Claude Mythos Preview

93.9

Claude Sonnet 4.6

75.2

Full benchmark table

Benchmark	Claude Mythos Preview	Claude Sonnet 4.6
GPQA Diamond	94.5	83.2
SWE-bench Verified	93.9	75.2
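
The lead margins and the 2 / 2 win count quoted in the head-to-head section follow directly from the table scores. A minimal Python sketch of that arithmetic is below. The dictionary layout and the function name `head_to_head` are illustrative assumptions, not something the page defines, and the sketch assumes exactly two models per benchmark with no ties.

```python
# Scores copied from the benchmark table on this page.
SCORES = {
    "GPQA Diamond": {"Claude Mythos Preview": 94.5, "Claude Sonnet 4.6": 83.2},
    "SWE-bench Verified": {"Claude Mythos Preview": 93.9, "Claude Sonnet 4.6": 75.2},
}


def head_to_head(scores):
    """Return (leader and margin per benchmark, win count per model).

    Hypothetical helper: assumes two models per benchmark and no ties.
    """
    margins = {}
    wins = {}
    for benchmark, by_model in scores.items():
        # Sort models from highest to lowest score on this benchmark.
        ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        (leader, top), (_, runner_up) = ranked[0], ranked[1]
        # Round to one decimal to match the page and avoid float noise.
        margins[benchmark] = (leader, round(top - runner_up, 1))
        for model in by_model:
            wins.setdefault(model, 0)
        wins[leader] += 1
    return margins, wins


margins, wins = head_to_head(SCORES)
for benchmark, (leader, margin) in margins.items():
    print(f"{benchmark}: {leader} leads by +{margin}")
print(wins)
```

Rounding to one decimal matters here because the raw floating-point differences are not exactly 11.3 and 18.7.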

Pricing · USD per 1M tokens · projected monthly cost at 10M tokens per month

Model	Input	Output	Context	Projected $/mo
Claude Mythos Preview	—	—	1.0M tokens (~500 books)	—
Claude Sonnet 4.6	$3.00	$15.00	1.0M tokens (~500 books)	$60.00
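
The page does not state how the $60.00 projection is derived. One reading that reproduces it is a 10M-token month split 75% input and 25% output. That split is an assumption made here for illustration, as are the function and parameter names in this sketch.

```python
def projected_monthly_cost(input_price, output_price, total_tokens_m, input_share):
    """Estimate monthly cost in USD.

    input_price / output_price: USD per 1M tokens (from the pricing table).
    total_tokens_m: monthly volume in millions of tokens.
    input_share: fraction of tokens that are input (ASSUMED, not from the page).
    """
    input_tokens_m = total_tokens_m * input_share
    output_tokens_m = total_tokens_m - input_tokens_m
    return input_tokens_m * input_price + output_tokens_m * output_price


# Claude Sonnet 4.6 list prices from the table; 75/25 split is an assumption.
sonnet_cost = projected_monthly_cost(3.00, 15.00, total_tokens_m=10, input_share=0.75)
print(f"${sonnet_cost:.2f}")
```

Under that assumed split the arithmetic is 7.5M × $3.00 + 2.5M × $15.00 = $22.50 + $37.50 = $60.00. A different input/output mix would give a different projection, so treat the monthly figure as indicative only.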
