
Claude Mythos Preview vs Claude Mythos Preview

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Both picks are Claude Mythos Preview, so the scores tie on all 14 shared benchmarks and neither column leads in any category.

Category leads
reasoning · Claude Mythos Preview
knowledge · Claude Mythos Preview
agentic · Claude Mythos Preview
coding · Claude Mythos Preview
math · Claude Mythos Preview
Hype vs Reality
Claude Mythos Preview
#2 by perf·#2 by attention
DESERVED
Claude Mythos Preview
#2 by perf·#2 by attention
DESERVED
Best value
Claude Mythos Preview
no price
Claude Mythos Preview
no price
Vendor risk
Anthropic
$380.0B·Tier 1
Medium risk
Anthropic
$380.0B·Tier 1
Medium risk
Head to head
Claude Mythos Preview · Claude Mythos Preview
CharXiv Reasoning
Claude Mythos Preview
86.1
Claude Mythos Preview
86.1
CharXiv Reasoning (with tools)
Claude Mythos Preview
93.2
Claude Mythos Preview
93.2
GPQA Diamond
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Claude Mythos Preview
94.5
Claude Mythos Preview
94.5
GraphWalks BFS 256K-1M
Claude Mythos Preview
80.0
Claude Mythos Preview
80.0
HLE
HLE (Humanity's Last Exam) · crowdsourced expert-level questions designed to be among the hardest possible challenges for AI systems across all domains.
Claude Mythos Preview
56.8
Claude Mythos Preview
56.8
HLE (with tools)
Claude Mythos Preview
64.7
Claude Mythos Preview
64.7
MMMLU
Claude Mythos Preview
92.7
Claude Mythos Preview
92.7
OSWorld
OSWorld · tests AI agents on real-world computer tasks across operating systems, including web browsing, file management, and application use.
Claude Mythos Preview
79.6
Claude Mythos Preview
79.6
SWE-bench Multilingual
Claude Mythos Preview
87.3
Claude Mythos Preview
87.3
SWE-bench Multimodal
Claude Mythos Preview
59.0
Claude Mythos Preview
59.0
SWE-bench Pro
Claude Mythos Preview
77.8
Claude Mythos Preview
77.8
SWE-bench Verified
Claude Mythos Preview
93.9
Claude Mythos Preview
93.9
Terminal Bench
Terminal Bench · tests the ability to accomplish real-world tasks using terminal commands, evaluating shell scripting and CLI tool proficiency.
Claude Mythos Preview
82.0
Claude Mythos Preview
82.0
USAMO
Claude Mythos Preview
97.6
Claude Mythos Preview
97.6
Full benchmark table
Benchmark · Claude Mythos Preview · Claude Mythos Preview
CharXiv Reasoning · 86.1 · 86.1
CharXiv Reasoning (with tools) · 93.2 · 93.2
GPQA Diamond · 94.5 · 94.5
GraphWalks BFS 256K-1M · 80.0 · 80.0
HLE · 56.8 · 56.8
HLE (with tools) · 64.7 · 64.7
MMMLU · 92.7 · 92.7
OSWorld · 79.6 · 79.6
SWE-bench Multilingual · 87.3 · 87.3
SWE-bench Multimodal · 59.0 · 59.0
SWE-bench Pro · 77.8 · 77.8
SWE-bench Verified · 93.9 · 93.9
Terminal Bench · 82.0 · 82.0
USAMO · 97.6 · 97.6
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
Claude Mythos Preview · no price · no price · 1.0M tokens (~500 books) · no price
Claude Mythos Preview · no price · no price · 1.0M tokens (~500 books) · no price