
Claude Opus 4 vs o1-preview

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Claude Opus 4 wins 6 of 7 shared benchmarks. Leads in reasoning · coding · knowledge · math.
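The "6 of 7" tally follows directly from the per-benchmark scores listed further down the page. A minimal sketch of the count (scores taken from the head-to-head section below):

```python
# Shared-benchmark scores from this page: (Claude Opus 4, o1-preview).
scores = {
    "ARC-AGI": (35.7, 18.0),
    "Cybench": (38.0, 10.0),
    "GPQA diamond": (68.3, 33.8),
    "MATH level 5": (85.0, 81.7),
    "OTIS Mock AIME 2024-2025": (64.4, 31.0),
    "SimpleBench": (50.6, 30.0),
    "WeirdML": (43.4, 47.6),
}

# Count the benchmarks where Claude Opus 4 scores higher.
claude_wins = sum(claude > o1 for claude, o1 in scores.values())
print(f"Claude Opus 4 wins {claude_wins} of {len(scores)}")  # → wins 6 of 7
```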

Category leads
reasoning · Claude Opus 4
coding · Claude Opus 4
knowledge · Claude Opus 4
math · Claude Opus 4
Hype vs Reality
Claude Opus 4 · #133 by perf · no signal · QUIET
o1-preview · #136 by perf · no signal · QUIET
Best value
Claude Opus 4 · 0.9 pts/$ · $45.00/M
o1-preview · no price
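The page does not show how its pts/$ rating is aggregated. A hedged sketch, assuming the rating is simply an aggregate benchmark score divided by a blended per-million-token price (the $45.00/M figure listed above):

```python
# Hypothetical value metric: aggregate benchmark points per blended dollar.
# The page's exact aggregation is not shown; this assumes pts/$ = score / ($/M).
def pts_per_dollar(aggregate_score: float, blended_price_per_m: float) -> float:
    return aggregate_score / blended_price_per_m

# Under this assumption, the listed 0.9 pts/$ at $45.00/M would imply an
# aggregate score of about 40.5 points on the site's scale.
print(round(pts_per_dollar(40.5, 45.00), 2))  # → 0.9
```

o1-preview gets no value rating because no price is listed for it on this page.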
Vendor risk
Anthropic · $380.0B · Tier 1 · Medium risk
OpenAI · $840.0B · Tier 1 · Medium risk
Head to head
ARC-AGI
Claude Opus 4 leads by +17.7
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
Claude Opus 4 · 35.7
o1-preview · 18.0
Cybench
Claude Opus 4 leads by +28.0
Cybench · evaluates AI on real Capture-The-Flag cybersecurity challenges, testing vulnerability analysis, exploitation, and security reasoning.
Claude Opus 4 · 38.0
o1-preview · 10.0
GPQA diamond
Claude Opus 4 leads by +34.5
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Claude Opus 4 · 68.3
o1-preview · 33.8
MATH level 5
Claude Opus 4 leads by +3.3
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Claude Opus 4 · 85.0
o1-preview · 81.7
OTIS Mock AIME 2024-2025
Claude Opus 4 leads by +33.4
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Claude Opus 4 · 64.4
o1-preview · 31.0
SimpleBench
Claude Opus 4 leads by +20.6
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Claude Opus 4 · 50.6
o1-preview · 30.0
WeirdML
o1-preview leads by +4.2
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Claude Opus 4 · 43.4
o1-preview · 47.6
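Each "leads by" figure above is just the difference between the two scores. Recomputing a few of the margins from the listed scores:

```python
# Lead margin = simple score difference, from the scores listed on this page.
pairs = {
    "GPQA diamond": (68.3, 33.8),
    "MATH level 5": (85.0, 81.7),
    "SimpleBench": (50.6, 30.0),
}
margins = {name: round(claude - o1, 1) for name, (claude, o1) in pairs.items()}
print(margins)  # → {'GPQA diamond': 34.5, 'MATH level 5': 3.3, 'SimpleBench': 20.6}
```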
Full benchmark table
Benchmark                   Claude Opus 4   o1-preview
ARC-AGI                     35.7            18.0
Cybench                     38.0            10.0
GPQA diamond                68.3            33.8
MATH level 5                85.0            81.7
OTIS Mock AIME 2024-2025    64.4            31.0
SimpleBench                 50.6            30.0
WeirdML                     43.4            47.6
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model           Input     Output    Context                     Projected $/mo
Claude Opus 4   $15.00    $75.00    200K tokens (~100 books)    $300.00
o1-preview
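The page does not state the input/output token mix behind the projected monthly figure, but a 3:1 input-to-output split at 10M total tokens reproduces the listed $300.00 for Claude Opus 4. A sketch under that assumption:

```python
# Projected monthly cost at 10M tokens. The input/output mix is not stated on
# the page; a 3:1 input-to-output split (7.5M in, 2.5M out) is assumed here
# because it reproduces the listed $300.00 for Claude Opus 4.
def projected_monthly(input_per_m: float, output_per_m: float,
                      total_tokens_m: float = 10.0,
                      input_share: float = 0.75) -> float:
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    return input_m * input_per_m + output_m * output_per_m

print(projected_monthly(15.00, 75.00))  # → 300.0
```

No projection is possible for o1-preview since its per-token prices are not listed.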