
o1-preview vs Claude Opus 4

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Claude Opus 4 wins 6 of 7 shared benchmarks, leading in reasoning, coding, knowledge, and math.

Category leads
Reasoning · Claude Opus 4
Coding · Claude Opus 4
Knowledge · Claude Opus 4
Math · Claude Opus 4
Hype vs Reality
o1-preview · ranked #136 by performance · no hype signal (quiet)
Claude Opus 4 · ranked #133 by performance · no hype signal (quiet)
Best value
o1-preview · no price listed
Claude Opus 4 · 0.9 pts/$ · $45.00/M
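The page does not publish how pts/$ is computed. A minimal sketch of one plausible reading, assuming the metric divides a mean composite of the shared benchmark scores by the blended $/M rate; the formula and variable names are assumptions, so the output is illustrative and will not reproduce 0.9 exactly.

```python
# Hypothetical points-per-dollar value metric.
# Assumption: pts/$ = composite benchmark score / blended $/M price.
# The page's exact composite is not given, so this is a sketch only.

# Claude Opus 4 scores across the 7 shared benchmarks (table below).
scores = [35.7, 38.0, 68.3, 85.0, 64.4, 50.6, 43.4]

blended_price_per_m = 45.00  # $/M tokens, as shown under "Best value"

composite = sum(scores) / len(scores)           # simple mean composite
pts_per_dollar = composite / blended_price_per_m

print(f"composite: {composite:.1f} pts · value: {pts_per_dollar:.2f} pts/$")
```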
Vendor risk
OpenAI · $840.0B valuation · Tier 1 · Medium risk
Anthropic · $380.0B valuation · Tier 1 · Medium risk
Head to head
ARC-AGI
Claude Opus 4 leads by +17.7
The original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
o1-preview: 18.0 · Claude Opus 4: 35.7
Cybench
Claude Opus 4 leads by +28.0
Evaluates AI on real Capture-The-Flag cybersecurity challenges, testing vulnerability analysis, exploitation, and security reasoning.
o1-preview: 10.0 · Claude Opus 4: 38.0
GPQA Diamond
Claude Opus 4 leads by +34.6
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
o1-preview: 33.8 · Claude Opus 4: 68.3
MATH Level 5
Claude Opus 4 leads by +3.4
The hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
o1-preview: 81.7 · Claude Opus 4: 85.0
OTIS Mock AIME 2024-2025
Claude Opus 4 leads by +33.4
Simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
o1-preview: 31.0 · Claude Opus 4: 64.4
SimpleBench
Claude Opus 4 leads by +20.5
Tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
o1-preview: 30.0 · Claude Opus 4: 50.6
WeirdML
o1-preview leads by +4.2
Tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
o1-preview: 47.6 · Claude Opus 4: 43.4
Full benchmark table
Benchmark · o1-preview · Claude Opus 4
ARC-AGI · 18.0 · 35.7
Cybench · 10.0 · 38.0
GPQA Diamond · 33.8 · 68.3
MATH Level 5 · 81.7 · 85.0
OTIS Mock AIME 2024-2025 · 31.0 · 64.4
SimpleBench · 30.0 · 50.6
WeirdML · 47.6 · 43.4
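The per-benchmark leads and the 6-of-7 tally in the winner summary can be recomputed directly from this table. A minimal sketch (scores copied from above; small ±0.1 differences from the printed leads are expected, since the site appears to round scores after computing deltas):

```python
# Recompute head-to-head deltas and the win count from the table above.
# Note: a couple of printed leads (e.g. GPQA +34.6 vs 68.3 - 33.8 = 34.5)
# differ by 0.1, suggesting deltas are computed before rounding.

table = {
    "ARC-AGI":                  (18.0, 35.7),
    "Cybench":                  (10.0, 38.0),
    "GPQA Diamond":             (33.8, 68.3),
    "MATH Level 5":             (81.7, 85.0),
    "OTIS Mock AIME 2024-2025": (31.0, 64.4),
    "SimpleBench":              (30.0, 50.6),
    "WeirdML":                  (47.6, 43.4),
}

wins = 0
for name, (o1, opus) in table.items():
    delta = opus - o1
    leader = "Claude Opus 4" if delta > 0 else "o1-preview"
    wins += delta > 0
    print(f"{name}: {leader} leads by +{abs(delta):.1f}")

print(f"Claude Opus 4 wins {wins} of {len(table)} shared benchmarks")
```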
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
o1-preview · n/a · n/a · n/a · n/a
Claude Opus 4 · $15.00 · $75.00 · 200K tokens (~100 books) · $300.00
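Two of the figures above are reconstructible from the listed rates, under assumptions: the $45.00/M shown under Best value equals the simple average of the input and output prices, and the $300.00/mo projection at 10M tokens is consistent with a 3:1 input:output split. Both interpretations are guesses that happen to reproduce the page's numbers, not a published formula.

```python
# Sketch of how the pricing figures appear to be derived; the 3:1 split
# is an assumption that happens to reproduce the page's $300.00 figure.

input_price = 15.00    # $/M input tokens (Claude Opus 4)
output_price = 75.00   # $/M output tokens

# "Best value" blended rate: simple average of input and output prices.
blended = (input_price + output_price) / 2        # -> 45.00 $/M

# Projected $/mo at 10M tokens, assuming 7.5M input + 2.5M output.
monthly = 7.5 * input_price + 2.5 * output_price  # -> 300.00

print(f"blended: ${blended:.2f}/M · projected: ${monthly:.2f}/mo")
```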