Claude Opus 4.1 vs DeepSeek R1
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Claude Opus 4.1 wins 7 of 7 shared benchmarks
Claude Opus 4.1 leads every shared benchmark, with category leads in knowledge, math, reasoning, and coding.
Category leads
knowledge · Claude Opus 4.1
math · Claude Opus 4.1
reasoning · Claude Opus 4.1
coding · Claude Opus 4.1
Hype vs Reality
Attention vs performance
Claude Opus 4.1
#137 by performance · no attention signal
R1
#116 by performance · no attention signal
Best value
R1 · 30.7x better value than Claude Opus 4.1
Claude Opus 4.1 · 0.9 pts/$ · $45.00 per 1M tokens (blended)
R1 · 28.2 pts/$ · $1.60 per 1M tokens (blended)
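
The page does not define pts/$ precisely. Below is a minimal sketch, assuming the blended price is the simple average of the input and output rates from the pricing table (this reproduces the $45.00/M and $1.60/M figures) and the score is the mean of the seven shared benchmarks. That assumption lands close to R1's published 28.2 pts/$ but yields ~1.3 rather than 0.9 for Claude Opus 4.1, so the page evidently weights scores differently; treat this as illustrative, not as the page's formula.

```python
# Sketch of a points-per-dollar value metric. ASSUMPTIONS (not from the page):
# blended price = simple average of input/output $/1M rates; score = mean of
# the seven shared benchmark scores.

def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blended $/1M tokens as the average of input and output rates."""
    return (input_per_m + output_per_m) / 2

def points_per_dollar(scores: list[float], price_per_m: float) -> float:
    """Mean benchmark score divided by blended price per 1M tokens."""
    return sum(scores) / len(scores) / price_per_m

opus_scores = [49.7, 69.7, 85.4, 68.9, 52.0, 34.8, 42.8]
r1_scores = [35.1, 62.3, 83.0, 53.3, 17.1, 27.4, 36.5]

opus_price = blended_price(15.00, 75.00)  # 45.00, matches $45.00/M above
r1_price = blended_price(0.70, 2.50)      # 1.60, matches $1.60/M above

print(f"Claude Opus 4.1: {points_per_dollar(opus_scores, opus_price):.1f} pts/$")
print(f"R1: {points_per_dollar(r1_scores, r1_price):.1f} pts/$")  # ~28.1
```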
Vendor risk
Mixed exposure · one or more vendors flagged
Anthropic · $380.0B · Tier 1
DeepSeek · $3.4B · Tier 1
Head to head
7 benchmarks · 2 models
DeepResearch Bench · Claude Opus 4.1 leads by +14.6
DeepResearch Bench evaluates AI on complex multi-step research tasks requiring information gathering, synthesis, and producing comprehensive analyses.
Claude Opus 4.1 49.7 · R1 35.1

GPQA Diamond · Claude Opus 4.1 leads by +7.4
Graduate-Level Google-Proof QA (Diamond set) poses expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Claude Opus 4.1 69.7 · R1 62.3

Lech Mazur Writing · Claude Opus 4.1 leads by +2.4
Lech Mazur Writing evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
Claude Opus 4.1 85.4 · R1 83.0

OTIS Mock AIME 2024-2025 · Claude Opus 4.1 leads by +15.6
OTIS Mock AIME 2024-2025 presents simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Claude Opus 4.1 68.9 · R1 53.3

SimpleBench · Claude Opus 4.1 leads by +34.9
SimpleBench tests fundamental reasoning with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Claude Opus 4.1 52.0 · R1 17.1

SimpleQA Verified · Claude Opus 4.1 leads by +7.4
SimpleQA Verified poses short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate.
Claude Opus 4.1 34.8 · R1 27.4

WeirdML · Claude Opus 4.1 leads by +6.3
WeirdML tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Claude Opus 4.1 42.8 · R1 36.5
Full benchmark table
| Benchmark | Claude Opus 4.1 | R1 |
|---|---|---|
| DeepResearch Bench | 49.7 | 35.1 |
| GPQA Diamond | 69.7 | 62.3 |
| Lech Mazur Writing | 85.4 | 83.0 |
| OTIS Mock AIME 2024-2025 | 68.9 | 53.3 |
| SimpleBench | 52.0 | 17.1 |
| SimpleQA Verified | 34.8 | 27.4 |
| WeirdML | 42.8 | 36.5 |
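
For reproducibility, the short script below recomputes each head-to-head margin and the 7/7 win count directly from the table above; the dictionary simply transcribes the table and assumes nothing beyond it.

```python
# Recompute head-to-head margins and the win count from the benchmark table.
scores = {
    # benchmark: (Claude Opus 4.1, R1)
    "DeepResearch Bench": (49.7, 35.1),
    "GPQA Diamond": (69.7, 62.3),
    "Lech Mazur Writing": (85.4, 83.0),
    "OTIS Mock AIME 2024-2025": (68.9, 53.3),
    "SimpleBench": (52.0, 17.1),
    "SimpleQA Verified": (34.8, 27.4),
    "WeirdML": (42.8, 36.5),
}

wins = 0
for name, (opus, r1) in scores.items():
    margin = opus - r1
    wins += margin > 0
    print(f"{name}: Claude Opus 4.1 leads by {margin:+.1f}")

print(f"Claude Opus 4.1 wins {wins} of {len(scores)} shared benchmarks")
```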
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Claude Opus 4.1 | $15.00 | $75.00 | 200K tokens (~100 books) | $300.00 |
| R1 | $0.70 | $2.50 | 64K tokens (~32 books) | $11.50 |
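
A note on the projected column: the $300.00 and $11.50 figures are exactly what a 3:1 input:output split produces at 10M tokens per month (7.5M input + 2.5M output tokens). That split is inferred from the numbers rather than stated on the page; a minimal sketch under that assumption:

```python
# Projected monthly cost at 10M tokens, ASSUMING a 3:1 input:output split.
# The split is inferred: it reproduces the $300.00 and $11.50 figures above.
def monthly_cost(input_per_m: float, output_per_m: float,
                 total_tokens_m: float = 10.0, input_share: float = 0.75) -> float:
    input_m = total_tokens_m * input_share          # millions of input tokens
    output_m = total_tokens_m * (1 - input_share)   # millions of output tokens
    return input_m * input_per_m + output_m * output_per_m

print(monthly_cost(15.00, 75.00))  # 300.0 -> Claude Opus 4.1
print(monthly_cost(0.70, 2.50))    # 11.5  -> R1
```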