
DeepSeek V3 vs R1 0528

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

R1 0528 wins 10 of 12 shared benchmarks. Leads in coding · arena · knowledge.

Category leads
coding · R1 0528
arena · R1 0528
knowledge · R1 0528
language · DeepSeek V3
math · R1 0528
reasoning · DeepSeek V3
Hype vs Reality
DeepSeek V3 · #45 by perf · no signal · QUIET
R1 0528 · #53 by perf · no signal · QUIET
Best value
DeepSeek V3 offers 2.2x better value than R1 0528.
DeepSeek V3 · 97.5 pts/$ · $0.60/M
R1 0528 · 43.7 pts/$ · $1.32/M
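The pts/$ figures appear to divide a performance score by a blended per-million price. A minimal sketch, assuming the blend is a simple 50/50 average of input and output prices (an inference from the numbers on this page, not something it states):

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Assumed 50/50 average of input and output price per 1M tokens."""
    return (input_per_m + output_per_m) / 2

def points_per_dollar(perf_points: float, price_per_m: float) -> float:
    """Value metric: performance points per blended dollar per 1M tokens."""
    return perf_points / price_per_m

# Blended prices match the card's rounded $0.60/M and $1.32/M:
v3_price = blended_price(0.32, 0.89)   # ~0.605
r1_price = blended_price(0.50, 2.15)   # ~1.325

# Value ratio matches the "2.2x better value" headline:
ratio = 97.5 / 43.7                    # ~2.23
```

Under this assumption, V3's cheaper blended rate is what drives the 2.2x value gap despite R1 0528 winning most benchmarks.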
Vendor risk
One or more vendors flagged
DeepSeek (vendor for both models) · $3.4B · Tier 1 · Higher risk
Head to head
DeepSeek V3 · R1 0528
Aider polyglot
R1 0528 leads by +23.0
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
DeepSeek V3
48.4
R1 0528
71.4
Chatbot Arena Elo · Overall
R1 0528 leads by +63.5
DeepSeek V3
1358.2
R1 0528
1421.7
GPQA diamond
R1 0528 leads by +26.4
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
DeepSeek V3
42.0
R1 0528
68.4
HELM · GPQA
R1 0528 leads by +12.8
DeepSeek V3
53.8
R1 0528
66.6
HELM · IFEval
DeepSeek V3 leads by +4.8
DeepSeek V3
83.2
R1 0528
78.4
HELM · MMLU-Pro
R1 0528 leads by +7.0
DeepSeek V3
72.3
R1 0528
79.3
HELM · Omni-MATH
R1 0528 leads by +2.1
DeepSeek V3
40.3
R1 0528
42.4
HELM · WildBench
DeepSeek V3 leads by +0.3
DeepSeek V3
83.1
R1 0528
82.8
MATH level 5
R1 0528 leads by +31.8
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
DeepSeek V3
64.8
R1 0528
96.6
OTIS Mock AIME 2024-2025
R1 0528 leads by +50.6
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
DeepSeek V3
15.8
R1 0528
66.4
SimpleBench
R1 0528 leads by +26.3
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
DeepSeek V3
2.7
R1 0528
29.0
WeirdML
R1 0528 leads by +5.5
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
DeepSeek V3
36.1
R1 0528
41.6
Full benchmark table
Benchmark                     DeepSeek V3   R1 0528
Aider polyglot                48.4          71.4
Chatbot Arena Elo · Overall   1358.2        1421.7
GPQA diamond                  42.0          68.4
HELM · GPQA                   53.8          66.6
HELM · IFEval                 83.2          78.4
HELM · MMLU-Pro               72.3          79.3
HELM · Omni-MATH              40.3          42.4
HELM · WildBench              83.1          82.8
MATH level 5                  64.8          96.6
OTIS Mock AIME 2024-2025      15.8          66.4
SimpleBench                   2.7           29.0
WeirdML                       36.1          41.6
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model         Input   Output   Context                   Projected $/mo
DeepSeek V3   $0.32   $0.89    164K tokens (~82 books)   $4.63
R1 0528       $0.50   $2.15    164K tokens (~82 books)   $9.13
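The projected monthly figures are consistent with a 75% input / 25% output token split at 10M tokens per month. A minimal sketch of that projection (the split is an assumption inferred from the arithmetic, not stated on the page):

```python
def projected_monthly_cost(input_per_m: float, output_per_m: float,
                           total_tokens_m: float = 10.0,
                           input_share: float = 0.75) -> float:
    """Project monthly spend from per-1M-token prices.

    Assumes `input_share` of the tokens are input and the rest are output.
    """
    input_cost = input_per_m * total_tokens_m * input_share
    output_cost = output_per_m * total_tokens_m * (1 - input_share)
    return input_cost + output_cost

# DeepSeek V3: $0.32 in / $0.89 out -> ~4.625, shown as $4.63/mo
# R1 0528:     $0.50 in / $2.15 out -> ~9.125, shown as $9.13/mo
```

With a different workload mix (e.g. long reasoning outputs from R1 0528), the gap between the two projections would widen, since output tokens are where R1's premium sits.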