DeepSeek V3 vs R1 0528
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
R1 0528 wins 10 of 12 shared benchmarks, leading in coding, arena, and knowledge.
Category leads
| Category | Leader |
|---|---|
| coding | R1 0528 |
| arena | R1 0528 |
| knowledge | R1 0528 |
| language | DeepSeek V3 |
| math | R1 0528 |
| reasoning | DeepSeek V3 |
Hype vs Reality
Attention vs performance:
DeepSeek V3 · #45 by performance · no signal
R1 0528 · #53 by performance · no signal
Best value
DeepSeek V3 delivers roughly 2.2x more benchmark points per dollar than R1 0528.
| Model | Value | Blended price |
|---|---|---|
| DeepSeek V3 | 97.5 pts/$ | $0.60/M tokens |
| R1 0528 | 43.7 pts/$ | $1.32/M tokens |
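The pts/$ metric divides a composite benchmark score by the blended $/M price. The page does not publish its composite, so the composite values below are back-solved from its own pts/$ figures; this sketch only illustrates the shape of the calculation, not the page's exact method:

```python
def value_score(composite_pts: float, blended_price_per_m: float) -> float:
    """Benchmark points per dollar: composite score / blended $ per 1M tokens."""
    return composite_pts / blended_price_per_m

# Composite scores back-solved from the page's pts/$ figures (97.5 * 0.60
# and 43.7 * 1.32); illustrative only, since the composite is not published.
print(value_score(58.5, 0.60))   # 97.5 pts/$  (DeepSeek V3)
print(value_score(57.7, 1.32))   # ~43.7 pts/$ (R1 0528)
print(97.5 / 43.7)               # ~2.23, the "2.2x better value" headline
```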
Vendor risk
Mixed exposure · one or more vendors flagged.
Both models come from the same vendor, so exposure is concentrated:
DeepSeek · $3.4B · Tier 1
Head to head
12 benchmarks · 2 models

Aider polyglot · R1 0528 leads by +23.0
Aider Polyglot measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
DeepSeek V3 48.4 · R1 0528 71.4

Chatbot Arena Elo · Overall · R1 0528 leads by +63.6
DeepSeek V3 1358.2 · R1 0528 1421.7
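For context on the Arena gap: the standard Elo model converts a rating difference into an expected head-to-head win rate. This is generic Elo math, not a number the page reports; a minimal sketch:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# R1 0528 (1421.7) vs DeepSeek V3 (1358.2)
print(f"{elo_win_prob(1421.7, 1358.2):.1%}")  # ~59.0%
```

So a +63.6 Elo lead means R1 0528 would be expected to win roughly 59% of head-to-head votes: a clear but not overwhelming preference.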
GPQA diamond · R1 0528 leads by +26.4
Graduate-Level Google-Proof QA (Diamond set): expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
DeepSeek V3 42.0 · R1 0528 68.4

HELM · GPQA · R1 0528 leads by +12.8
DeepSeek V3 53.8 · R1 0528 66.6

HELM · IFEval · DeepSeek V3 leads by +4.8
DeepSeek V3 83.2 · R1 0528 78.4

HELM · MMLU-Pro · R1 0528 leads by +7.0
DeepSeek V3 72.3 · R1 0528 79.3

HELM · Omni-MATH · R1 0528 leads by +2.1
DeepSeek V3 40.3 · R1 0528 42.4

HELM · WildBench · DeepSeek V3 leads by +0.3
DeepSeek V3 83.1 · R1 0528 82.8

MATH level 5 · R1 0528 leads by +31.8
MATH Level 5 is the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
DeepSeek V3 64.8 · R1 0528 96.6

OTIS Mock AIME 2024-2025 · R1 0528 leads by +50.6
OTIS Mock AIME 2024-2025 offers simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
DeepSeek V3 15.8 · R1 0528 66.4

SimpleBench · R1 0528 leads by +26.3
SimpleBench tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
DeepSeek V3 2.7 · R1 0528 29.0

WeirdML · R1 0528 leads by +5.6
WeirdML tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
DeepSeek V3 36.1 · R1 0528 41.6
Full benchmark table
| Benchmark | DeepSeek V3 | R1 0528 |
|---|---|---|
| Aider polyglot | 48.4 | 71.4 |
| Chatbot Arena Elo · Overall | 1358.2 | 1421.7 |
| GPQA diamond | 42.0 | 68.4 |
| HELM · GPQA | 53.8 | 66.6 |
| HELM · IFEval | 83.2 | 78.4 |
| HELM · MMLU-Pro | 72.3 | 79.3 |
| HELM · Omni-MATH | 40.3 | 42.4 |
| HELM · WildBench | 83.1 | 82.8 |
| MATH level 5 | 64.8 | 96.6 |
| OTIS Mock AIME 2024-2025 | 15.8 | 66.4 |
| SimpleBench | 2.7 | 29.0 |
| WeirdML | 36.1 | 41.6 |
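The 10-of-12 headline in the winner summary can be re-derived from this table. A minimal sketch with the scores hard-coded from the page:

```python
# Scores copied from the full benchmark table above: (DeepSeek V3, R1 0528).
scores = {
    "Aider polyglot": (48.4, 71.4),
    "Chatbot Arena Elo · Overall": (1358.2, 1421.7),
    "GPQA diamond": (42.0, 68.4),
    "HELM · GPQA": (53.8, 66.6),
    "HELM · IFEval": (83.2, 78.4),
    "HELM · MMLU-Pro": (72.3, 79.3),
    "HELM · Omni-MATH": (40.3, 42.4),
    "HELM · WildBench": (83.1, 82.8),
    "MATH level 5": (64.8, 96.6),
    "OTIS Mock AIME 2024-2025": (15.8, 66.4),
    "SimpleBench": (2.7, 29.0),
    "WeirdML": (36.1, 41.6),
}

r1_wins = sum(r1 > v3 for v3, r1 in scores.values())
print(f"R1 0528 wins {r1_wins} of {len(scores)} shared benchmarks")  # 10 of 12
```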
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3 | $0.32 | $0.89 | 164K tokens | $4.63 |
| R1 0528 | $0.50 | $2.15 | 164K tokens | $9.13 |
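The projected monthly costs are consistent with a 75% input / 25% output split of the 10M tokens (model rows above are matched to the blended prices in the Best value section). The split is inferred from the numbers, not stated on the page; a quick check:

```python
def monthly_cost(input_price: float, output_price: float,
                 tokens_m: float = 10.0, input_share: float = 0.75) -> float:
    """Projected monthly cost in dollars; prices are $ per 1M tokens.

    The 75/25 input/output split is an inference from the page's own
    projections, not something it states; adjust for your workload.
    """
    return tokens_m * (input_share * input_price + (1 - input_share) * output_price)

print(monthly_cost(0.32, 0.89))  # ~4.63, DeepSeek V3, matches the table
print(monthly_cost(0.50, 2.15))  # ~9.13, R1 0528, matches the table
```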