DeepSeek V3 vs Gemini 1.5 Pro (Feb 2024)
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
DeepSeek V3 wins on 10/13 benchmarks
DeepSeek V3 wins 10 of 13 shared benchmarks. Leads in arena · reasoning · knowledge.
Category leads
arena · DeepSeek V3
reasoning · DeepSeek V3
knowledge · DeepSeek V3
language · Gemini 1.5 Pro (Feb 2024)
math · DeepSeek V3
coding · DeepSeek V3
Hype vs Reality
Attention vs performance
DeepSeek V3
#45 by performance · no attention signal
Gemini 1.5 Pro (Feb 2024)
#138 by performance · no attention signal
Best value
DeepSeek V3
DeepSeek V3
97.5 pts/$
$0.60/M
Gemini 1.5 Pro (Feb 2024)
—
no price
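The pts/$ figure is a value ratio: an aggregate benchmark score divided by a blended per-million-token price. The sketch below shows one plausible way to compute it; the 50/50 input/output blend and the unweighted-mean aggregation are assumptions for illustration, not the page's documented method.

```python
# Minimal sketch of a points-per-dollar value metric.
# ASSUMPTIONS: 50/50 input/output token mix, unweighted mean of benchmark scores.

def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blend input and output list prices, assuming a 50/50 token mix."""
    return (input_per_m + output_per_m) / 2

def points_per_dollar(scores: list[float], input_per_m: float, output_per_m: float) -> float:
    """Aggregate benchmark score per blended dollar per 1M tokens."""
    aggregate = sum(scores) / len(scores)  # assumed: unweighted mean
    return aggregate / blended_price(input_per_m, output_per_m)

# DeepSeek V3 list prices from the pricing table at the bottom of the page.
print(round(blended_price(0.32, 0.89), 3))  # 0.605 -> shown as $0.60/M on the value card
```

For reference, 97.5 pts/$ at $0.60/M implies an aggregate score of roughly 58.5 points, though the page does not state how that aggregate is weighted.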
Vendor risk
Mixed exposure
One or more vendors flagged
DeepSeek
$3.4B · Tier 1
Google DeepMind
$4.00T · Tier 1
Head to head
13 benchmarks · 2 models
DeepSeek V3 · Gemini 1.5 Pro (Feb 2024)
Chatbot Arena Elo · Overall
DeepSeek V3 leads by +35.6
DeepSeek V3
1358.2
Gemini 1.5 Pro (Feb 2024)
1322.5
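For context on the Elo gap: under the standard Elo expectation formula (a general property of Elo-style ratings, not something this leaderboard documents), a lead of about 36 points corresponds to roughly a 55% expected preference rate in head-to-head votes. A minimal sketch:

```python
# Standard Elo expectation on the conventional 400-point logistic scale.

def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Probability that model A is preferred over model B, given Elo ratings."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

# Arena scores from the entry above.
print(f"{expected_win_rate(1358.2, 1322.5):.1%}")  # roughly 55% expected preference for DeepSeek V3
```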
BBH
DeepSeek V3 leads by +4.7
BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.
DeepSeek V3
83.3
Gemini 1.5 Pro (Feb 2024)
78.7
GPQA diamond
DeepSeek V3 leads by +14.2
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
DeepSeek V3
42.0
Gemini 1.5 Pro (Feb 2024)
27.8
HELM · GPQA
DeepSeek V3 leads by +0.4
DeepSeek V3
53.8
Gemini 1.5 Pro (Feb 2024)
53.4
HELM · IFEval
Gemini 1.5 Pro (Feb 2024) leads by +0.5
DeepSeek V3
83.2
Gemini 1.5 Pro (Feb 2024)
83.7
HELM · MMLU-Pro
Gemini 1.5 Pro (Feb 2024) leads by +1.4
DeepSeek V3
72.3
Gemini 1.5 Pro (Feb 2024)
73.7
HELM · Omni-MATH
DeepSeek V3 leads by +3.9
DeepSeek V3
40.3
Gemini 1.5 Pro (Feb 2024)
36.4
HELM · WildBench
DeepSeek V3 leads by +1.8
DeepSeek V3
83.1
Gemini 1.5 Pro (Feb 2024)
81.3
MATH level 5
DeepSeek V3 leads by +24.1
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
DeepSeek V3
64.8
Gemini 1.5 Pro (Feb 2024)
40.8
MMLU
DeepSeek V3 leads by +6.0
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
DeepSeek V3
82.9
Gemini 1.5 Pro (Feb 2024)
76.9
OTIS Mock AIME 2024-2025
DeepSeek V3 leads by +9.0
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
DeepSeek V3
15.8
Gemini 1.5 Pro (Feb 2024)
6.7
SimpleBench
Gemini 1.5 Pro (Feb 2024) leads by +9.8
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
DeepSeek V3
2.7
Gemini 1.5 Pro (Feb 2024)
12.5
WeirdML
DeepSeek V3 leads by +13.9
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
DeepSeek V3
36.1
Gemini 1.5 Pro (Feb 2024)
22.2
Full benchmark table
| Benchmark | DeepSeek V3 | Gemini 1.5 Pro (Feb 2024) |
|---|---|---|
| Chatbot Arena Elo · Overall | 1358.2 | 1322.5 |
| BBH | 83.3 | 78.7 |
| GPQA diamond | 42.0 | 27.8 |
| HELM · GPQA | 53.8 | 53.4 |
| HELM · IFEval | 83.2 | 83.7 |
| HELM · MMLU-Pro | 72.3 | 73.7 |
| HELM · Omni-MATH | 40.3 | 36.4 |
| HELM · WildBench | 83.1 | 81.3 |
| MATH level 5 | 64.8 | 40.8 |
| MMLU | 82.9 | 76.9 |
| OTIS Mock AIME 2024-2025 | 15.8 | 6.7 |
| SimpleBench | 2.7 | 12.5 |
| WeirdML | 36.1 | 22.2 |
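The headline win count can be reproduced from the table above by tallying which model posts the higher score on each shared benchmark (higher is better for every metric listed here). A minimal sketch:

```python
# Scores copied from the full benchmark table: (DeepSeek V3, Gemini 1.5 Pro Feb 2024).
scores = {
    "Chatbot Arena Elo · Overall": (1358.2, 1322.5),
    "BBH": (83.3, 78.7),
    "GPQA diamond": (42.0, 27.8),
    "HELM · GPQA": (53.8, 53.4),
    "HELM · IFEval": (83.2, 83.7),
    "HELM · MMLU-Pro": (72.3, 73.7),
    "HELM · Omni-MATH": (40.3, 36.4),
    "HELM · WildBench": (83.1, 81.3),
    "MATH level 5": (64.8, 40.8),
    "MMLU": (82.9, 76.9),
    "OTIS Mock AIME 2024-2025": (15.8, 6.7),
    "SimpleBench": (2.7, 12.5),
    "WeirdML": (36.1, 22.2),
}

deepseek_wins = sum(ds > gem for ds, gem in scores.values())
print(f"DeepSeek V3 wins {deepseek_wins} of {len(scores)}")  # -> DeepSeek V3 wins 10 of 13
```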
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3 | $0.32 | $0.89 | 164K tokens (~82 books) | $4.63 |
| Gemini 1.5 Pro (Feb 2024) | — | — | — | — |
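A note on where the projected monthly figure can come from: the $4.63 for DeepSeek V3 is consistent with 10M tokens split roughly 3:1 between input and output at the listed prices. The split is an assumption used for illustration; the page does not state its mix.

```python
# Minimal sketch of a projected monthly cost, assuming a 3:1 input/output token mix.

def projected_monthly_cost(input_per_m: float, output_per_m: float,
                           monthly_tokens_m: float = 10.0,
                           input_share: float = 0.75) -> float:
    """Dollar cost for a month of usage at the given volume and input/output mix."""
    input_cost = input_per_m * monthly_tokens_m * input_share
    output_cost = output_per_m * monthly_tokens_m * (1 - input_share)
    return input_cost + output_cost

cost = projected_monthly_cost(0.32, 0.89)  # DeepSeek V3 list prices
print(f"~${cost:.2f}/mo")  # about $4.63, matching the table's projection under the assumed mix
```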