DeepSeek V3 vs Gemini 1.5 Pro (Feb 2024)
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
DeepSeek V3 wins on 10/13 benchmarks
DeepSeek V3 wins 10 of 13 shared benchmarks. Leads in arena · reasoning · knowledge.
Category leads
arena · DeepSeek V3
reasoning · DeepSeek V3
knowledge · DeepSeek V3
language · Gemini 1.5 Pro (Feb 2024)
math · DeepSeek V3
coding · DeepSeek V3
Hype vs Reality
Attention vs performance
DeepSeek V3
#45 by performance · no attention signal
Gemini 1.5 Pro (Feb 2024)
#138 by performance · no attention signal
Best value
DeepSeek V3
DeepSeek V3
97.5 pts/$
$0.60/M
Gemini 1.5 Pro (Feb 2024)
—
no price
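The pts/$ figure is a value ratio: an aggregate benchmark score divided by a blended per-million-token price. The sketch below shows one plausible way to compute it; the 50/50 input/output blend and the unweighted-mean aggregation are assumptions for illustration, not the page's documented method.

```python
# Minimal sketch of a points-per-dollar value metric.
# ASSUMPTIONS: 50/50 input/output token mix, unweighted mean of benchmark scores.

def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blend input and output list prices, assuming a 50/50 token mix."""
    return (input_per_m + output_per_m) / 2

def points_per_dollar(scores: list[float], input_per_m: float, output_per_m: float) -> float:
    """Aggregate benchmark score per blended dollar per 1M tokens."""
    aggregate = sum(scores) / len(scores)  # assumed: unweighted mean
    return aggregate / blended_price(input_per_m, output_per_m)

# DeepSeek V3 list prices from the pricing table at the bottom of the page.
print(round(blended_price(0.32, 0.89), 3))  # 0.605 -> shown as $0.60/M on the value card
```

For reference, 97.5 pts/$ at $0.60/M implies an aggregate score of roughly 58.5 points, though the page does not state how that aggregate is weighted.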
Vendor risk
Mixed exposure
One or more vendors flagged
DeepSeek
$3.4B · Tier 1
Google DeepMind
$4.00T · Tier 1
Head to head
13 benchmarks · 2 models
DeepSeek V3 · Gemini 1.5 Pro (Feb 2024)
Chatbot Arena Elo · Overall
DeepSeek V3 leads by +35.6
DeepSeek V3
1358.2
Gemini 1.5 Pro (Feb 2024)
1322.5
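For context on the Elo gap: under the standard Elo expectation formula (a general property of Elo-style ratings, not something this leaderboard documents), a lead of about 36 points corresponds to roughly a 55% expected preference rate in head-to-head votes. A minimal sketch:

```python
# Standard Elo expectation on the conventional 400-point logistic scale.

def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Probability that model A is preferred over model B, given Elo ratings."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

# Arena scores from the entry above.
print(f"{expected_win_rate(1358.2, 1322.5):.1%}")  # roughly 55% expected preference for DeepSeek V3
```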
BBH
DeepSeek V3 leads by +4.7
BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.
DeepSeek V3
83.3
Gemini 1.5 Pro (Feb 2024)
78.7
GPQA diamond
DeepSeek V3 leads by +14.2
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
DeepSeek V3
42.0
Gemini 1.5 Pro (Feb 2024)
27.8
HELM · GPQA
DeepSeek V3 leads by +0.4
DeepSeek V3
53.8
Gemini 1.5 Pro (Feb 2024)
53.4
HELM · IFEval
Gemini 1.5 Pro (Feb 2024) leads by +0.5
DeepSeek V3
83.2
Gemini 1.5 Pro (Feb 2024)
83.7
HELM · MMLU-Pro
Gemini 1.5 Pro (Feb 2024) leads by +1.4
DeepSeek V3
72.3
Gemini 1.5 Pro (Feb 2024)
73.7
HELM · Omni-MATH
DeepSeek V3 leads by +3.9
DeepSeek V3
40.3
Gemini 1.5 Pro (Feb 2024)
36.4
HELM · WildBench
DeepSeek V3 leads by +1.8
DeepSeek V3
83.1
Gemini 1.5 Pro (Feb 2024)
81.3
MATH level 5
DeepSeek V3 leads by +24.1
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
DeepSeek V3
64.8
Gemini 1.5 Pro (Feb 2024)
40.8
MMLU
DeepSeek V3 leads by +6.0
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
DeepSeek V3
82.9
Gemini 1.5 Pro (Feb 2024)
76.9
OTIS Mock AIME 2024-2025
DeepSeek V3 leads by +9.0
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
DeepSeek V3
15.8
Gemini 1.5 Pro (Feb 2024)
6.7
SimpleBench
Gemini 1.5 Pro (Feb 2024) leads by +9.8
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
DeepSeek V3
2.7
Gemini 1.5 Pro (Feb 2024)
12.5
WeirdML
DeepSeek V3 leads by +13.9
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
DeepSeek V3
36.1
Gemini 1.5 Pro (Feb 2024)
22.2
Full benchmark table
| Benchmark | DeepSeek V3 | Gemini 1.5 Pro (Feb 2024) |
|---|---|---|
| Chatbot Arena Elo · Overall | 1358.2 | 1322.5 |
| BBH | 83.3 | 78.7 |
| GPQA diamond | 42.0 | 27.8 |
| HELM · GPQA | 53.8 | 53.4 |
| HELM · IFEval | 83.2 | 83.7 |
| HELM · MMLU-Pro | 72.3 | 73.7 |
| HELM · Omni-MATH | 40.3 | 36.4 |
| HELM · WildBench | 83.1 | 81.3 |
| MATH level 5 | 64.8 | 40.8 |
| MMLU | 82.9 | 76.9 |
| OTIS Mock AIME 2024-2025 | 15.8 | 6.7 |
| SimpleBench | 2.7 | 12.5 |
| WeirdML | 36.1 | 22.2 |
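The headline win count can be reproduced from the table above by tallying which model posts the higher score on each shared benchmark (higher is better for every metric listed here). A minimal sketch:

```python
# Scores copied from the full benchmark table: (DeepSeek V3, Gemini 1.5 Pro Feb 2024).
scores = {
    "Chatbot Arena Elo · Overall": (1358.2, 1322.5),
    "BBH": (83.3, 78.7),
    "GPQA diamond": (42.0, 27.8),
    "HELM · GPQA": (53.8, 53.4),
    "HELM · IFEval": (83.2, 83.7),
    "HELM · MMLU-Pro": (72.3, 73.7),
    "HELM · Omni-MATH": (40.3, 36.4),
    "HELM · WildBench": (83.1, 81.3),
    "MATH level 5": (64.8, 40.8),
    "MMLU": (82.9, 76.9),
    "OTIS Mock AIME 2024-2025": (15.8, 6.7),
    "SimpleBench": (2.7, 12.5),
    "WeirdML": (36.1, 22.2),
}

deepseek_wins = sum(ds > gem for ds, gem in scores.values())
print(f"DeepSeek V3 wins {deepseek_wins} of {len(scores)}")  # -> DeepSeek V3 wins 10 of 13
```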
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3 | $0.32 | $0.89 | 164K tokens (~82 books) | $4.63 |
| Gemini 1.5 Pro (Feb 2024) | — | — | — | — |
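A note on where the projected monthly figure can come from: the $4.63 for DeepSeek V3 is consistent with 10M tokens split roughly 3:1 between input and output at the listed prices. The split is an assumption used for illustration; the page does not state its mix.

```python
# Minimal sketch of a projected monthly cost, assuming a 3:1 input/output token mix.

def projected_monthly_cost(input_per_m: float, output_per_m: float,
                           monthly_tokens_m: float = 10.0,
                           input_share: float = 0.75) -> float:
    """Dollar cost for a month of usage at the given volume and input/output mix."""
    input_cost = input_per_m * monthly_tokens_m * input_share
    output_cost = output_per_m * monthly_tokens_m * (1 - input_share)
    return input_cost + output_cost

cost = projected_monthly_cost(0.32, 0.89)  # DeepSeek V3 list prices
print(f"~${cost:.2f}/mo")  # about $4.63, matching the table's projection under the assumed mix
```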