
DeepSeek V3 vs Kimi K2 0711

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Kimi K2 0711 wins 10 of 10 shared benchmarks. Leads in all five categories: coding, knowledge, language, math, and reasoning.

Category leads
coding · Kimi K2 0711
knowledge · Kimi K2 0711
language · Kimi K2 0711
math · Kimi K2 0711
reasoning · Kimi K2 0711
Hype vs Reality
DeepSeek V3 · #45 by perf · no signal · QUIET
Kimi K2 0711 · #63 by perf · no signal · QUIET
Best value
DeepSeek V3 offers 2.5x better value than Kimi K2 0711
DeepSeek V3
97.5 pts/$
$0.60/M
Kimi K2 0711
39.2 pts/$
$1.43/M
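The 2.5x figure is consistent with the ratio of the two pts/$ values listed above. A minimal sketch of that arithmetic, assuming the headline is simply this ratio rounded to one decimal (the page does not state how it is derived):

```python
# Points-per-dollar values as listed in the "Best value" panel.
value_per_dollar = {"DeepSeek V3": 97.5, "Kimi K2 0711": 39.2}

# Assumption: the headline multiple is the plain ratio of the two values.
ratio = value_per_dollar["DeepSeek V3"] / value_per_dollar["Kimi K2 0711"]
print(f"DeepSeek V3 offers {ratio:.1f}x the value of Kimi K2 0711")
```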
Vendor risk
One or more vendors flagged
DeepSeek
$3.4B·Tier 1
Higher risk
moonshotai
private · undisclosed
Unknown
Head to head
DeepSeek V3 vs Kimi K2 0711
Aider Polyglot
Kimi K2 0711 leads by +10.7
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
DeepSeek V3
48.4
Kimi K2 0711
59.1
Fiction.LiveBench
Kimi K2 0711 leads by +11.1
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
DeepSeek V3
50.0
Kimi K2 0711
61.1
HELM · GPQA
Kimi K2 0711 leads by +11.4
DeepSeek V3
53.8
Kimi K2 0711
65.2
HELM · IFEval
Kimi K2 0711 leads by +1.8
DeepSeek V3
83.2
Kimi K2 0711
85.0
HELM · MMLU-Pro
Kimi K2 0711 leads by +9.6
DeepSeek V3
72.3
Kimi K2 0711
81.9
HELM · Omni-MATH
Kimi K2 0711 leads by +25.1
DeepSeek V3
40.3
Kimi K2 0711
65.4
HELM · WildBench
Kimi K2 0711 leads by +3.1
DeepSeek V3
83.1
Kimi K2 0711
86.2
Lech Mazur Writing
Kimi K2 0711 leads by +9.9
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
DeepSeek V3
77.0
Kimi K2 0711
86.9
SimpleBench
Kimi K2 0711 leads by +8.9
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
DeepSeek V3
2.7
Kimi K2 0711
11.6
WeirdML
Kimi K2 0711 leads by +3.3
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
DeepSeek V3
36.1
Kimi K2 0711
39.4
Full benchmark table
Benchmark | DeepSeek V3 | Kimi K2 0711
Aider Polyglot | 48.4 | 59.1
Fiction.LiveBench | 50.0 | 61.1
HELM · GPQA | 53.8 | 65.2
HELM · IFEval | 83.2 | 85.0
HELM · MMLU-Pro | 72.3 | 81.9
HELM · Omni-MATH | 40.3 | 65.4
HELM · WildBench | 83.1 | 86.2
Lech Mazur Writing | 77.0 | 86.9
SimpleBench | 2.7 | 11.6
WeirdML | 36.1 | 39.4
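The per-benchmark leads and the "10 of 10" tally can be reproduced from the table scores. A minimal sketch, assuming each lead is the plain score difference rounded to one decimal; the `leads` helper is hypothetical and not part of the page:

```python
# Scores copied from the benchmark table: (DeepSeek V3, Kimi K2 0711).
scores = {
    "Aider Polyglot": (48.4, 59.1),
    "Fiction.LiveBench": (50.0, 61.1),
    "HELM · GPQA": (53.8, 65.2),
    "HELM · IFEval": (83.2, 85.0),
    "HELM · MMLU-Pro": (72.3, 81.9),
    "HELM · Omni-MATH": (40.3, 65.4),
    "HELM · WildBench": (83.1, 86.2),
    "Lech Mazur Writing": (77.0, 86.9),
    "SimpleBench": (2.7, 11.6),
    "WeirdML": (36.1, 39.4),
}


def leads(table):
    """Return {benchmark: (leader, margin)} with the margin rounded to one decimal."""
    result = {}
    for name, (deepseek, kimi) in table.items():
        margin = round(kimi - deepseek, 1)
        if margin > 0:
            leader = "Kimi K2 0711"
        elif margin < 0:
            leader = "DeepSeek V3"
        else:
            leader = "tie"
        result[name] = (leader, abs(margin))
    return result


summary = leads(scores)
kimi_wins = sum(1 for leader, _ in summary.values() if leader == "Kimi K2 0711")
print(f"Kimi K2 0711 wins {kimi_wins} of {len(scores)} shared benchmarks")
for name, (leader, margin) in summary.items():
    print(f"{name}: {leader} leads by +{margin}")
```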
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model | Input | Output | Context | Projected $/mo
DeepSeek V3 | $0.32 | $0.89 | 164K tokens (~82 books) | $4.63
Kimi K2 0711 | $0.57 | $2.30 | 131K tokens (~66 books) | $10.03
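The page does not say how the 10M tokens are divided between input and output. Both projected figures are consistent with a 75% input / 25% output split, so the sketch below uses that as an inferred assumption; the `projected_monthly_cost` helper and its parameters are hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP


def projected_monthly_cost(input_price, output_price,
                           total_tokens_m=Decimal("10"),
                           input_share=Decimal("0.75")):
    """Monthly cost in dollars for prices given per 1M tokens.

    input_share is an assumption inferred from the listed figures,
    not something the page states.
    """
    input_cost = Decimal(input_price) * total_tokens_m * input_share
    output_cost = Decimal(output_price) * total_tokens_m * (1 - input_share)
    # Round half up to whole cents, using Decimal to avoid float rounding surprises.
    return (input_cost + output_cost).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print("DeepSeek V3:", projected_monthly_cost("0.32", "0.89"))
print("Kimi K2 0711:", projected_monthly_cost("0.57", "2.30"))
```

A workload with a different input/output mix will shift these totals, since output tokens cost more than input tokens for both models.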