Kimi K2 0711 vs DeepSeek V3
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Kimi K2 0711 wins 10 of 10 shared benchmarks and leads in every category: coding, knowledge, language, math, and reasoning.
Category leads
coding · Kimi K2 0711
knowledge · Kimi K2 0711
language · Kimi K2 0711
math · Kimi K2 0711
reasoning · Kimi K2 0711
Hype vs Reality
Attention vs performance
Kimi K2 0711
#63 by perf · no signal
DeepSeek V3
#45 by perf · no signal
Best value
DeepSeek V3
2.5x better value than Kimi K2 0711
Kimi K2 0711
39.2 pts/$
$1.43/M
DeepSeek V3
97.5 pts/$
$0.60/M
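The "2.5x better value" headline can be checked against the two points-per-dollar figures shown in this panel. The snippet below is a minimal sketch of that arithmetic only; the page does not say how the pts/$ figures themselves are derived, so they are taken as given, and the variable names are illustrative.

```python
# Points-per-dollar figures copied from the "Best value" panel.
# How the page computes these values is not stated; they are inputs here.
pts_per_dollar = {"Kimi K2 0711": 39.2, "DeepSeek V3": 97.5}

best = max(pts_per_dollar, key=pts_per_dollar.get)
worst = min(pts_per_dollar, key=pts_per_dollar.get)

# Ratio of the better value score to the worse one.
ratio = pts_per_dollar[best] / pts_per_dollar[worst]

print(f"{best} offers {ratio:.1f}x the value of {worst}")
```

The unrounded ratio is about 2.49, which the page rounds to 2.5x.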
Vendor risk
Mixed exposure
One or more vendors flagged
Moonshot AI
private · undisclosed
DeepSeek
$3.4B · Tier 1
Head to head
10 benchmarks · 2 models
Kimi K2 0711 · DeepSeek V3
Aider Polyglot
Kimi K2 0711 leads by +10.7
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Kimi K2 0711
59.1
DeepSeek V3
48.4
Fiction.LiveBench
Kimi K2 0711 leads by +11.1
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Kimi K2 0711
61.1
DeepSeek V3
50.0
HELM · GPQA
Kimi K2 0711 leads by +11.4
Kimi K2 0711
65.2
DeepSeek V3
53.8
HELM · IFEval
Kimi K2 0711 leads by +1.8
Kimi K2 0711
85.0
DeepSeek V3
83.2
HELM · MMLU-Pro
Kimi K2 0711 leads by +9.6
Kimi K2 0711
81.9
DeepSeek V3
72.3
HELM · Omni-MATH
Kimi K2 0711 leads by +25.1
Kimi K2 0711
65.4
DeepSeek V3
40.3
HELM · WildBench
Kimi K2 0711 leads by +3.1
Kimi K2 0711
86.2
DeepSeek V3
83.1
Lech Mazur Writing
Kimi K2 0711 leads by +9.9
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
Kimi K2 0711
86.9
DeepSeek V3
77.0
SimpleBench
Kimi K2 0711 leads by +8.9
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Kimi K2 0711
11.6
DeepSeek V3
2.7
WeirdML
Kimi K2 0711 leads by +3.3
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Kimi K2 0711
39.4
DeepSeek V3
36.1
Full benchmark table
| Benchmark | Kimi K2 0711 | DeepSeek V3 |
|---|---|---|
| Aider Polyglot | 59.1 | 48.4 |
| Fiction.LiveBench | 61.1 | 50.0 |
| HELM · GPQA | 65.2 | 53.8 |
| HELM · IFEval | 85.0 | 83.2 |
| HELM · MMLU-Pro | 81.9 | 72.3 |
| HELM · Omni-MATH | 65.4 | 40.3 |
| HELM · WildBench | 86.2 | 83.1 |
| Lech Mazur Writing | 86.9 | 77.0 |
| SimpleBench | 11.6 | 2.7 |
| WeirdML | 39.4 | 36.1 |
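The per-benchmark margins and the 10-of-10 win count follow directly from the scores in the table. As a minimal sketch, the snippet below recomputes them; the scores are copied from the table, while the dictionary layout and variable names are illustrative.

```python
# Scores copied from the full benchmark table: (Kimi K2 0711, DeepSeek V3).
scores = {
    "Aider Polyglot": (59.1, 48.4),
    "Fiction.LiveBench": (61.1, 50.0),
    "HELM · GPQA": (65.2, 53.8),
    "HELM · IFEval": (85.0, 83.2),
    "HELM · MMLU-Pro": (81.9, 72.3),
    "HELM · Omni-MATH": (65.4, 40.3),
    "HELM · WildBench": (86.2, 83.1),
    "Lech Mazur Writing": (86.9, 77.0),
    "SimpleBench": (11.6, 2.7),
    "WeirdML": (39.4, 36.1),
}

# Margin = Kimi score minus DeepSeek score, rounded to one decimal
# to match the precision shown on the page.
margins = {name: round(kimi - deepseek, 1) for name, (kimi, deepseek) in scores.items()}

kimi_wins = sum(1 for margin in margins.values() if margin > 0)
largest = max(margins, key=margins.get)
smallest = min(margins, key=margins.get)

print(f"Kimi K2 0711 wins {kimi_wins} of {len(scores)} benchmarks")
print(f"Largest margin: {largest} (+{margins[largest]})")
print(f"Smallest margin: {smallest} (+{margins[smallest]})")
```

The widest gap is on HELM · Omni-MATH and the narrowest on HELM · IFEval, so the clean sweep hides a large spread in how decisive each win is.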
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Kimi K2 0711 | $0.57 | $2.30 | 131K tokens (~66 books) | $10.03 |
| DeepSeek V3 | $0.32 | $0.89 | 164K tokens (~82 books) | $4.63 |
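The page does not state how the 10M monthly tokens are split between input and output. A split of 75% input and 25% output reproduces both projected figures, so the sketch below uses that split as an assumption. It also assumes the first pricing row belongs to Kimi K2 0711 and the second to DeepSeek V3, matching the order used elsewhere on the page. The function and parameter names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1M-token prices from the pricing table: (input, output).
# Row-to-model mapping is an assumption based on the page's ordering.
PRICES = {
    "Kimi K2 0711": (Decimal("0.57"), Decimal("2.30")),
    "DeepSeek V3": (Decimal("0.32"), Decimal("0.89")),
}


def projected_monthly_cost(input_price, output_price,
                           total_millions=Decimal("10"),
                           input_share=Decimal("0.75")):
    """Cost of `total_millions` tokens per month at an assumed input share."""
    input_millions = total_millions * input_share
    output_millions = total_millions - input_millions
    cost = input_millions * input_price + output_millions * output_price
    # Round half up to whole cents, as a price display typically would.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for model, (input_price, output_price) in PRICES.items():
    print(f"{model}: ${projected_monthly_cost(input_price, output_price)}/mo")
```

Under this assumption the results are $10.03 and $4.63, which match the table. A workload with a heavier output share would widen the gap, since the output price differs more between the two models than the input price does.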