Kimi K2 Thinking vs Qwen3 Max
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Kimi K2 Thinking wins 3 of 5 shared benchmarks
It leads in the knowledge and math categories; Qwen3 Max takes PostTrainBench and SimpleQA Verified.
Category leads
knowledge: Kimi K2 Thinking · math: Kimi K2 Thinking
Hype vs Reality
Attention vs performance
Kimi K2 Thinking
#79 by performance · no attention signal
Qwen3 Max
#49 by performance · no attention signal
Best value
Kimi K2 Thinking
About 1.4× the benchmark points per dollar of Qwen3 Max
Kimi K2 Thinking
34.4 pts/$ · $1.55 per 1M tokens (blended)
Qwen3 Max
24.9 pts/$ · $2.34 per 1M tokens (blended)
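The blended $/M figures are consistent with a simple average of the input and output prices listed in the pricing table: (0.60 + 2.50) / 2 = 1.55 and (0.78 + 3.90) / 2 = 2.34. The page does not state this formula, so the sketch below treats the equal weighting as an assumption. The points score behind pts/$ is not shown on the page, so the sketch takes the reported pts/$ values as given and only recomputes the value ratio from them.

```python
# Sketch: reproduce the "Best value" figures.
# Assumption: "$/M" is a blended price that weights input and output equally.
# This matches both figures shown, but the page does not state its formula.

PRICES = {  # USD per 1M tokens: (input, output), from the pricing table
    "Kimi K2 Thinking": (0.60, 2.50),
    "Qwen3 Max": (0.78, 3.90),
}
POINTS_PER_DOLLAR = {"Kimi K2 Thinking": 34.4, "Qwen3 Max": 24.9}  # as reported


def blended_price(input_price: float, output_price: float) -> float:
    """Equal-weight average of input and output prices (assumed weighting)."""
    return (input_price + output_price) / 2


for model, (inp, out) in PRICES.items():
    print(f"{model}: ${blended_price(inp, out):.2f}/M blended")

# Value ratio: how many times more points per dollar Kimi K2 Thinking delivers.
ratio = POINTS_PER_DOLLAR["Kimi K2 Thinking"] / POINTS_PER_DOLLAR["Qwen3 Max"]
print(f"value ratio: {ratio:.1f}x")
```

Running it prints $1.55/M and $2.34/M blended and a value ratio of 1.4x (34.4 / 24.9 ≈ 1.38).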
Vendor risk
Who is behind the model
moonshotai
private · undisclosed
Alibaba (Qwen)
$293.0B · Tier 1
Head to head
5 benchmarks · 2 models
Kimi K2 Thinking · Qwen3 Max
Chess Puzzles
Kimi K2 Thinking leads by +16.0
Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.
Kimi K2 Thinking
20.0
Qwen3 Max
4.0
GPQA diamond
Kimi K2 Thinking leads by +15.5
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Kimi K2 Thinking
79.0
Qwen3 Max
63.5
OTIS Mock AIME 2024-2025
Kimi K2 Thinking leads by +9.7
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Kimi K2 Thinking
83.0
Qwen3 Max
73.3
PostTrainBench
Qwen3 Max leads by +0.1
Kimi K2 Thinking
7.3
Qwen3 Max
7.4
SimpleQA Verified
Qwen3 Max leads by +35.9
SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.
Kimi K2 Thinking
31.6
Qwen3 Max
67.5
Full benchmark table
| Benchmark | Kimi K2 Thinking | Qwen3 Max |
|---|---|---|
| Chess Puzzles | 20.0 | 4.0 |
| GPQA diamond | 79.0 | 63.5 |
| OTIS Mock AIME 2024-2025 | 83.0 | 73.3 |
| PostTrainBench | 7.3 | 7.4 |
| SimpleQA Verified | 31.6 | 67.5 |
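The head-to-head margins and the 3-of-5 win tally in the summary can be recomputed directly from this table. The sketch below assumes a higher score is better on every benchmark, which the page implies through its "leads by" wording; `head_to_head` is a hypothetical helper, not something the page provides. Recomputed from the one-decimal scores shown, the PostTrainBench margin comes out at +0.1.

```python
# Sketch: recompute per-benchmark leaders, margins, and win counts.
# Assumption: higher is better on every benchmark listed.

MODELS = ("Kimi K2 Thinking", "Qwen3 Max")
SCORES = {  # benchmark: (Kimi K2 Thinking, Qwen3 Max), from the table above
    "Chess Puzzles": (20.0, 4.0),
    "GPQA diamond": (79.0, 63.5),
    "OTIS Mock AIME 2024-2025": (83.0, 73.3),
    "PostTrainBench": (7.3, 7.4),
    "SimpleQA Verified": (31.6, 67.5),
}


def head_to_head(scores):
    """Return win counts per model and (leader, margin) per benchmark."""
    wins = {m: 0 for m in MODELS}
    margins = {}
    for bench, (a, b) in scores.items():
        if a == b:
            margins[bench] = (None, 0.0)  # tie: nobody leads
            continue
        leader = MODELS[0] if a > b else MODELS[1]
        wins[leader] += 1
        # Round to one decimal to match the precision of the inputs.
        margins[bench] = (leader, round(abs(a - b), 1))
    return wins, margins


wins, margins = head_to_head(SCORES)
for bench, (leader, margin) in margins.items():
    print(f"{bench}: {leader} leads by +{margin}")
print(wins)
```

The final line prints `{'Kimi K2 Thinking': 3, 'Qwen3 Max': 2}`, matching the 3-of-5 summary.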
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Kimi K2 Thinking | $0.60 | $2.50 | 262K tokens | $10.75 |
| Qwen3 Max | $0.78 | $3.90 | 262K tokens | $15.60 |
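The page does not say how the 10M monthly tokens are split between input and output. A 75% input / 25% output split reproduces both projections exactly (7.5 × $0.60 + 2.5 × $2.50 = $10.75; 7.5 × $0.78 + 2.5 × $3.90 = $15.60), so the sketch below uses that split as an assumption. `projected_monthly_cost` and `INPUT_SHARE` are illustrative names, not part of any published tool.

```python
# Sketch: reproduce "Projected $/mo at 10M tokens".
# Assumption: 75% of monthly tokens are input and 25% are output; the page
# does not state its split, but this one matches both projected figures.

MONTHLY_TOKENS_M = 10  # millions of tokens per month
INPUT_SHARE = 0.75     # assumed fraction of tokens that are input

PRICES = {  # USD per 1M tokens: (input, output), from the pricing table
    "Kimi K2 Thinking": (0.60, 2.50),
    "Qwen3 Max": (0.78, 3.90),
}


def projected_monthly_cost(input_price, output_price,
                           tokens_m=MONTHLY_TOKENS_M, input_share=INPUT_SHARE):
    """Monthly USD cost for tokens_m million tokens at the given input share."""
    input_m = tokens_m * input_share
    output_m = tokens_m - input_m
    return input_m * input_price + output_m * output_price


for model, (inp, out) in PRICES.items():
    print(f"{model}: ${projected_monthly_cost(inp, out):.2f}/mo")
```

Varying `input_share` shows how sensitive the projection is to workload mix: output-heavy workloads cost noticeably more, since both models charge roughly 4 to 5 times as much for output as for input.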