Qwen3 235B A22B vs Kimi K2 0711
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen3 235B A22B wins on 3/5 benchmarks
Qwen3 235B A22B wins 3 of the 5 benchmarks both models report, leading in coding, knowledge, and reasoning. Kimi K2 0711 takes the other two, Lech Mazur Writing and WeirdML.
Category leads
coding · Qwen3 235B A22B
knowledge · Qwen3 235B A22B
reasoning · Qwen3 235B A22B
Hype vs Reality
Attention vs performance
Qwen3 235B A22B · #60 by performance · no attention signal
Kimi K2 0711 · #63 by performance · no attention signal
Best value
Qwen3 235B A22B delivers about 1.3x the points per dollar of Kimi K2 0711.
Qwen3 235B A22B · 49.6 pts/$ · $1.14/M tokens
Kimi K2 0711 · 39.2 pts/$ · $1.43/M tokens
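The $/M figures above appear to match a simple unweighted mean of each model's input and output prices from the pricing table at the end of this page ($0.46 and $1.82 average to $1.14). The sketch below assumes that blend; the 49.6 and 39.2 pts/$ values are taken as displayed, because the page does not say how its "points" are computed. `PRICES`, `PTS_PER_DOLLAR`, `blended_price`, and `value_ratio` are hypothetical names for illustration.

```python
# Per-1M-token prices in USD, copied from the pricing table on this page.
PRICES = {
    "Qwen3 235B A22B": {"input": 0.46, "output": 1.82},
    "Kimi K2 0711": {"input": 0.57, "output": 2.30},
}

# Points per dollar as displayed; the underlying score is not stated on the page.
PTS_PER_DOLLAR = {"Qwen3 235B A22B": 49.6, "Kimi K2 0711": 39.2}


def blended_price(p):
    """Assumed blend: unweighted mean of input and output $ per 1M tokens."""
    return (p["input"] + p["output"]) / 2


def value_ratio(a, b):
    """How many times more points per dollar model a delivers than model b."""
    return PTS_PER_DOLLAR[a] / PTS_PER_DOLLAR[b]


if __name__ == "__main__":
    for name, p in PRICES.items():
        print(f"{name}: ${blended_price(p):.3f}/M tokens")
    ratio = value_ratio("Qwen3 235B A22B", "Kimi K2 0711")
    print(f"value ratio: {ratio:.2f}x")
```

Under this assumption Kimi K2 0711 comes out at about $1.435/M, consistent with the $1.43 shown, and 49.6 / 39.2 is roughly 1.27, which rounds to the 1.3x headline.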
Vendor risk
Who is behind the model
Alibaba (Qwen) · $293.0B · Tier 1
Moonshot AI · private · undisclosed
Head to head
5 benchmarks · 2 models
Aider Polyglot
Qwen3 235B A22B leads by 0.5 points.
Measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Qwen3 235B A22B 59.6 · Kimi K2 0711 59.1
Fiction.LiveBench
Qwen3 235B A22B leads by 6.6 points.
A continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning while limiting data contamination.
Qwen3 235B A22B 67.7 · Kimi K2 0711 61.1
Lech Mazur Writing
Kimi K2 0711 leads by 3.9 points.
Evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
Qwen3 235B A22B 83.0 · Kimi K2 0711 86.9
SimpleBench
Qwen3 235B A22B leads by 5.6 points.
Tests fundamental reasoning with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Qwen3 235B A22B 17.2 · Kimi K2 0711 11.6
WeirdML
Kimi K2 0711 leads by 2.1 points.
Tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Qwen3 235B A22B 37.3 · Kimi K2 0711 39.4
Full benchmark table
| Benchmark | Qwen3 235B A22B | Kimi K2 0711 |
|---|---|---|
| Aider Polyglot | 59.6 | 59.1 |
| Fiction.LiveBench | 67.7 | 61.1 |
| Lech Mazur Writing | 83.0 | 86.9 |
| SimpleBench | 17.2 | 11.6 |
| WeirdML | 37.3 | 39.4 |
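The win count and "leads by" margins above can be reproduced from this table. The sketch below assumes a higher score is better on all five benchmarks and credits no one on a tie; it illustrates the arithmetic only and is not the site's own code. `SCORES`, `MODELS`, and `head_to_head` are hypothetical names.

```python
# Scores copied from the full benchmark table: (Qwen3 235B A22B, Kimi K2 0711).
SCORES = {
    "Aider Polyglot": (59.6, 59.1),
    "Fiction.LiveBench": (67.7, 61.1),
    "Lech Mazur Writing": (83.0, 86.9),
    "SimpleBench": (17.2, 11.6),
    "WeirdML": (37.3, 39.4),
}
MODELS = ("Qwen3 235B A22B", "Kimi K2 0711")


def head_to_head(scores, models):
    """Return {benchmark: (leader, margin)} and {model: wins}.

    Assumes higher is better everywhere; ties give (None, 0.0) and no win.
    """
    results = {}
    wins = {m: 0 for m in models}
    for bench, (a, b) in scores.items():
        if a == b:
            results[bench] = (None, 0.0)
            continue
        leader = models[0] if a > b else models[1]
        # Round to one decimal to match the table's precision and hide float noise.
        margin = round(abs(a - b), 1)
        results[bench] = (leader, margin)
        wins[leader] += 1
    return results, wins


if __name__ == "__main__":
    results, wins = head_to_head(SCORES, MODELS)
    for bench, (leader, margin) in results.items():
        print(f"{bench}: {leader} leads by {margin}")
    print(wins)
```

Rounding the margin to one decimal matters: subtracting the raw floats gives values such as 6.6000000000000014 rather than 6.6.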
Pricing · USD per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Qwen3 235B A22B | $0.46 | $1.82 | 131K tokens (~66 books) | $7.96 |
| Kimi K2 0711 | $0.57 | $2.30 | 131K tokens (~66 books) | $10.03 |
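The page does not say how the 10M-token projection splits input and output tokens. The sketch below assumes a 3:1 input-to-output mix (`input_share=0.75`), a guess that lands close to both projected figures; `projected_monthly_cost` and its parameters are hypothetical.

```python
def projected_monthly_cost(input_price, output_price,
                           tokens_millions=10.0, input_share=0.75):
    """Projected monthly spend in USD for a given token volume.

    input_price / output_price: USD per 1M tokens.
    input_share: assumed fraction of tokens that are input (not stated on the page).
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    # Weighted price per 1M tokens under the assumed input/output mix.
    per_million = input_share * input_price + (1 - input_share) * output_price
    return per_million * tokens_millions


if __name__ == "__main__":
    for name, inp, out in [("Qwen3 235B A22B", 0.46, 1.82),
                           ("Kimi K2 0711", 0.57, 2.30)]:
        print(f"{name}: ${projected_monthly_cost(inp, out):.2f}/mo")
```

With the displayed prices this gives $8.00 for Qwen3 235B A22B (the table shows $7.96) and about $10.03 for Kimi K2 0711. The small Qwen gap may come from the page using unrounded prices. Changing `input_share` shows how sensitive the projection is to the assumed mix.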