Kimi K2 0711 vs Qwen3 235B A22B

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Qwen3 235B A22B wins 3 of 5 shared benchmarks. Leads in coding · knowledge · reasoning.

Category leads
coding · Qwen3 235B A22B
knowledge · Qwen3 235B A22B
reasoning · Qwen3 235B A22B
Hype vs Reality
Kimi K2 0711
#63 by perf · no signal
QUIET
Qwen3 235B A22B
#60 by perf · no signal
QUIET
Best value
Qwen3 235B A22B · 1.3x better value than Kimi K2 0711
Kimi K2 0711
39.2 pts/$
$1.43/M
Qwen3 235B A22B
49.6 pts/$
$1.14/M
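The value figures above can be roughly reproduced from numbers shown on this page. What follows is a minimal sketch, assuming the $/M figure is an unweighted average of the input and output prices from the pricing table (the page does not state how it blends them) and that "1.3x" is the ratio of the two pts/$ values. The function names are hypothetical.

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Assumed blend: unweighted mean of input and output price per 1M tokens."""
    return (input_per_m + output_per_m) / 2


def value_ratio(pts_per_dollar_a: float, pts_per_dollar_b: float) -> float:
    """How many times better value A is than B, measured in points per dollar."""
    return pts_per_dollar_a / pts_per_dollar_b


# Prices are taken from the pricing table; pts/$ values from the Best value panel.
kimi_blend = blended_price(0.57, 2.30)   # close to the $1.43/M shown
qwen_blend = blended_price(0.46, 1.82)   # close to the $1.14/M shown
ratio = value_ratio(49.6, 39.2)          # roughly 1.27, displayed as 1.3x

print(f"Kimi blend ${kimi_blend:.3f}/M, Qwen blend ${qwen_blend:.3f}/M, ratio {ratio:.2f}x")
```

How the underlying "pts" score is computed is not stated on the page, so the sketch takes the pts/$ values as given rather than deriving them.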
Vendor risk
moonshotai
private · undisclosed
Unknown risk
Alibaba (Qwen)
$293.0B · Tier 1
Low risk
Head to head
Kimi K2 0711 · Qwen3 235B A22B
Aider Polyglot
Qwen3 235B A22B leads by +0.5
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Kimi K2 0711
59.1
Qwen3 235B A22B
59.6
Fiction.LiveBench
Qwen3 235B A22B leads by +6.6
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Kimi K2 0711
61.1
Qwen3 235B A22B
67.7
Lech Mazur Writing
Kimi K2 0711 leads by +3.9
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
Kimi K2 0711
86.9
Qwen3 235B A22B
83.0
SimpleBench
Qwen3 235B A22B leads by +5.6
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Kimi K2 0711
11.6
Qwen3 235B A22B
17.2
WeirdML
Kimi K2 0711 leads by +2.1
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Kimi K2 0711
39.4
Qwen3 235B A22B
37.3
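The "leads by" margins and the 3-of-5 win count can be checked directly from the scores listed above. This is a small sketch of that arithmetic, not the site's actual method; the `head_to_head` helper and its tie handling are assumptions made for illustration.

```python
# Scores copied from the head-to-head section: (Kimi K2 0711, Qwen3 235B A22B).
SCORES = {
    "Aider Polyglot": (59.1, 59.6),
    "Fiction.LiveBench": (61.1, 67.7),
    "Lech Mazur Writing": (86.9, 83.0),
    "SimpleBench": (11.6, 17.2),
    "WeirdML": (39.4, 37.3),
}
MODELS = ("Kimi K2 0711", "Qwen3 235B A22B")


def head_to_head(scores):
    """Return {benchmark: (leader, margin)} and a win tally per model."""
    leads = {}
    wins = {name: 0 for name in MODELS}
    for bench, (kimi, qwen) in scores.items():
        if kimi == qwen:
            leads[bench] = (None, 0.0)  # tie: neither model is credited
            continue
        leader = MODELS[0] if kimi > qwen else MODELS[1]
        # Round to one decimal to match the precision of the listed scores.
        leads[bench] = (leader, round(abs(kimi - qwen), 1))
        wins[leader] += 1
    return leads, wins


leads, wins = head_to_head(SCORES)
for bench, (leader, margin) in leads.items():
    print(f"{bench}: {leader} leads by +{margin}")
print(wins)
```

Note that a raw win count treats the 0.5-point Aider Polyglot gap the same as the 6.6-point Fiction.LiveBench gap, so the margins are worth reading alongside the tally.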
Full benchmark table
Benchmark · Kimi K2 0711 · Qwen3 235B A22B
Aider Polyglot · 59.1 · 59.6
Fiction.LiveBench · 61.1 · 67.7
Lech Mazur Writing · 86.9 · 83.0
SimpleBench · 11.6 · 17.2
WeirdML · 39.4 · 37.3
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
Kimi K2 0711 · $0.57 · $2.30 · 131K tokens (~66 books) · $10.03
Qwen3 235B A22B · $0.46 · $1.82 · 131K tokens (~66 books) · $7.96
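The page does not say how the 10M monthly tokens are split between input and output, so the projected figures cannot be reproduced exactly. The sketch below assumes a 75% input / 25% output split, chosen only because it comes close to the listed numbers; `projected_monthly_cost` and its `input_share` parameter are hypothetical.

```python
def projected_monthly_cost(
    input_per_m: float,
    output_per_m: float,
    total_m_tokens: float = 10.0,
    input_share: float = 0.75,  # assumption: the page does not state the split
) -> float:
    """Dollar cost of `total_m_tokens` million tokens per month at the given prices."""
    input_m = total_m_tokens * input_share
    output_m = total_m_tokens * (1.0 - input_share)
    return input_m * input_per_m + output_m * output_per_m


kimi = projected_monthly_cost(0.57, 2.30)
qwen = projected_monthly_cost(0.46, 1.82)
print(f"Kimi K2 0711: ${kimi:.2f}/mo, Qwen3 235B A22B: ${qwen:.2f}/mo")
```

Under this assumed split the Kimi K2 0711 figure lands at about $10.03, in line with the table, while Qwen3 235B A22B comes out at about $8.00 rather than the listed $7.96. The gap suggests the page works from unrounded prices or a slightly different split, so treat the function as a way to plug in your own workload mix rather than as the site's formula.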