Qwen3.5 397B A17B vs Kimi K2.5
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen3.5 397B A17B wins 5 of 9 shared benchmarks and leads in math, knowledge, and coding. Kimi K2.5 wins the other 4 and leads in speed and language.
Category leads
speed · Kimi K2.5
math · Qwen3.5 397B A17B
knowledge · Qwen3.5 397B A17B
language · Kimi K2.5
coding · Qwen3.5 397B A17B
Hype vs Reality
Attention vs performance
Qwen3.5 397B A17B
#5 by perf · no signal
Kimi K2.5
#87 by perf · no signal
Best value
Qwen3.5 397B A17B
1.3x better value than Kimi K2.5
Qwen3.5 397B A17B
57.4 pts/$
$1.36/M
Kimi K2.5
42.6 pts/$
$1.22/M
Vendor risk
Who is behind the model
Alibaba (Qwen)
$293.0B · Tier 1
Moonshot AI
private · undisclosed
Head to head
9 benchmarks · 2 models
Qwen3.5 397B A17B · Kimi K2.5
Artificial Analysis · Agentic Index
Kimi K2.5 leads by +3.1
Artificial Analysis Agentic Index · a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, into a single number that answers "how good is this model as an agent?"
Qwen3.5 397B A17B
55.8
Kimi K2.5
58.9
Artificial Analysis · Coding Index
Qwen3.5 397B A17B leads by +1.8
Artificial Analysis Coding Index · a composite score that aggregates performance across multiple coding benchmarks into a single index. It tracks code generation quality, debugging ability, multi-language competence, and real-world software engineering tasks. Artificial Analysis uses it to rank model coding capability in a normalized, comparable format, which makes it useful for developers choosing between models for coding-heavy workloads.
Qwen3.5 397B A17B
41.3
Kimi K2.5
39.5
Artificial Analysis · Quality Index
Kimi K2.5 leads by +1.8
Qwen3.5 397B A17B
45.0
Kimi K2.5
46.8
OpenCompass · AIME2025
Qwen3.5 397B A17B leads by +0.4
Qwen3.5 397B A17B
92.3
Kimi K2.5
91.9
OpenCompass · GPQA-Diamond
Qwen3.5 397B A17B leads by +0.3
Qwen3.5 397B A17B
88.4
Kimi K2.5
88.1
OpenCompass · HLE
Kimi K2.5 leads by +1.1
Qwen3.5 397B A17B
27.5
Kimi K2.5
28.6
OpenCompass · IFEval
Kimi K2.5 leads by +2.4
Qwen3.5 397B A17B
91.5
Kimi K2.5
93.9
OpenCompass · LiveCodeBenchV6
Qwen3.5 397B A17B leads by +2.4
Qwen3.5 397B A17B
83.0
Kimi K2.5
80.6
OpenCompass · MMLU-Pro
Qwen3.5 397B A17B leads by +1.4
Qwen3.5 397B A17B
87.6
Kimi K2.5
86.2
Full benchmark table
| Benchmark | Qwen3.5 397B A17B | Kimi K2.5 |
|---|---|---|
| Artificial Analysis · Agentic Index | 55.8 | 58.9 |
| Artificial Analysis · Coding Index | 41.3 | 39.5 |
| Artificial Analysis · Quality Index | 45.0 | 46.8 |
| OpenCompass · AIME2025 | 92.3 | 91.9 |
| OpenCompass · GPQA-Diamond | 88.4 | 88.1 |
| OpenCompass · HLE | 27.5 | 28.6 |
| OpenCompass · IFEval | 91.5 | 93.9 |
| OpenCompass · LiveCodeBenchV6 | 83.0 | 80.6 |
| OpenCompass · MMLU-Pro | 87.6 | 86.2 |
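The win count and per-benchmark margins can be recomputed from the table. The following is a minimal sketch, assuming that a "win" means the strictly higher score and that every benchmark is weighted equally; the names `SCORES` and `tally` are illustrative and do not come from the page. Margins recomputed from the rounded scores may differ by 0.1 from the figures shown on the page, which may be derived from unrounded data.

```python
# Scores copied from the benchmark table: (benchmark, Qwen3.5 397B A17B, Kimi K2.5).
SCORES = [
    ("Artificial Analysis · Agentic Index", 55.8, 58.9),
    ("Artificial Analysis · Coding Index", 41.3, 39.5),
    ("Artificial Analysis · Quality Index", 45.0, 46.8),
    ("OpenCompass · AIME2025", 92.3, 91.9),
    ("OpenCompass · GPQA-Diamond", 88.4, 88.1),
    ("OpenCompass · HLE", 27.5, 28.6),
    ("OpenCompass · IFEval", 91.5, 93.9),
    ("OpenCompass · LiveCodeBenchV6", 83.0, 80.6),
    ("OpenCompass · MMLU-Pro", 87.6, 86.2),
]

QWEN = "Qwen3.5 397B A17B"
KIMI = "Kimi K2.5"


def tally(scores):
    """Count wins per model and list (benchmark, leader, margin) rows.

    Assumption: higher is better on every benchmark. A tie, which does not
    occur in this data, is credited to neither model.
    """
    wins = {QWEN: 0, KIMI: 0}
    margins = []
    for name, qwen, kimi in scores:
        if qwen == kimi:
            continue
        leader = QWEN if qwen > kimi else KIMI
        wins[leader] += 1
        margins.append((name, leader, round(abs(qwen - kimi), 1)))
    return wins, margins


wins, margins = tally(SCORES)
print(wins)
for name, leader, margin in margins:
    print(f"{name}: {leader} leads by +{margin}")
```

Most margins are small relative to the scores themselves (six of the nine are under 2 points), so the 5 to 4 split is better read as a near tie than as a decisive result.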
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Qwen3.5 397B A17B | $0.39 | $2.34 | 262K tokens (~131 books) | $8.78 |
| Kimi K2.5 | $0.44 | $2.00 | 262K tokens (~131 books) | $8.30 |
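The page does not say how it converts per-token prices into the other dollar figures. As a hedged sketch: a 75% input / 25% output split of the 10M monthly tokens reproduces both projected costs, and a plain average of input and output price matches the $/M figures under Best value ($1.36 and $1.22). Both formulas are inferences from the numbers, not documented methodology, and the names `PRICES`, `blended_price`, and `projected_monthly_cost` are illustrative.

```python
# Per-1M-token prices in USD, copied from the pricing table.
# The row-to-model mapping follows the blended $/M figures under Best value.
PRICES = {
    "Qwen3.5 397B A17B": {"input": 0.39, "output": 2.34},
    "Kimi K2.5": {"input": 0.44, "output": 2.00},
}


def blended_price(price):
    """Simple average of input and output price per 1M tokens (assumed formula)."""
    return (price["input"] + price["output"]) / 2


def projected_monthly_cost(price, total_millions=10.0, input_share=0.75):
    """Monthly cost in USD for `total_millions` million tokens.

    Assumption: `input_share` of the tokens are input, the rest output.
    The 0.75 default is inferred from the page's figures, not stated there.
    """
    input_millions = total_millions * input_share
    output_millions = total_millions * (1.0 - input_share)
    return price["input"] * input_millions + price["output"] * output_millions


for model, price in PRICES.items():
    print(
        f"{model}: blended ${blended_price(price):.3f}/M, "
        f"projected ${projected_monthly_cost(price):.3f}/mo"
    )
```

The split matters when choosing between the two: Qwen3.5 397B A17B has the cheaper input price and Kimi K2.5 the cheaper output price, so passing a lower `input_share` for an output-heavy workload widens Kimi K2.5's cost advantage, while an input-heavy workload narrows or reverses it.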