DeepSeek V3.2 vs Qwen3 235B A22B Instruct 2507
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
DeepSeek V3.2 wins 15 of 18 shared benchmarks, with leads in coding, reasoning, and arena.
Category leads
| Category | Leader |
|---|---|
| coding | DeepSeek V3.2 |
| reasoning | DeepSeek V3.2 |
| arena | DeepSeek V3.2 |
| language | DeepSeek V3.2 |
| math | Qwen3 235B A22B Instruct 2507 |
| knowledge | DeepSeek V3.2 |
Hype vs Reality
Attention vs performance:

| Model | Performance rank | Attention signal |
|---|---|---|
| DeepSeek V3.2 | #82 | no signal |
| Qwen3 235B A22B Instruct 2507 | #97 | no signal |
Best value
Qwen3 235B A22B Instruct 2507 delivers roughly 3.4x more benchmark points per dollar than DeepSeek V3.2.

| Model | Value (pts/$) | Price ($/M tokens) |
|---|---|---|
| DeepSeek V3.2 | 165.6 | $0.32 |
| Qwen3 235B A22B Instruct 2507 | 567.3 | $0.09 |
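The $/M figure here matches the simple average of each model's input and output rates from the pricing table below ((0.26 + 0.38) / 2 = 0.32; (0.07 + 0.10) / 2 ≈ 0.09), and pts/$ divides a composite benchmark score by that blended price. The composite itself isn't disclosed on the page, so the scores in this sketch are back-solved from the displayed pts/$ values and purely illustrative:

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Simple average of input and output price per 1M tokens.

    Matches the $/M figures above:
    DeepSeek V3.2: (0.26 + 0.38) / 2 = 0.32; Qwen3: (0.07 + 0.10) / 2 ≈ 0.09.
    """
    return (input_per_m + output_per_m) / 2


def value_score(composite_score: float, price_per_m: float) -> float:
    """Benchmark points per dollar of blended token spend."""
    return composite_score / price_per_m


# Composite scores back-solved from the page's pts/$ figures (illustrative;
# the site's actual composite formula is not published).
deepseek = value_score(53.0, blended_price(0.26, 0.38))  # ~165.6 pts/$
qwen3 = value_score(48.2, blended_price(0.07, 0.10))     # ~567 pts/$

print(f"value ratio: {qwen3 / deepseek:.1f}x")  # ~3.4x, as reported
```

Since the two back-solved composite scores differ by only about 10%, the 3.4x value gap is driven almost entirely by the roughly 3.8x gap in blended price.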
Vendor risk
Mixed exposure · one or more vendors flagged.

| Vendor | Valuation | Tier |
|---|---|---|
| DeepSeek | $3.4B | Tier 1 |
| Alibaba (Qwen) | $293.0B | Tier 1 |
Head to head
18 benchmarks · 2 models
| Benchmark | DeepSeek V3.2 | Qwen3 235B A22B Instruct 2507 | Margin |
|---|---|---|---|
| Aider Polyglot | 74.2 | 59.6 | +14.6 |
| ARC-AGI | 57.0 | 11.0 | +46.0 |
| ARC-AGI-2 | 4.0 | 1.3 | +2.7 |
| Chatbot Arena Elo · Overall | 1424.4 | 1422.6 | +1.8 |
| LiveBench · Agentic Coding | 46.7 | 13.3 | +33.4 |
| LiveBench · Coding | 75.7 | 69.6 | +6.1 |
| LiveBench · Data Analysis | 45.0 | 44.7 | +0.3 |
| LiveBench · IF | 23.1 | 21.7 | +1.4 |
| LiveBench · Language | 64.2 | 66.1 | -1.9 |
| LiveBench · Mathematics | 64.0 | 68.0 | -4.0 |
| LiveBench · Overall | 51.8 | 48.8 | +3.0 |
| LiveBench · Reasoning | 44.3 | 58.4 | -14.1 |
| OpenCompass · AIME2025 | 93.0 | 69.5 | +23.5 |
| OpenCompass · GPQA-Diamond | 84.6 | 75.5 | +9.1 |
| OpenCompass · HLE | 23.2 | 12.3 | +10.9 |
| OpenCompass · IFEval | 89.7 | 88.3 | +1.4 |
| OpenCompass · LiveCodeBenchV6 | 75.4 | 43.0 | +32.4 |
| OpenCompass · MMLU-Pro | 85.8 | 79.2 | +6.6 |

Margin is DeepSeek V3.2 minus Qwen3 235B A22B Instruct 2507; positive values favor DeepSeek V3.2.

Benchmark notes:
- Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
- ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
- ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
- LiveBench · IF · instruction-following tasks.
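One number in the table that is easy to over-read is the Chatbot Arena margin. Elo differences map to expected win rates via the standard Elo formula (a general property of Elo ratings, not something the source page computes), and +1.8 points is statistically a coin flip:

```python
def elo_win_probability(delta: float) -> float:
    """Expected score of the higher-rated model for a rating gap `delta`,
    per the standard Elo formula: E = 1 / (1 + 10 ** (-delta / 400))."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


# DeepSeek V3.2 (1424.4) vs Qwen3 235B A22B Instruct 2507 (1422.6): delta = 1.8
print(f"{elo_win_probability(1424.4 - 1422.6):.4f}")  # ~0.5026, a near coin flip
```

So the arena result is effectively a tie; the decisive gaps in this comparison are the raw-score ones such as ARC-AGI and LiveCodeBenchV6.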
Pricing · per 1M tokens · projected $/mo at 10M tokens

| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.26 | $0.38 | 164K tokens (~82 books) | $2.90 |
| Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 262K tokens (~131 books) | $0.78 |
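The page does not state how the projected monthly figures are derived, but both are reproduced exactly by assuming the 10M tokens split 75% input / 25% output. A minimal sketch under that assumption:

```python
def monthly_cost(input_per_m: float, output_per_m: float,
                 tokens_m: float = 10.0, input_share: float = 0.75) -> float:
    """Projected monthly spend in dollars for tokens_m million tokens,
    split input_share / (1 - input_share) between input and output."""
    return tokens_m * (input_share * input_per_m
                       + (1.0 - input_share) * output_per_m)


# Reproduces the table: 7.5 * $0.26 + 2.5 * $0.38 = $2.90
print(f"DeepSeek V3.2: ${monthly_cost(0.26, 0.38):.2f}")  # $2.90
# 7.5 * $0.07 + 2.5 * $0.10 = $0.775, shown rounded to $0.78
print(f"Qwen3 235B:    ${monthly_cost(0.07, 0.10):.2f}")  # $0.78
```

Any other split changes the totals only modestly here, since output costs only about 1.4-1.5x input for both models.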