DeepSeek V3.2 vs Qwen3 235B A22B Instruct 2507
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
DeepSeek V3.2 wins 15 of 18 shared benchmarks, with leads in coding, reasoning, and arena.
Category leads
| Category | Leader |
|---|---|
| coding | DeepSeek V3.2 |
| reasoning | DeepSeek V3.2 |
| arena | DeepSeek V3.2 |
| language | DeepSeek V3.2 |
| math | Qwen3 235B A22B Instruct 2507 |
| knowledge | DeepSeek V3.2 |
Hype vs Reality
Attention vs performance:

| Model | Performance rank | Attention signal |
|---|---|---|
| DeepSeek V3.2 | #82 | no signal |
| Qwen3 235B A22B Instruct 2507 | #97 | no signal |
Best value
Qwen3 235B A22B Instruct 2507 delivers roughly 3.4x more benchmark points per dollar than DeepSeek V3.2.

| Model | Value (pts/$) | Price ($/M tokens) |
|---|---|---|
| DeepSeek V3.2 | 165.6 | $0.32 |
| Qwen3 235B A22B Instruct 2507 | 567.3 | $0.09 |
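The $/M figure here matches the simple average of each model's input and output rates from the pricing table below ((0.26 + 0.38) / 2 = 0.32; (0.07 + 0.10) / 2 ≈ 0.09), and pts/$ divides a composite benchmark score by that blended price. The composite itself isn't disclosed on the page, so the scores in this sketch are back-solved from the displayed pts/$ values and purely illustrative:

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Simple average of input and output price per 1M tokens.

    Matches the $/M figures above:
    DeepSeek V3.2: (0.26 + 0.38) / 2 = 0.32; Qwen3: (0.07 + 0.10) / 2 ≈ 0.09.
    """
    return (input_per_m + output_per_m) / 2


def value_score(composite_score: float, price_per_m: float) -> float:
    """Benchmark points per dollar of blended token spend."""
    return composite_score / price_per_m


# Composite scores back-solved from the page's pts/$ figures (illustrative;
# the site's actual composite formula is not published).
deepseek = value_score(53.0, blended_price(0.26, 0.38))  # ~165.6 pts/$
qwen3 = value_score(48.2, blended_price(0.07, 0.10))     # ~567 pts/$

print(f"value ratio: {qwen3 / deepseek:.1f}x")  # ~3.4x, as reported
```

Since the two back-solved composite scores differ by only about 10%, the 3.4x value gap is driven almost entirely by the roughly 3.8x gap in blended price.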
Vendor risk
Mixed exposure · one or more vendors flagged.

| Vendor | Valuation | Tier |
|---|---|---|
| DeepSeek | $3.4B | Tier 1 |
| Alibaba (Qwen) | $293.0B | Tier 1 |
Head to head
18 benchmarks · 2 models
| Benchmark | DeepSeek V3.2 | Qwen3 235B A22B Instruct 2507 | Margin |
|---|---|---|---|
| Aider Polyglot | 74.2 | 59.6 | +14.6 |
| ARC-AGI | 57.0 | 11.0 | +46.0 |
| ARC-AGI-2 | 4.0 | 1.3 | +2.7 |
| Chatbot Arena Elo · Overall | 1424.4 | 1422.6 | +1.8 |
| LiveBench · Agentic Coding | 46.7 | 13.3 | +33.4 |
| LiveBench · Coding | 75.7 | 69.6 | +6.1 |
| LiveBench · Data Analysis | 45.0 | 44.7 | +0.3 |
| LiveBench · IF | 23.1 | 21.7 | +1.4 |
| LiveBench · Language | 64.2 | 66.1 | -1.9 |
| LiveBench · Mathematics | 64.0 | 68.0 | -4.0 |
| LiveBench · Overall | 51.8 | 48.8 | +3.0 |
| LiveBench · Reasoning | 44.3 | 58.4 | -14.1 |
| OpenCompass · AIME2025 | 93.0 | 69.5 | +23.5 |
| OpenCompass · GPQA-Diamond | 84.6 | 75.5 | +9.1 |
| OpenCompass · HLE | 23.2 | 12.3 | +10.9 |
| OpenCompass · IFEval | 89.7 | 88.3 | +1.4 |
| OpenCompass · LiveCodeBenchV6 | 75.4 | 43.0 | +32.4 |
| OpenCompass · MMLU-Pro | 85.8 | 79.2 | +6.6 |

Margin is DeepSeek V3.2 minus Qwen3 235B A22B Instruct 2507; positive values favor DeepSeek V3.2.

Benchmark notes:
- Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
- ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
- ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
- LiveBench · IF · instruction-following tasks.
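One number in the table that is easy to over-read is the Chatbot Arena margin. Elo differences map to expected win rates via the standard Elo formula (a general property of Elo ratings, not something the source page computes), and +1.8 points is statistically a coin flip:

```python
def elo_win_probability(delta: float) -> float:
    """Expected score of the higher-rated model for a rating gap `delta`,
    per the standard Elo formula: E = 1 / (1 + 10 ** (-delta / 400))."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


# DeepSeek V3.2 (1424.4) vs Qwen3 235B A22B Instruct 2507 (1422.6): delta = 1.8
print(f"{elo_win_probability(1424.4 - 1422.6):.4f}")  # ~0.5026, a near coin flip
```

So the arena result is effectively a tie; the decisive gaps in this comparison are the raw-score ones such as ARC-AGI and LiveCodeBenchV6.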
Pricing · per 1M tokens · projected $/mo at 10M tokens

| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.26 | $0.38 | 164K tokens (~82 books) | $2.90 |
| Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 262K tokens (~131 books) | $0.78 |
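The page does not state how the projected monthly figures are derived, but both are reproduced exactly by assuming the 10M tokens split 75% input / 25% output. A minimal sketch under that assumption:

```python
def monthly_cost(input_per_m: float, output_per_m: float,
                 tokens_m: float = 10.0, input_share: float = 0.75) -> float:
    """Projected monthly spend in dollars for tokens_m million tokens,
    split input_share / (1 - input_share) between input and output."""
    return tokens_m * (input_share * input_per_m
                       + (1.0 - input_share) * output_per_m)


# Reproduces the table: 7.5 * $0.26 + 2.5 * $0.38 = $2.90
print(f"DeepSeek V3.2: ${monthly_cost(0.26, 0.38):.2f}")  # $2.90
# 7.5 * $0.07 + 2.5 * $0.10 = $0.775, shown rounded to $0.78
print(f"Qwen3 235B:    ${monthly_cost(0.07, 0.10):.2f}")  # $0.78
```

Any other split changes the totals only modestly here, since output costs only about 1.4-1.5x input for both models.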