DeepSeek V3.2 Speciale vs GLM 5 vs Step 3.5 Flash
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
DeepSeek V3.2 Speciale wins 5 of the 10 tracked benchmarks · leads in math and knowledge.
Category leads
| Category | Leader |
|---|---|
| Math | DeepSeek V3.2 Speciale |
| Knowledge | DeepSeek V3.2 Speciale |
| Language | GLM 5 |
| Coding | GLM 5 |
| Speed | Step 3.5 Flash |
| Arena | GLM 5 |
Hype vs Reality
Attention vs performance
| Model | Performance rank | Attention rank |
|---|---|---|
| DeepSeek V3.2 Speciale | #6 | #5 |
| GLM 5 | #55 | #27 |
| Step 3.5 Flash | #9 | #11 |
Best value
Step 3.5 Flash · 3.9x better value than DeepSeek V3.2 Speciale

| Model | Value (pts/$) | Price ($/M) |
|---|---|---|
| DeepSeek V3.2 Speciale | 97.8 | $0.80 |
| GLM 5 | 45.7 | $1.26 |
| Step 3.5 Flash | 384.5 | $0.20 |
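The $/M figures above match a simple 1:1 blend of the input and output prices listed in the pricing table below, and the 3.9x headline is just the ratio of the two pts/$ values. A minimal sketch of that arithmetic, assuming the 1:1 blend (the blend ratio is inferred from the numbers, not documented here):

```python
# Reproduce the "Best value" numbers from the per-token prices.
# Assumption: the $/M figure is a 1:1 blend of input and output
# prices; this is inferred from the data, not stated by the source.
models = {
    "DeepSeek V3.2 Speciale": {"in": 0.40, "out": 1.20, "pts_per_dollar": 97.8},
    "GLM 5":                  {"in": 0.60, "out": 1.92, "pts_per_dollar": 45.7},
    "Step 3.5 Flash":         {"in": 0.10, "out": 0.30, "pts_per_dollar": 384.5},
}

for name, m in models.items():
    blended = (m["in"] + m["out"]) / 2             # 1:1 blend, $ per 1M tokens
    implied_score = m["pts_per_dollar"] * blended  # score the pts/$ figure implies
    print(f"{name}: ${blended:.2f}/M, implied aggregate score {implied_score:.1f}")

ratio = (models["Step 3.5 Flash"]["pts_per_dollar"]
         / models["DeepSeek V3.2 Speciale"]["pts_per_dollar"])
print(f"Step 3.5 Flash value advantage: {ratio:.1f}x")  # -> 3.9x
```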
Vendor risk
Mixed exposure · one or more vendors flagged

| Vendor | Valuation | Tier |
|---|---|---|
| DeepSeek | $3.4B | Tier 1 |
| z-ai | private · undisclosed | — |
| StepFun | $5.0B | Tier 1 |
Head to head
10 benchmarks · 3 models
Scores listed in the order DeepSeek V3.2 Speciale / GLM 5 / Step 3.5 Flash; — marks a missing score.

- OpenCompass · AIME2025: DeepSeek V3.2 Speciale leads by +0.2 (96.0 / 95.8 / 95.7)
- OpenCompass · GPQA-Diamond: DeepSeek V3.2 Speciale leads by +1.4 (86.7 / 85.3 / 83.7)
- OpenCompass · HLE: DeepSeek V3.2 Speciale leads by +0.5 (28.6 / 28.1 / 21.6)
- OpenCompass · IFEval: GLM 5 and Step 3.5 Flash tie at 93.2 (91.7 / 93.2 / 93.2)
- OpenCompass · LiveCodeBenchV6: GLM 5 leads by +2.3 (80.9 / 86.2 / 83.9)
- OpenCompass · MMLU-Pro: DeepSeek V3.2 Speciale leads by +0.3 (85.5 / 85.2 / 83.5)
- Artificial Analysis · Agentic Index: Step 3.5 Flash leads by +52.0 (0.0 / — / 52.0)
- Artificial Analysis · Coding Index: DeepSeek V3.2 Speciale leads by +6.3 (37.9 / — / 31.6)
- Artificial Analysis · Quality Index: Step 3.5 Flash leads by +8.4 (29.4 / — / 37.8)
- Chatbot Arena Elo · Overall: GLM 5 leads by +64.2 (— / 1455.6 / 1391.4)

Agentic Index · a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, and serves as the canonical single-number answer to "how good is this model as an agent?"

Coding Index · a composite score that aggregates performance across multiple coding benchmarks into a single index: code generation quality, debugging ability, multi-language competence, and real-world software engineering tasks. Artificial Analysis uses it to rank model coding capability in a normalized, comparable format; useful for developers choosing between models for coding-heavy workloads.
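For intuition on the +64.2 Arena gap: under the standard Elo logistic (base 10, scale 400), that margin implies roughly a 59% expected head-to-head win rate. Chatbot Arena fits its ratings with a Bradley-Terry model on the same scale, so treat this as an approximation rather than the leaderboard's exact computation:

```python
# Expected win probability from an Elo rating gap, using the standard
# logistic form with base 10 and scale 400. Approximation: Arena's
# published ratings come from a Bradley-Terry fit on the same scale.
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

p = elo_win_prob(1455.6, 1391.4)  # GLM 5 vs Step 3.5 Flash
print(f"GLM 5 expected win rate: {p:.1%}")  # ~59.1%
```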
Full benchmark table
| Benchmark | DeepSeek V3.2 Speciale | GLM 5 | Step 3.5 Flash |
|---|---|---|---|
| OpenCompass · AIME2025 | 96.0 | 95.8 | 95.7 |
| OpenCompass · GPQA-Diamond | 86.7 | 85.3 | 83.7 |
| OpenCompass · HLE | 28.6 | 28.1 | 21.6 |
| OpenCompass · IFEval | 91.7 | 93.2 | 93.2 |
| OpenCompass · LiveCodeBenchV6 | 80.9 | 86.2 | 83.9 |
| OpenCompass · MMLU-Pro | 85.5 | 85.2 | 83.5 |
| Artificial Analysis · Agentic Index | 0.0 | — | 52.0 |
| Artificial Analysis · Coding Index | 37.9 | — | 31.6 |
| Artificial Analysis · Quality Index | 29.4 | — | 37.8 |
| Chatbot Arena Elo · Overall | — | 1455.6 | 1391.4 |
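The 5/10 winner headline is reproducible by counting strict per-benchmark leaders in the table above. A sketch; the handling of ties and missing scores (credited to no one) is an assumption, chosen because it reproduces the published 5-win count for DeepSeek V3.2 Speciale:

```python
# Count strict per-benchmark wins to reproduce the winner summary.
# None marks a missing score; ties and absences credit no model.
ORDER = ["DeepSeek V3.2 Speciale", "GLM 5", "Step 3.5 Flash"]
SCORES = {
    "AIME2025":        (96.0, 95.8, 95.7),
    "GPQA-Diamond":    (86.7, 85.3, 83.7),
    "HLE":             (28.6, 28.1, 21.6),
    "IFEval":          (91.7, 93.2, 93.2),
    "LiveCodeBenchV6": (80.9, 86.2, 83.9),
    "MMLU-Pro":        (85.5, 85.2, 83.5),
    "Agentic Index":   (0.0, None, 52.0),
    "Coding Index":    (37.9, None, 31.6),
    "Quality Index":   (29.4, None, 37.8),
    "Arena Elo":       (None, 1455.6, 1391.4),
}

wins = dict.fromkeys(ORDER, 0)
for scores in SCORES.values():
    present = [(s, m) for s, m in zip(scores, ORDER) if s is not None]
    best = max(s for s, _ in present)
    leaders = [m for s, m in present if s == best]
    if len(leaders) == 1:  # strict win only; the IFEval tie scores nothing
        wins[leaders[0]] += 1

print(wins)  # {'DeepSeek V3.2 Speciale': 5, 'GLM 5': 2, 'Step 3.5 Flash': 2}
```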
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3.2 Speciale | $0.40 | $1.20 | 164K tokens (~82 books) | $6.00 |
| GLM 5 | $0.60 | $1.92 | 203K tokens (~101 books) | $9.30 |
| Step 3.5 Flash | $0.10 | $0.30 | 262K tokens (~131 books) | $1.50 |
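The projected $/mo column is consistent with a 3:1 input:output token split over 10M total tokens per month; that split is inferred from the figures, not stated by the source. A sketch under that assumed split:

```python
# Reproduce "Projected $/mo at 10M tokens". The 3:1 input:output
# split (input_share=0.75) is an assumption inferred from the
# published figures, not documented by the source.
PRICES = {  # $ per 1M tokens: (input, output)
    "DeepSeek V3.2 Speciale": (0.40, 1.20),
    "GLM 5":                  (0.60, 1.92),
    "Step 3.5 Flash":         (0.10, 0.30),
}

def monthly_cost(inp: float, out: float, total_m: float = 10.0,
                 input_share: float = 0.75) -> float:
    """Dollar cost for total_m million tokens at the given input share."""
    return total_m * (input_share * inp + (1 - input_share) * out)

for name, (inp, out) in PRICES.items():
    print(f"{name}: ${monthly_cost(inp, out):.2f}/mo")
# DeepSeek $6.00, GLM 5 $9.30, Step 3.5 Flash $1.50 -> matches the table
```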