
Qwen3.5 397B A17B vs Step 3.5 Flash

Side by side · benchmarks, pricing, and signals you can act on.


Winner summary

Qwen3.5 397B A17B wins on 7/10 benchmarks

Qwen3.5 397B A17B wins 7 of 10 shared benchmarks and leads in speed, arena, and knowledge.

Category leads

speed · Qwen3.5 397B A17B
arena · Qwen3.5 397B A17B
math · Step 3.5 Flash
knowledge · Qwen3.5 397B A17B
language · Step 3.5 Flash
coding · Step 3.5 Flash

Hype vs Reality

Attention vs performance

Qwen3.5 397B A17B

#5 by perf · no signal

QUIET

Step 3.5 Flash

#9 by perf · #11 by attention

DESERVED

Best value

Step 3.5 Flash

6.7x better value than Qwen3.5 397B A17B

Qwen3.5 397B A17B

57.4 pts/$

$1.36/M

Step 3.5 Flash

384.5 pts/$

$0.20/M
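The 6.7x figure can be reproduced from the two points-per-dollar values above. The sketch below also checks one guess about the $/M figure: it appears to match the simple average of the input and output prices listed in the pricing table, but that is an inference from the numbers, not something this page states. The names `value_ratio` and `blended_price` are hypothetical.

```python
# Figures copied from the "Best value" panel and the pricing table on this page.
PTS_PER_DOLLAR = {
    "Qwen3.5 397B A17B": 57.4,
    "Step 3.5 Flash": 384.5,
}
PRICES_PER_M = {  # (input $/M, output $/M)
    "Qwen3.5 397B A17B": (0.39, 2.34),
    "Step 3.5 Flash": (0.10, 0.30),
}


def value_ratio(better: str, worse: str) -> float:
    """How many times more benchmark points per dollar `better` delivers."""
    return PTS_PER_DOLLAR[better] / PTS_PER_DOLLAR[worse]


def blended_price(model: str) -> float:
    """ASSUMPTION: blended $/M is the unweighted mean of input and output price."""
    input_price, output_price = PRICES_PER_M[model]
    return (input_price + output_price) / 2


ratio = value_ratio("Step 3.5 Flash", "Qwen3.5 397B A17B")
print(f"Value ratio: {ratio:.1f}x")
for model in PRICES_PER_M:
    print(f"{model}: blended ${blended_price(model):.3f}/M")
```

The ratio rounds to 6.7, matching the panel. The blended prices come out at about $1.365/M and $0.20/M, which is consistent with the displayed $1.36/M and $0.20/M.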

Vendor risk

Mixed exposure

One or more vendors flagged

Alibaba (Qwen)

$293.0B · Tier 1

Low risk

StepFun

$5.0B · Tier 1

Higher risk

Head to head

10 benchmarks · 2 models

Qwen3.5 397B A17B · Step 3.5 Flash

Artificial Analysis · Agentic Index

Qwen3.5 397B A17B leads by +3.8

Artificial Analysis Agentic Index: a composite score measuring how well a model performs in agentic workflows, covering multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, and is intended as a single-number answer to "how good is this model as an agent?"

Qwen3.5 397B A17B

55.8

Step 3.5 Flash

52.0

Artificial Analysis · Coding Index

Qwen3.5 397B A17B leads by +9.6

Artificial Analysis Coding Index: a composite score that aggregates performance across multiple coding benchmarks into a single index. It tracks code generation quality, debugging ability, multi-language competence, and real-world software engineering tasks. Artificial Analysis uses it to rank model coding capability in a normalized, comparable format, which makes it useful for developers choosing between models for coding-heavy workloads.

Qwen3.5 397B A17B

41.3

Step 3.5 Flash

31.6

Artificial Analysis · Quality Index

Qwen3.5 397B A17B leads by +7.3

Qwen3.5 397B A17B

45.0

Step 3.5 Flash

37.8

Chatbot Arena Elo · Overall

Qwen3.5 397B A17B leads by +56.3

Qwen3.5 397B A17B

1447.7

Step 3.5 Flash

1391.4
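For a rough sense of what a 56.3-point gap means in practice, the sketch below converts the two ratings into an expected head-to-head preference rate. It assumes the standard Elo logistic curve (base 10, scale 400), which this page does not state; `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes for A.

    ASSUMPTION: the ratings follow the standard Elo logistic model
    (base 10, scale 400); the page does not say how they were fitted.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


qwen_elo = 1447.7  # Qwen3.5 397B A17B, from this page
step_elo = 1391.4  # Step 3.5 Flash, from this page

p_qwen = expected_win_rate(qwen_elo, step_elo)
print(f"Expected preference for Qwen3.5 397B A17B: {p_qwen:.1%}")
```

Under that assumption the gap corresponds to Qwen3.5 397B A17B being preferred in roughly 58% of direct comparisons, a clear but not overwhelming edge.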

OpenCompass · AIME2025

Step 3.5 Flash leads by +3.4

Qwen3.5 397B A17B

92.3

Step 3.5 Flash

95.7

OpenCompass · GPQA-Diamond

Qwen3.5 397B A17B leads by +4.7

Qwen3.5 397B A17B

88.4

Step 3.5 Flash

83.7

OpenCompass · HLE

Qwen3.5 397B A17B leads by +5.9

Qwen3.5 397B A17B

27.5

Step 3.5 Flash

21.6

OpenCompass · IFEval

Step 3.5 Flash leads by +1.7

Qwen3.5 397B A17B

91.5

Step 3.5 Flash

93.2

OpenCompass · LiveCodeBenchV6

Step 3.5 Flash leads by +0.9

Qwen3.5 397B A17B

83.0

Step 3.5 Flash

83.9

OpenCompass · MMLU-Pro

Qwen3.5 397B A17B leads by +4.1

Qwen3.5 397B A17B

87.6

Step 3.5 Flash

83.5

Full benchmark table

Benchmark	Qwen3.5 397B A17B	Step 3.5 Flash
Artificial Analysis · Agentic Index	55.8	52.0
Artificial Analysis · Coding Index	41.3	31.6
Artificial Analysis · Quality Index	45.0	37.8
Chatbot Arena Elo · Overall	1447.7	1391.4
OpenCompass · AIME2025	92.3	95.7
OpenCompass · GPQA-Diamond	88.4	83.7
OpenCompass · HLE	27.5	21.6
OpenCompass · IFEval	91.5	93.2
OpenCompass · LiveCodeBenchV6	83.0	83.9
OpenCompass · MMLU-Pro	87.6	83.5
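
The 7-of-10 headline can be recomputed directly from the table above. This is a minimal sketch that assumes a higher score is better on every benchmark; the `tally` helper is hypothetical and not part of any tooling behind this page.

```python
MODELS = ("Qwen3.5 397B A17B", "Step 3.5 Flash")

# (Qwen3.5 397B A17B score, Step 3.5 Flash score), copied from the table.
SCORES = {
    "Artificial Analysis · Agentic Index": (55.8, 52.0),
    "Artificial Analysis · Coding Index": (41.3, 31.6),
    "Artificial Analysis · Quality Index": (45.0, 37.8),
    "Chatbot Arena Elo · Overall": (1447.7, 1391.4),
    "OpenCompass · AIME2025": (92.3, 95.7),
    "OpenCompass · GPQA-Diamond": (88.4, 83.7),
    "OpenCompass · HLE": (27.5, 21.6),
    "OpenCompass · IFEval": (91.5, 93.2),
    "OpenCompass · LiveCodeBenchV6": (83.0, 83.9),
    "OpenCompass · MMLU-Pro": (87.6, 83.5),
}


def tally(scores):
    """Count wins per model and record each benchmark's leader and margin.

    ASSUMPTION: higher is better everywhere. Ties are skipped; none occur here.
    """
    wins = {model: 0 for model in MODELS}
    margins = {}
    for benchmark, (first, second) in scores.items():
        if first == second:
            continue
        leader = MODELS[0] if first > second else MODELS[1]
        wins[leader] += 1
        margins[benchmark] = (leader, round(abs(first - second), 1))
    return wins, margins


wins, margins = tally(SCORES)
print(wins)
for benchmark, (leader, margin) in margins.items():
    print(f"{benchmark}: {leader} leads by +{margin}")
```

Margins computed this way use the rounded table values, so they can differ by 0.1 from the leads shown in the head-to-head panels: the Coding Index and Quality Index come out at 9.7 and 7.2 here versus +9.6 and +7.3 there, presumably because the page works from unrounded scores.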

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input	Output	Context	Projected $/mo
Qwen3.5 397B A17B	$0.39	$2.34	262K tokens (~131 books)	$8.78
Step 3.5 Flash	$0.10	$0.30	262K tokens (~131 books)	$1.50
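
The page does not say how the 10M tokens are divided between input and output. Both projected figures are consistent with a 75% input / 25% output split, so the sketch below uses that as an assumption; `projected_monthly_cost` and its `input_share` parameter are hypothetical names, and a different traffic mix will shift the totals.

```python
def projected_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # ASSUMPTION: inferred from the table, not stated
) -> float:
    """Monthly cost in dollars for `total_tokens_m` million tokens."""
    input_tokens_m = total_tokens_m * input_share
    output_tokens_m = total_tokens_m - input_tokens_m
    return input_tokens_m * input_price_per_m + output_tokens_m * output_price_per_m


qwen_cost = projected_monthly_cost(0.39, 2.34)
step_cost = projected_monthly_cost(0.10, 0.30)
print(f"Qwen3.5 397B A17B: about ${qwen_cost:.3f} per month")
print(f"Step 3.5 Flash:    about ${step_cost:.3f} per month")

# An output-heavy workload (hypothetical 25% input / 75% output) for comparison.
qwen_output_heavy = projected_monthly_cost(0.39, 2.34, input_share=0.25)
print(f"Qwen3.5 397B A17B, output-heavy: about ${qwen_output_heavy:.3f} per month")
```

With the assumed split, the results land on the table's $8.78 and $1.50 to within rounding. Because output tokens cost six times as much as input tokens for Qwen3.5 397B A17B and three times as much for Step 3.5 Flash, the gap between the two widens as the share of output tokens grows.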
