DeepSeek V3.2 vs gpt-oss-120b (free)
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
DeepSeek V3.2 wins 7 of 10 shared benchmarks, leading in speed, coding, and knowledge.
Category leads
- Speed: DeepSeek V3.2
- Coding: DeepSeek V3.2
- Math: gpt-oss-120b (free)
- Knowledge: DeepSeek V3.2
- Language: gpt-oss-120b (free)
Hype vs Reality
Attention vs performance
- DeepSeek V3.2: #82 by performance · no attention signal
- gpt-oss-120b (free): #20 by performance · no attention signal
Vendor risk
Mixed exposure: one or more vendors flagged.

- DeepSeek: $3.4B · Tier 1
- OpenAI: $840.0B · Tier 1
Head to head
10 benchmarks · 2 models
- Artificial Analysis · Agentic Index: DeepSeek V3.2 leads by +15.0 (52.9 vs 37.9)
- Artificial Analysis · Coding Index: DeepSeek V3.2 leads by +8.1 (36.7 vs 28.6)
- Artificial Analysis · Quality Index: DeepSeek V3.2 leads by +8.4 (41.7 vs 33.3)
- Aider Polyglot: DeepSeek V3.2 leads by +32.4 (74.2 vs 41.8). Measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
- OpenCompass · AIME2025: gpt-oss-120b (free) leads by +0.4 (93.4 vs 93.0)
- OpenCompass · GPQA-Diamond: DeepSeek V3.2 leads by +5.7 (84.6 vs 78.9)
- OpenCompass · HLE: DeepSeek V3.2 leads by +4.9 (23.2 vs 18.3)
- OpenCompass · IFEval: gpt-oss-120b (free) leads by +0.5 (90.2 vs 89.7)
- OpenCompass · LiveCodeBenchV6: gpt-oss-120b (free) leads by +3.0 (78.4 vs 75.4)
- OpenCompass · MMLU-Pro: DeepSeek V3.2 leads by +6.1 (85.8 vs 79.7)
Full benchmark table
| Benchmark | DeepSeek V3.2 | gpt-oss-120b (free) |
|---|---|---|
| Artificial Analysis · Agentic Index | 52.9 | 37.9 |
| Artificial Analysis · Coding Index | 36.7 | 28.6 |
| Artificial Analysis · Quality Index | 41.7 | 33.3 |
| Aider Polyglot | 74.2 | 41.8 |
| OpenCompass · AIME2025 | 93.0 | 93.4 |
| OpenCompass · GPQA-Diamond | 84.6 | 78.9 |
| OpenCompass · HLE | 23.2 | 18.3 |
| OpenCompass · IFEval | 89.7 | 90.2 |
| OpenCompass · LiveCodeBenchV6 | 75.4 | 78.4 |
| OpenCompass · MMLU-Pro | 85.8 | 79.7 |
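A quick sanity check on the 7-of-10 headline. This is a minimal sketch: the score pairs are copied from the table above, ordered (DeepSeek V3.2, gpt-oss-120b (free)), and a benchmark counts as a win when the first score is strictly higher.

```python
# Shared-benchmark scores from the table above: (DeepSeek V3.2, gpt-oss-120b (free)).
scores = {
    "Artificial Analysis · Agentic Index": (52.9, 37.9),
    "Artificial Analysis · Coding Index": (36.7, 28.6),
    "Artificial Analysis · Quality Index": (41.7, 33.3),
    "Aider Polyglot": (74.2, 41.8),
    "OpenCompass · AIME2025": (93.0, 93.4),
    "OpenCompass · GPQA-Diamond": (84.6, 78.9),
    "OpenCompass · HLE": (23.2, 18.3),
    "OpenCompass · IFEval": (89.7, 90.2),
    "OpenCompass · LiveCodeBenchV6": (75.4, 78.4),
    "OpenCompass · MMLU-Pro": (85.8, 79.7),
}

# Count benchmarks where DeepSeek V3.2's score is strictly higher.
deepseek_wins = sum(a > b for a, b in scores.values())
print(f"DeepSeek V3.2 wins {deepseek_wins} of {len(scores)}")  # prints "DeepSeek V3.2 wins 7 of 10"
```

The three losses (AIME2025, IFEval, LiveCodeBenchV6) match the category leads above, where gpt-oss-120b (free) takes math and language.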
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.26 | $0.38 | 164K tokens (~82 books) | $2.90 |
| gpt-oss-120b (free) | $0.00 | $0.00 | 131K tokens (~66 books) | — |