
o3 Mini vs gpt-oss-120b

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

gpt-oss-120b wins 5 of 9 shared benchmarks. Leads in arena · knowledge · math.

Category leads
coding · o3 Mini
arena · gpt-oss-120b
knowledge · gpt-oss-120b
math · gpt-oss-120b
reasoning · o3 Mini
Hype vs Reality
o3 Mini
#149 by perf·no signal
QUIET
gpt-oss-120b
#106 by perf·no signal
QUIET
Best value
gpt-oss-120b · 29.3x better value than o3 Mini
o3 Mini
14.0 pts/$
$2.75/M
gpt-oss-120b
409.6 pts/$
$0.11/M
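The value multiple appears to be the ratio of the two pts/$ figures. A minimal sketch, assuming that is how the page derives it (the variable names are illustrative and not from the source):

```python
# Points-per-dollar figures as displayed on the page.
pts_per_dollar = {"o3 Mini": 14.0, "gpt-oss-120b": 409.6}

# Assumption: the "better value" multiple is the ratio of the two figures.
ratio = pts_per_dollar["gpt-oss-120b"] / pts_per_dollar["o3 Mini"]
print(f"{ratio:.1f}x")  # prints "29.3x"
```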
Vendor risk
OpenAI · both models
$840.0B · Tier 1
Medium risk
Head to head
o3 Mini · gpt-oss-120b
Aider Polyglot
o3 Mini leads by +18.6
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
o3 Mini
60.4
gpt-oss-120b
41.8
Chatbot Arena Elo · Overall
gpt-oss-120b leads by +6.4
o3 Mini
1347.5
gpt-oss-120b
1353.8
Chess Puzzles
gpt-oss-120b leads by +3.0
Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.
o3 Mini
17.0
gpt-oss-120b
20.0
Fiction.LiveBench
o3 Mini leads by +5.6
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
o3 Mini
50.0
gpt-oss-120b
44.4
GPQA Diamond
o3 Mini leads by +1.7
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
o3 Mini
69.4
gpt-oss-120b
67.7
Lech Mazur Writing
gpt-oss-120b leads by +15.6
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
o3 Mini
61.7
gpt-oss-120b
77.3
OTIS Mock AIME 2024-2025
gpt-oss-120b leads by +12.0
OTIS Mock AIME 2024–2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
o3 Mini
76.9
gpt-oss-120b
88.9
SimpleBench
o3 Mini leads by +0.8
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
o3 Mini
7.4
gpt-oss-120b
6.5
WeirdML
gpt-oss-120b leads by +4.5
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
o3 Mini
43.7
gpt-oss-120b
48.2
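The win count in the summary can be rechecked from the displayed scores. The sketch below is one way to do that, assuming higher is better on every benchmark; the dictionary layout and variable names are illustrative, not part of the source.

```python
# Displayed scores as (o3 Mini, gpt-oss-120b), copied from the head-to-head cards.
scores = {
    "Aider Polyglot": (60.4, 41.8),
    "Chatbot Arena Elo · Overall": (1347.5, 1353.8),
    "Chess Puzzles": (17.0, 20.0),
    "Fiction.LiveBench": (50.0, 44.4),
    "GPQA Diamond": (69.4, 67.7),
    "Lech Mazur Writing": (61.7, 77.3),
    "OTIS Mock AIME 2024-2025": (76.9, 88.9),
    "SimpleBench": (7.4, 6.5),
    "WeirdML": (43.7, 48.2),
}

wins = {"o3 Mini": 0, "gpt-oss-120b": 0}
for name, (o3_mini, gpt_oss) in scores.items():
    # Assumption: higher is better everywhere; no ties occur in this data.
    leader = "o3 Mini" if o3_mini > gpt_oss else "gpt-oss-120b"
    wins[leader] += 1
    print(f"{name}: {leader} leads by +{abs(o3_mini - gpt_oss):.1f}")

print(wins)
```

Margins recomputed this way can differ from the page by 0.1 (Chatbot Arena and SimpleBench do), presumably because the page subtracts unrounded scores.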
Full benchmark table
Benchmark · o3 Mini · gpt-oss-120b
Aider Polyglot · 60.4 · 41.8
Chatbot Arena Elo (Overall) · 1347.5 · 1353.8
Chess Puzzles · 17.0 · 20.0
Fiction.LiveBench · 50.0 · 44.4
GPQA Diamond · 69.4 · 67.7
Lech Mazur Writing · 61.7 · 77.3
OTIS Mock AIME 2024-2025 · 76.9 · 88.9
SimpleBench · 7.4 · 6.5
WeirdML · 43.7 · 48.2
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
o3 Mini · $1.10 · $4.40 · 200K tokens (~100 books) · $19.25
gpt-oss-120b · $0.04 · $0.19 · 131K tokens (~66 books) · $0.77
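The projected monthly figures are consistent with 10M tokens split 75% input and 25% output. That split is inferred from the numbers and is not stated on the page. A minimal sketch under that assumption (`projected_monthly_cost` and `INPUT_SHARE` are hypothetical names):

```python
INPUT_SHARE = 0.75      # assumption: inferred from the displayed totals, not stated in the source
MONTHLY_TOKENS_M = 10   # millions of tokens per month, from the table heading


def projected_monthly_cost(input_price: float, output_price: float) -> float:
    """Monthly cost in dollars, given per-1M-token input and output prices."""
    input_tokens_m = MONTHLY_TOKENS_M * INPUT_SHARE
    output_tokens_m = MONTHLY_TOKENS_M * (1 - INPUT_SHARE)
    return input_tokens_m * input_price + output_tokens_m * output_price


o3_mini_cost = projected_monthly_cost(1.10, 4.40)
gpt_oss_cost = projected_monthly_cost(0.04, 0.19)
print(f"o3 Mini: ${o3_mini_cost:.2f}")
print(f"gpt-oss-120b: ${gpt_oss_cost:.3f}")
```

Under this split o3 Mini comes to $19.25, matching the table. gpt-oss-120b comes to about $0.775, which is in line with the displayed $0.77 if the page works from unrounded prices.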