MiniMax M2.5 vs GPT-5 Mini
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
MiniMax M2.5 wins 7 of 11 shared benchmarks
Leads in reasoning · coding · math.
Category leads
reasoning · MiniMax M2.5
coding · MiniMax M2.5
language · GPT-5 Mini
math · MiniMax M2.5
knowledge · GPT-5 Mini
Hype vs Reality
Attention vs performance
MiniMax M2.5
#71 by perf·no signal
GPT-5 Mini
#65 by perf·no signal
Best value
MiniMax M2.5
1.7x better value than GPT-5 Mini
MiniMax M2.5
84.8 pts/$
$0.65/M
GPT-5 Mini
49.8 pts/$
$1.13/M
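The value figures above are consistent with a simple calculation: each $/M figure matches the unweighted mean of that model's input and output prices from the pricing table, and the 1.7x figure matches the ratio of the two pts/$ values. A minimal sketch, assuming that is how the page derives them (the function names are hypothetical, and the page does not state which score feeds pts/$):

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Unweighted mean of input and output price per 1M tokens (assumed formula)."""
    return (input_per_m + output_per_m) / 2


def value_ratio(points_per_dollar_a: float, points_per_dollar_b: float) -> float:
    """How many times larger model A's pts/$ is than model B's."""
    return points_per_dollar_a / points_per_dollar_b


# Input/output prices are taken from the pricing table on this page.
minimax_blended = blended_price(0.15, 1.15)
gpt5_mini_blended = blended_price(0.25, 2.00)

# pts/$ values are taken from the "Best value" card.
ratio = value_ratio(84.8, 49.8)

print(f"MiniMax M2.5 blended: ${minimax_blended:.2f}/M")
print(f"GPT-5 Mini blended:   ${gpt5_mini_blended:.3f}/M")
print(f"Value ratio:          {ratio:.1f}x")
```

A blended price that weights input and output equally favors models with cheap output; a workload dominated by input tokens would shift the comparison.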
Vendor risk
Mixed exposure
One or more vendors flagged
MiniMax
$4.0B·Tier 1
OpenAI
$840.0B·Tier 1
Head to head
11 benchmarks · 2 models
MiniMax M2.5 vs GPT-5 Mini
ARC-AGI
MiniMax M2.5 leads by +9.3
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
MiniMax M2.5
63.7
GPT-5 Mini
54.3
ARC-AGI-2
MiniMax M2.5 leads by +0.4
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
MiniMax M2.5
4.9
GPT-5 Mini
4.4
LiveBench · Agentic Coding
MiniMax M2.5 leads by +16.7
MiniMax M2.5
51.7
GPT-5 Mini
35.0
LiveBench · Coding
GPT-5 Mini leads by +5.4
MiniMax M2.5
70.7
GPT-5 Mini
76.1
LiveBench · Data Analysis
MiniMax M2.5
49.6
GPT-5 Mini
49.6
LiveBench · IF (Instruction Following)
GPT-5 Mini leads by +7.0
MiniMax M2.5
57.2
GPT-5 Mini
64.2
LiveBench · Language
GPT-5 Mini leads by +14.1
MiniMax M2.5
55.1
GPT-5 Mini
69.2
LiveBench · Mathematics
MiniMax M2.5 leads by +3.0
MiniMax M2.5
77.4
GPT-5 Mini
74.4
LiveBench · Overall
GPT-5 Mini leads by +0.9
MiniMax M2.5
60.1
GPT-5 Mini
61.0
LiveBench · Reasoning
MiniMax M2.5 leads by +0.6
MiniMax M2.5
59.3
GPT-5 Mini
58.6
Terminal Bench
MiniMax M2.5 leads by +7.4
Terminal-Bench 2.0 · evaluates AI agents on real terminal-based coding tasks · writing scripts, debugging, running tests, and managing projects entirely through command-line interaction. Tests both code quality and terminal fluency. Claude Opus 4.7 scores 69.4%, demonstrating significant agentic terminal competence.
MiniMax M2.5
42.2
GPT-5 Mini
34.8
Full benchmark table
| Benchmark | MiniMax M2.5 | GPT-5 Mini |
|---|---|---|
| ARC-AGI | 63.7 | 54.3 |
| ARC-AGI-2 | 4.9 | 4.4 |
| LiveBench · Agentic Coding | 51.7 | 35.0 |
| LiveBench · Coding | 70.7 | 76.1 |
| LiveBench · Data Analysis | 49.6 | 49.6 |
| LiveBench · IF (Instruction Following) | 57.2 | 64.2 |
| LiveBench · Language | 55.1 | 69.2 |
| LiveBench · Mathematics | 77.4 | 74.4 |
| LiveBench · Overall | 60.1 | 61.0 |
| LiveBench · Reasoning | 59.3 | 58.6 |
| Terminal Bench | 42.2 | 34.8 |
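The head-to-head tally can be recomputed from the table. The sketch below uses only the rounded scores shown on this page, and `tally` is a hypothetical helper. On these rounded values, LiveBench · Data Analysis is a tie, so the result is 6 leads for MiniMax M2.5, 4 for GPT-5 Mini, and 1 tie. The 7-of-11 headline would be consistent with that tie being broken by unrounded scores, but that is an assumption.

```python
# (MiniMax M2.5, GPT-5 Mini) scores as displayed in the table, rounded to one decimal.
scores = {
    "ARC-AGI": (63.7, 54.3),
    "ARC-AGI-2": (4.9, 4.4),
    "LiveBench Agentic Coding": (51.7, 35.0),
    "LiveBench Coding": (70.7, 76.1),
    "LiveBench Data Analysis": (49.6, 49.6),
    "LiveBench IF": (57.2, 64.2),
    "LiveBench Language": (55.1, 69.2),
    "LiveBench Mathematics": (77.4, 74.4),
    "LiveBench Overall": (60.1, 61.0),
    "LiveBench Reasoning": (59.3, 58.6),
    "Terminal Bench": (42.2, 34.8),
}


def tally(pairs: dict[str, tuple[float, float]]) -> tuple[int, int, int]:
    """Return (wins for first model, wins for second model, ties)."""
    first = sum(1 for a, b in pairs.values() if a > b)
    second = sum(1 for a, b in pairs.values() if a < b)
    ties = sum(1 for a, b in pairs.values() if a == b)
    return first, second, ties


minimax_wins, gpt5_mini_wins, ties = tally(scores)
print(f"MiniMax M2.5: {minimax_wins}, GPT-5 Mini: {gpt5_mini_wins}, ties: {ties}")
```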
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| MiniMax M2.5 | $0.15 | $1.15 | 197K tokens (~98 books) | $4.00 |
| GPT-5 Mini | $0.25 | $2.00 | 400K tokens (~200 books) | $6.88 |
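The projected monthly figures are consistent with 10M tokens split 75% input and 25% output. That split is inferred from the numbers rather than stated on the page, so treat it as an assumption; `projected_monthly_cost` and its parameters are hypothetical names. A minimal sketch:

```python
def projected_monthly_cost(
    input_per_m: float,
    output_per_m: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # assumed split, inferred from the table's figures
) -> float:
    """Dollar cost for total_tokens_m million tokens at the given per-1M prices."""
    input_tokens_m = total_tokens_m * input_share
    output_tokens_m = total_tokens_m - input_tokens_m
    return input_tokens_m * input_per_m + output_tokens_m * output_per_m


# Prices per 1M tokens, from the pricing table.
minimax_monthly = projected_monthly_cost(0.15, 1.15)
gpt5_mini_monthly = projected_monthly_cost(0.25, 2.00)

print(f"MiniMax M2.5: ${minimax_monthly:.3f}/mo")
print(f"GPT-5 Mini:   ${gpt5_mini_monthly:.3f}/mo")
```

Changing `input_share` shows how sensitive the gap is to workload shape: output-heavy usage widens the absolute difference, because the output prices differ by more than the input prices.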