
MiniMax M2.5 vs Grok 4

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Grok 4 wins 3 of 4 shared benchmarks. Leads in agentic · reasoning.

Category leads
agentic · Grok 4
reasoning · Grok 4
coding · MiniMax M2.5
Hype vs Reality
MiniMax M2.5 · #71 by perf · no signal · QUIET
Grok 4 · #73 by perf · no signal · QUIET
Best value
MiniMax M2.5 · 13.9x better value than Grok 4
MiniMax M2.5 · 84.8 pts/$ · $0.65/M
Grok 4 · 6.1 pts/$ · $9.00/M
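The page doesn't publish its value formulas, but the figures are reproducible from the pricing table below. A minimal Python sketch, assuming the $/M figure is a simple average of the input and output rates and that pts/$ divides an undisclosed aggregate performance score by that blended price:

```python
# Sketch of how the "Best value" figures appear to be derived.
# The blending rule and the aggregate "pts" score are assumptions,
# not documented by the comparison page.

def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Simple average of input and output $/1M-token rates (assumed)."""
    return (input_per_m + output_per_m) / 2

minimax = blended_price(0.15, 1.15)   # -> 0.65  ($0.65/M, as shown)
grok4   = blended_price(3.00, 15.00)  # -> 9.00  ($9.00/M, as shown)

# The headline ratio follows directly from the pts/$ figures.
print(84.8 / 6.1)  # ~13.9x better value for MiniMax M2.5
```

If the pts/$ figures do divide a score by blended price, both models' implied aggregate scores land near 55 points (84.8 × 0.65 ≈ 55.1; 6.1 × 9.00 ≈ 54.9), so the value gap is driven almost entirely by price rather than performance.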
Vendor risk
One or more vendors flagged.
MiniMax · $4.0B · Tier 1 · Higher risk
xAI · $250.0B · Tier 1 · Medium risk
Head to head
APEX-Agents
Grok 4 leads by +9.0
APEX-Agents · evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments.
MiniMax M2.5 · 6.2
Grok 4 · 15.2
ARC-AGI
Grok 4 leads by +3.0
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
MiniMax M2.5 · 63.7
Grok 4 · 66.7
ARC-AGI-2
Grok 4 leads by +11.1
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
MiniMax M2.5 · 4.9
Grok 4 · 16.0
Terminal Bench
MiniMax M2.5 leads by +15.0
Terminal-Bench 2.0 · evaluates AI agents on real terminal-based coding tasks · writing scripts, debugging, running tests, and managing projects entirely through command-line interaction. Tests both code quality and terminal fluency. Claude Opus 4.7 scores 69.4%, demonstrating significant agentic terminal competence.
MiniMax M2.5 · 42.2
Grok 4 · 27.2
Full benchmark table
Benchmark · MiniMax M2.5 · Grok 4
APEX-Agents · 6.2 · 15.2
ARC-AGI · 63.7 · 66.7
ARC-AGI-2 · 4.9 · 16.0
Terminal Bench · 42.2 · 27.2
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
MiniMax M2.5 · $0.15 · $1.15 · 197K tokens (~98 books) · $4.00
Grok 4 · $3.00 · $15.00 · 256K tokens (~128 books) · $60.00
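The projected monthly figures are consistent with a 3:1 input:output token mix across the 10M monthly tokens. The page doesn't state its mix, so the ratio in this sketch is inferred from the published numbers, not documented:

```python
# Reproducing the "Projected $/mo" column under an assumed 3:1
# input:output token split over 10M tokens/month. The 75% input
# share is an inference that happens to match both published figures.

def monthly_cost(input_per_m: float, output_per_m: float,
                 total_tokens_m: float = 10.0,
                 input_share: float = 0.75) -> float:
    input_tokens = total_tokens_m * input_share
    output_tokens = total_tokens_m * (1 - input_share)
    return input_tokens * input_per_m + output_tokens * output_per_m

print(monthly_cost(0.15, 1.15))   # 4.0  -> matches MiniMax M2.5
print(monthly_cost(3.00, 15.00))  # 60.0 -> matches Grok 4
```

Note the mix matters: a simple 50/50 split (the blending apparently used for the headline $/M figures) would project $6.50/mo and $90.00/mo instead, so the two columns are not computed the same way.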