Better than 59% of all models
Context: 131K tokens (~66 books)
Input $/1M: $0.30
Output $/1M: $0.50
Type: text
License: Proprietary
Benchmarks: 7 tested
Data updated today
About
Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s ideal for reasoning-heavy tasks that don’t demand...
Tested on 7 benchmarks with a 64.8% average. Top scores: Chatbot Arena Elo (Overall): 1357.4; HELM IFEval: 95.1%; HELM MMLU-Pro: 79.9%.
Looking for similar performance at lower cost?
Phi 4 scores 54.2 (102% as good) at $0.07/1M input · 78% cheaper
Capabilities
coding: 49.3 (#69 globally)
reasoning: 65.1 (#28 globally)
math: 31.8 (#125 globally)
knowledge: 73.7 (#11 globally)
language: 95.1 (#1 globally)
Benchmark Scores
Tested on 7 benchmarks · Ranked across 6 categories
[Chart: score distribution across all 233 models, on a 0 to 100 scale, with this model's position marked]
coding
Aider polyglot
49.3: Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
reasoning
HELM — WildBench
65.1: Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
math
HELM — Omni-MATH
31.8: Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
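The score bands in the legend can be applied to the capability scores listed on this page. The following is a minimal sketch, not the site's own logic: the function name `score_band` is hypothetical, and the handling of boundary values (85 counted as Excellent, 70 as Good, 50 as Average) is an assumption, since the legend does not say which band a boundary score falls into.

```python
def score_band(score: float) -> str:
    """Map a 0-100 score to a legend band (boundary handling is assumed)."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Capability scores as listed on this page.
capability_scores = {
    "coding": 49.3,
    "reasoning": 65.1,
    "math": 31.8,
    "knowledge": 73.7,
    "language": 95.1,
}

bands = {name: score_band(value) for name, value in capability_scores.items()}
for name, band in bands.items():
    print(f"{name}: {capability_scores[name]} ({band})")
```

Under these assumptions, language lands in Excellent, knowledge in Good, reasoning in Average, and coding and math in Below.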
BenchGecko API
grok-3-mini-beta
Specifications
- Type: text
- Context: 131K tokens (~66 books)
- Released: Apr 2025
- License: Proprietary
- Status: preview
- Cost / Message: ~$0.001
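The per-message cost follows from the per-token prices listed on this page ($0.30 per 1M input tokens, $0.50 per 1M output tokens). The page does not state what message size the ~$0.001 figure assumes, so the sketch below uses a hypothetical message of 2,000 input tokens and 800 output tokens purely for illustration; the function name `message_cost` is also hypothetical.

```python
# Prices from this listing, in US dollars per 1M tokens.
INPUT_PRICE_PER_MILLION = 0.30
OUTPUT_PRICE_PER_MILLION = 0.50


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in dollars of one request from its token counts."""
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Hypothetical message size (an assumption, not a figure from the listing).
example_cost = message_cost(input_tokens=2_000, output_tokens=800)
print(f"${example_cost:.4f}")  # prints $0.0010
```

Other message sizes will give different costs; the estimate scales linearly with both token counts.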
Frequently Asked Questions
Grok 3 Mini Beta is a proprietary text AI model by xAI, released in April 2025. It has an average benchmark score of 53.1. Context window: 131K tokens.