Better than 81% of all models
Context: 131K tokens (~66 books)
Input $/1M: $3.00
Output $/1M: $15.00
Type: text
License: Proprietary
Benchmarks: 6 tested
Data updated today
About
Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in...
Tested on 6 benchmarks with 69.5% average. Top scores: HELM — IFEval (88.4%), HELM — WildBench (84.9%), HELM — MMLU-Pro (78.8%).
Looking for similar performance at lower cost?
Gemma 4 31B scores 68.2 (100% as good) at $0.13/1M input · 96% cheaper
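The "96% cheaper" figure can be checked from the two input prices quoted on this page. The sketch below is illustrative only: the helper name `percent_cheaper` is hypothetical, and the comparison covers input pricing alone, since no output price for Gemma 4 31B is listed here.

```python
def percent_cheaper(reference_price: float, alternative_price: float) -> float:
    """Return how much cheaper the alternative is, as a percentage of the reference price."""
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return (reference_price - alternative_price) / reference_price * 100


# Input prices in USD per 1M tokens, as quoted on this page.
grok_3_input = 3.00
gemma_4_31b_input = 0.13

savings = percent_cheaper(grok_3_input, gemma_4_31b_input)
print(f"{savings:.1f}% cheaper on input")  # 95.7% cheaper on input
```

Rounded to a whole percent, this gives the 96% shown above.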
Capabilities
coding: 53.3 (#57 globally)
reasoning: 84.9 (#3 globally)
math: 46.4 (#87 globally)
knowledge: 71.9 (#14 globally)
language: 88.4 (#19 globally)
Benchmark Scores
Tested on 6 benchmarks · Ranked across 5 categories
Score Distribution (all 233 models): chart with a score axis from 0 to 100 and a marker at this model's position.
coding
Aider polyglot
Score: 53.3. Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
reasoning
HELM — WildBench
Score: 84.9. Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
math
HELM — Omni-MATH
Score: 46.4. Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
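The legend above can be read as a simple threshold lookup. The following is a minimal sketch, not the site's actual logic: the function name `score_band` is hypothetical, and because the legend's ranges share their endpoints (70 and 50 each appear in two bands), the code assumes a boundary score falls into the higher band.

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to a band name from the legend."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Excellent"
    if score >= 70:  # assumption: a score of exactly 70 counts as Good
        return "Good"
    if score >= 50:  # assumption: a score of exactly 50 counts as Average
        return "Average"
    return "Below"


# Capability scores as listed on this page.
capability_scores = {
    "coding": 53.3,
    "reasoning": 84.9,
    "math": 46.4,
    "knowledge": 71.9,
    "language": 88.4,
}

for name, score in capability_scores.items():
    print(f"{name}: {score} -> {score_band(score)}")
```

Under these assumptions, language lands in Excellent, reasoning and knowledge in Good, coding in Average, and math in Below.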
Links
BenchGecko API
grok-3-beta
Specifications
- Type: text
- Context: 131K tokens (~66 books)
- Released: Apr 2025
- License: Proprietary
- Status: preview
- Cost / Message: ~$0.021
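A per-message cost follows from the per-token prices quoted on this page ($3.00 per 1M input tokens, $15.00 per 1M output tokens). The page does not say which token counts its "Cost / Message" figure assumes, so the sketch below uses a hypothetical message of 2,000 input tokens and 1,000 output tokens, which happens to produce the same amount. The function name `message_cost` is also illustrative.

```python
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens, as quoted on this page
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens, as quoted on this page


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one message from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical token split; the page does not state the one it uses.
cost = message_cost(input_tokens=2_000, output_tokens=1_000)
print(f"${cost:.3f}")  # $0.021
```

Because output tokens cost five times as much as input tokens here, the length of the response dominates the estimate for most chat-style messages.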
Frequently Asked Questions
Grok 3 Beta is a proprietary text AI model by xAI, released in April 2025. It has an average benchmark score of 67.9. Context window: 131K tokens.