Grok-3 mini

by xAI · Released Jan 2024

Average score: 47.7
Rank #142
Better than 48% of all models
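
The "better than" figure is consistent with the rank above and the 274-model total quoted for the score distribution. The following is a minimal sketch, assuming the percentage is simply the share of models ranked below this one; the site does not state its formula, and `better_than_percent` is a hypothetical helper name.

```python
def better_than_percent(rank: int, total_models: int) -> float:
    """Percentage of models ranked strictly below `rank` (1 = best).

    Assumption: this mirrors how the page derives its figure;
    the actual formula is not documented in the source.
    """
    if not 1 <= rank <= total_models:
        raise ValueError("rank must be between 1 and total_models")
    return (total_models - rank) / total_models * 100


# Rank #142 out of 274 models, as listed on this page.
print(round(better_than_percent(142, 274)))  # 48
```

Under this assumption, a model ranked #1 of 274 would come out just below 100%, since a model is not counted as better than itself.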
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Proprietary
Benchmarks: 16 tested
Data updated today
About

Tested on 16 benchmarks with 53.3% average. Top scores: HELM — IFEval (95.1%), MATH level 5 (90.9%), HELM — MMLU-Pro (79.9%).

Capabilities
coding: 45.9 (#99 globally)
reasoning: 27.3 (#112 globally)
math: 51.6 (#85 globally)
knowledge: 62.8 (#43 globally)
language: 95.1 (#2 globally)
Benchmark Scores
Tested on 16 benchmarks · Ranked across 5 categories
Score Distribution (all 274 models)
Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

Score: 49.3

WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

Score: 42.6

HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

Score: 65.1

ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

Score: 16.5

ARC-AGI-2

ARC-AGI 2, the harder sequel to ARC-AGI. More complex abstract reasoning patterns that test generalization ability beyond training data.

Score: 0.4

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

Score: 90.9

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.

Score: 77.8

HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

Score: 31.8

Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
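
The score tiers in the legend can be expressed as a small lookup. This is a sketch only: the legend does not say which tier a score of exactly 70 or 85 falls into, so the code assumes each lower bound is inclusive, and `score_tier` is a hypothetical helper name.

```python
def score_tier(score: float) -> str:
    """Map a 0-100 benchmark score to the legend's tier label.

    Assumption: lower bounds are inclusive (85.0 counts as "Excellent",
    70.0 as "Good", 50.0 as "Average"); the page does not specify this.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores taken from the benchmark list on this page.
for name, score in [
    ("MATH level 5", 90.9),
    ("OTIS Mock AIME 2024-2025", 77.8),
    ("ARC-AGI-2", 0.4),
]:
    print(f"{name}: {score} -> {score_tier(score)}")
```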
Recently Happened
Grok 3 Mini marked as deprecated by xAI
Mar 12, 2026
Links
BenchGecko API: grok-3-mini
Specifications
  • Type: text
  • Context: N/A
  • Released: Jan 2024
  • License: Proprietary
  • Status: benchmark-only
Available On
xAI: TBD
Grok-3 mini is a proprietary text AI model by xAI, released in January 2024. It has an average benchmark score of 47.7.