
Grok 3

by xAI · Released Jan 2024

Average score: 44.8
Rank: #150
Better than 45% of all models
Context: N/A
Input price ($ per 1M tokens): TBD
Output price ($ per 1M tokens): TBD
Type: text
License: Proprietary
Benchmarks: 19 tested
Data updated today
About

Tested on 19 benchmarks with 45.5% average. Top scores: MATH level 5 (88.8%), HELM — IFEval (88.4%), HELM — WildBench (84.9%).

Capabilities
coding: 45.3 (#101 globally)
reasoning: 28.4 (#109 globally)
math: 38.9 (#129 globally)
knowledge: 62.6 (#46 globally)
agentic: 2.1 (#50 globally)
language: 88.4 (#20 globally)
Benchmark Scores
Tested on 19 benchmarks · Ranked across 6 categories
[Figure: Score Distribution (all 274 models); horizontal axis from 0 to 100, with a marker at this model's position]
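
The "Better than 45% of all models" figure near the top of the page is consistent with rank #150 among the 274 models in this distribution. A minimal sketch of that arithmetic, assuming the percentage is simply the share of models ranked below this one (the function name is hypothetical, and the site's actual formula is not stated):

```python
def percent_better_than(rank: int, total: int) -> float:
    """Percentage of models ranked strictly below `rank`, where rank 1 is best.

    Assumption: this mirrors how the page derives its "better than" figure;
    the source does not document the formula.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank) / total


# Rank #150 out of 274 models, as listed on this page.
print(round(percent_better_than(150, 274)))  # 45
```
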
Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

53.3
WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

37.2
HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

84.9
SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

23.3
ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Designed as a measure of general intelligence.

5.5
MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

88.8
OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.

55.5
HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

46.4
Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
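
The legend above maps a score to a tier. A minimal sketch of that mapping, assuming each band includes its lower bound (the legend does not say which tier a score of exactly 70 or 85 falls in); `score_tier` is a hypothetical helper, not part of any BenchGecko API:

```python
def score_tier(score: float) -> str:
    """Map a 0-100 benchmark score to the legend's tier label.

    Assumption: lower bounds are inclusive, so 85.0 is "Excellent"
    and 70.0 is "Good". The page does not specify boundary handling.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# A few scores taken from the benchmark list on this page.
listed_scores = [
    ("MATH level 5", 88.8),
    ("HELM WildBench", 84.9),
    ("Aider polyglot", 53.3),
    ("ARC-AGI", 5.5),
]

for name, score in listed_scores:
    print(f"{name}: {score} -> {score_tier(score)}")
# MATH level 5: 88.8 -> Excellent
# HELM WildBench: 84.9 -> Good
# Aider polyglot: 53.3 -> Average
# ARC-AGI: 5.5 -> Below
```
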
BenchGecko API: grok-3
Specifications
  • Type: text
  • Context: N/A
  • Released: Jan 2024
  • License: Proprietary
  • Status: benchmark-only
Available On
xAI: TBD
Grok 3 is a proprietary text AI model by xAI, released in January 2024. It has an average benchmark score of 44.8.