How much does Grok 3 Mini Beta cost?

Grok 3 Mini Beta costs $0.30 per million input tokens and $0.50 per million output tokens. For a typical conversation (~2,000 tokens), that's approximately $0.001 per message.
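The per-message figure follows from simple arithmetic on the listed prices. A minimal sketch, assuming a hypothetical split of the ~2,000 tokens into 1,500 input and 500 output tokens (the listing does not state the split, and `message_cost` is an illustrative helper, not part of any xAI API):

```python
# Prices in USD per million tokens, taken from the listing.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 0.50


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Assumed split: 1,500 prompt tokens and 500 completion tokens (2,000 total).
print(f"${message_cost(1500, 500):.4f}")
```

With this assumed split the estimate is about $0.0007. The bounds are $0.0006 if all 2,000 tokens are input and $0.001 if all are output, which is consistent with the approximate figure quoted.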

What benchmarks has Grok 3 Mini Beta been tested on?

Grok 3 Mini Beta has been evaluated on 7 benchmarks. Top scores: Chatbot Arena Elo — Overall: 1357.4, HELM — IFEval: 95.1, HELM — MMLU-Pro: 79.9.

Is Grok 3 Mini Beta open source?

No, Grok 3 Mini Beta is a proprietary model by xAI.

How does Grok 3 Mini Beta compare to Llama 3.1 70B Instruct?

Grok 3 Mini Beta and Llama 3.1 70B Instruct both have an average score of 53.1 when rounded to one decimal place. The listing places Llama 3.1 70B Instruct marginally ahead overall, presumably on the unrounded averages. Grok 3 Mini Beta costs $0.30/1M input tokens versus $0.40/1M input tokens for Llama 3.1 70B Instruct.


Grok 3 Mini Beta

by xAI · Released Apr 2025 · Preview

Average score: 53.1 · Rank #120 · Better than 56% of all models

Context: 131K tokens (roughly 100K words, about one full-length book)

Input $/1M: $0.30

Output $/1M: $0.50

Type: text

License: Proprietary

Benchmarks: 7 tested

Data updated today

About

Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s ideal for reasoning-heavy tasks that don’t demand...

Tested on 7 benchmarks with a 64.8% average. Top scores: Chatbot Arena Elo — Overall (1357.4 Elo rating, not a percentage), HELM — IFEval (95.1%), HELM — MMLU-Pro (79.9%).

Looking for similar performance at lower cost?
Qwen3.6 Flash scores 52.4 (99% as good) at $0.19/1M input · 38% cheaper
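
The two ratios in this suggestion can be checked from the rounded figures shown on the page. A minimal sketch (the function names are illustrative and do not come from the source):

```python
def relative_score(candidate: float, baseline: float) -> float:
    """Candidate score expressed as a percentage of the baseline score."""
    return candidate / baseline * 100


def percent_cheaper(candidate_price: float, baseline_price: float) -> float:
    """Percentage saved by paying the candidate price instead of the baseline."""
    return (1 - candidate_price / baseline_price) * 100


# Qwen3.6 Flash (52.4, $0.19/1M input) vs Grok 3 Mini Beta (53.1, $0.30/1M input).
print(round(relative_score(52.4, 53.1), 1))
print(round(percent_cheaper(0.19, 0.30), 1))
```

With the rounded values the score ratio comes to about 98.7% and the saving to about 36.7%. The listed 38% presumably reflects unrounded prices, which is an assumption.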

Capabilities

coding: 49.3 · #88 globally

reasoning: 65.1 · #40 globally

math: 31.8 · #151 globally

knowledge: 73.7 · #11 globally

language: 95.1 · #1 globally

Benchmark Scores


Tested on 7 benchmarks · Ranked across 6 categories

[Figure: score distribution across all 274 models on a 0 to 100 scale, with this model's position marked]

coding

Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

49.3

reasoning

HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

65.1

math

HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

31.8


Score legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
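
The legend's bands overlap at 70 and 85, so any classifier has to pick a side. A minimal sketch, assuming each lower bound is inclusive (the page does not say), applied to the capability scores from this listing; `score_band` is a hypothetical helper:

```python
def score_band(score: float) -> str:
    """Map a 0-100 score to a legend band, treating lower bounds as inclusive."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Capability scores as shown in the Capabilities section of the listing.
capabilities = {
    "coding": 49.3,
    "reasoning": 65.1,
    "math": 31.8,
    "knowledge": 73.7,
    "language": 95.1,
}

for name, score in capabilities.items():
    print(f"{name}: {score} ({score_band(score)})")
```

Under this assumption, language falls in Excellent, knowledge in Good, reasoning in Average, and coding and math in Below.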

Model Family · xAI Grok 3

Grok 3 · Jun 2025 · 38.4 · $3.00/M in · 131K ctx · 13 benchmarks

Grok 3 Beta · Apr 2025 · 69.5 (+31.1) · $3.00/M in · 131K ctx · 6 benchmarks

Grok 3 Mini · Jun 2025 · 46.6 (-22.9) · $0.30/M in (-2.70) · 131K ctx · 11 benchmarks

Grok 3 Mini Beta · Apr 2025 · 64.8 (+18.2) · $0.30/M in · 131K ctx · 7 benchmarks


Similar Models

Llama 3.1 70B Instruct

Frequently Asked Questions

Grok 3 Mini Beta is a proprietary text AI model by xAI, released in April 2025. It has an average benchmark score of 53.1. Context window: 131K tokens.

Benchmarks

Chatbot Arena Elo — Overall · HELM — IFEval · HELM — MMLU-Pro · HELM — GPQA · HELM — WildBench
