How much does Grok-2 (Dec 2024) cost?

Grok-2 (Dec 2024) pricing information is not yet available.

What benchmarks has Grok-2 (Dec 2024) been tested on?

Grok-2 (Dec 2024) has been evaluated on 8 benchmarks. Top scores: Lech Mazur Writing: 63.6, MATH level 5: 63.5, Aider — Code Editing: 58.6.

Is Grok-2 (Dec 2024) open source?

No, Grok-2 (Dec 2024) is a proprietary model by xAI.

How does Grok-2 (Dec 2024) compare to Qwen2.5 1.5B Instruct?

Grok-2 (Dec 2024) has an average score of 27.6 while Qwen2.5 1.5B Instruct scores 27.3. Grok-2 (Dec 2024) outperforms Qwen2.5 1.5B Instruct overall. See full comparison →

Home/Models/Grok-2 (Dec 2024)

Grok-2 (Dec 2024)

Name: Grok-2 (Dec 2024)
Author: xAI

by xAI · Released Jan 2024

27.6

avg score

Rank #186

Compare

Better than 20% of all models

Context

N/A

Input $/1M

TBD

Output $/1M

TBD

Type

text

License

Proprietary

Benchmarks

8 tested

Data updated today

About

Tested on 8 benchmarks with 33.2% average. Top scores: Lech Mazur Writing (63.6%), MATH level 5 (63.5%), Aider — Code Editing (58.6%).

Capabilities

coding

40.4

#95 globally

reasoning

7.2

#152 globally

math

25.2

#144 globally

knowledge

51.0

#95 globally

Benchmark Scores

Compare All

Tested on 8 benchmarks · Ranked across 4 categories

Score Distribution (all 233 models)

0255075100

▲ You are here

codingCompare coding →

Aider — Code Editing

Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.

58.6—

WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

22.2—

reasoningCompare reasoning →

SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

7.2—

mathCompare math →

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

63.5—

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

11.4—

FrontierMath-2025-02-28-Private

Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.

0.7—

Quick compare:

vs Qwen2.5 1.5B Instruct

vs GLM 4.6V

vs Mistral Large

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Similar Models

Qwen2.5 1.5B Instruct

Links

Info

xAI Pricing explorer Developers · API

Research

Technical Report

Documentation

API Docs Playground

Community

@xAI

BenchGecko API

grok-2-dec-2024

Specifications

Typetext
ContextN/A
ReleasedJan 2024
LicenseProprietary
Statusbenchmark-only

Available On

xAITBD

Frequently Asked Questions

Grok-2 (Dec 2024) is a proprietary text AI model by xAI, released in January 2024. It has an average benchmark score of 27.6.

Benchmarks

Lech Mazur Writing MATH level 5 Aider — Code Editing GPQA diamond WeirdML

xAI · Provider xAI · Economy All Models Compare Models Pricing Developers · API

Grok-2 (Dec 2024)

Frequently Asked Questions

Related Models

Benchmarks

Related Pages