
Grok 4

by xAI · Released 2025-07-09

Average score: 54.8%
Input price: $3.00/1M tokens
Output price: $15.00/1M tokens
Context window: 256K tokens (~128 books)
Type: multimodal
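
As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal Python sketch. It assumes flat per-token billing at the rates shown on this page, with no caching discounts, tiered pricing, or extra fees, none of which the page confirms. The function name `estimate_cost_usd` and the example token counts are hypothetical.

```python
# Prices as listed on this page, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token billing.

    Assumption: no caching discounts or tiered rates apply.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical request: a 200,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.66
```

Because output tokens cost five times as much as input tokens at these rates, long generations can dominate the bill even when the prompt is large.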

Tested on 24 benchmarks, with an average score of 54.8%. Top scores: HELM — IFEval (94.9%), Fiction.LiveBench (94.4%), HELM — MMLU-Pro (85.1%).

Benchmark                               Category    Score (%)
HELM — IFEval                           language    94.9
Fiction.LiveBench                       knowledge   94.4
HELM — MMLU-Pro                         knowledge   85.1
OTIS Mock AIME 2024-2025                math        84.0
GPQA diamond                            knowledge   82.7
Lech Mazur Writing                      knowledge   80.7
HELM — WildBench                        reasoning   79.7
Aider polyglot                          coding      79.6
HELM — GPQA                             knowledge   72.6
ARC-AGI                                 reasoning   66.7
HELM — Omni-MATH                        math        60.3
SimpleBench                             reasoning   52.6
DeepResearch Bench                      knowledge   47.9
SimpleQA Verified                       knowledge   47.9
WeirdML                                 coding      45.7
GeoBench                                knowledge   45.0
Balrog                                  knowledge   43.6
Cybench                                 coding      43.0
Chess Puzzles                           knowledge   28.0
Terminal Bench                          coding      27.2
FrontierMath-2025-02-28-Private         math        19.7
ARC-AGI-2                               reasoning   16.0
APEX-Agents                             agentic     15.2
FrontierMath-Tier-4-2025-07-01-Private  math        2.1
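
The 54.8% average reported above can be checked against the listed scores. The sketch below assumes the page uses an unweighted arithmetic mean over all 24 benchmarks, which it does not state explicitly. The names `RESULTS`, `overall_average`, and `average_by_category` are hypothetical helpers for illustration.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, score in percent), copied from the listing above.
RESULTS = [
    ("HELM — IFEval", "language", 94.9),
    ("Fiction.LiveBench", "knowledge", 94.4),
    ("HELM — MMLU-Pro", "knowledge", 85.1),
    ("OTIS Mock AIME 2024-2025", "math", 84.0),
    ("GPQA diamond", "knowledge", 82.7),
    ("Lech Mazur Writing", "knowledge", 80.7),
    ("HELM — WildBench", "reasoning", 79.7),
    ("Aider polyglot", "coding", 79.6),
    ("HELM — GPQA", "knowledge", 72.6),
    ("ARC-AGI", "reasoning", 66.7),
    ("HELM — Omni-MATH", "math", 60.3),
    ("SimpleBench", "reasoning", 52.6),
    ("DeepResearch Bench", "knowledge", 47.9),
    ("SimpleQA Verified", "knowledge", 47.9),
    ("WeirdML", "coding", 45.7),
    ("GeoBench", "knowledge", 45.0),
    ("Balrog", "knowledge", 43.6),
    ("Cybench", "coding", 43.0),
    ("Chess Puzzles", "knowledge", 28.0),
    ("Terminal Bench", "coding", 27.2),
    ("FrontierMath-2025-02-28-Private", "math", 19.7),
    ("ARC-AGI-2", "reasoning", 16.0),
    ("APEX-Agents", "agentic", 15.2),
    ("FrontierMath-Tier-4-2025-07-01-Private", "math", 2.1),
]


def overall_average(results):
    """Unweighted mean of all scores (an assumption about the page's method)."""
    return mean(score for _, _, score in results)


def average_by_category(results):
    """Unweighted mean score per category label."""
    grouped = defaultdict(list)
    for _, category, score in results:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


print(f"{len(RESULTS)} benchmarks, average {overall_average(RESULTS):.1f}%")
for category, avg in sorted(average_by_category(RESULTS).items()):
    print(f"{category}: {avg:.1f}%")
```

The overall figure rounds to 54.8, matching the page. The per-category means are only a convenience view: categories differ widely in how many benchmarks they contain (the language and agentic categories have a single benchmark each), so they should not be read as a ranking of capabilities.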