Grok 4

Provider: xAI · Release date: 2025-07-09

Average score: 54.8

Input price: $3.00/1M tokens

Output price: $15.00/1M tokens

Context window: 256K tokens (~128 books)

Type: multimodal
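As a rough illustration of how the listed per-token prices translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost_usd` and the example token counts are hypothetical; only the two prices ($3.00 and $15.00 per 1M tokens) come from this page, and the sketch assumes simple linear billing with no caching or tiered discounts.

```python
# Prices taken from the model card above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

Because output tokens cost five times as much as input tokens here, a short completion can cost as much as a much longer prompt.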

Tested on 24 benchmarks with 54.8% average. Top scores: HELM — IFEval (94.9%), Fiction.LiveBench (94.4%), HELM — MMLU-Pro (85.1%).

Benchmark scores

Benchmark	Category	Score
HELM — IFEval	language	94.9
Fiction.LiveBench	knowledge	94.4
HELM — MMLU-Pro	knowledge	85.1
OTIS Mock AIME 2024-2025	math	84.0
GPQA diamond	knowledge	82.7
Lech Mazur Writing	knowledge	80.7
HELM — WildBench	reasoning	79.7
Aider polyglot	coding	79.6
HELM — GPQA	knowledge	72.6
ARC-AGI	reasoning	66.7
HELM — Omni-MATH	math	60.3
SimpleBench	reasoning	52.6
DeepResearch Bench	knowledge	47.9
SimpleQA Verified	knowledge	47.9
WeirdML	coding	45.7
GeoBench	knowledge	45.0
Balrog	knowledge	43.6
Cybench	coding	43.0
Chess Puzzles	knowledge	28.0
Terminal Bench	coding	27.2
FrontierMath-2025-02-28-Private	math	19.7
ARC-AGI-2	reasoning	16.0
APEX-Agents	agentic	15.2
FrontierMath-Tier-4-2025-07-01-Private	math	2.1
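
The reported 54.8 average can be checked against the table. The sketch below assumes the site takes an unweighted mean over all 24 benchmark scores, which is not stated on the page but does reproduce the published figure. The names `SCORES`, `overall_mean`, and `mean_by_category` are illustrative only; the (category, score) pairs are copied from the table in row order.

```python
from collections import defaultdict

# (category, score) pairs copied from the benchmark table, in row order.
SCORES = [
    ("language", 94.9), ("knowledge", 94.4), ("knowledge", 85.1),
    ("math", 84.0), ("knowledge", 82.7), ("knowledge", 80.7),
    ("reasoning", 79.7), ("coding", 79.6), ("knowledge", 72.6),
    ("reasoning", 66.7), ("math", 60.3), ("reasoning", 52.6),
    ("knowledge", 47.9), ("knowledge", 47.9), ("coding", 45.7),
    ("knowledge", 45.0), ("knowledge", 43.6), ("coding", 43.0),
    ("knowledge", 28.0), ("coding", 27.2), ("math", 19.7),
    ("reasoning", 16.0), ("agentic", 15.2), ("math", 2.1),
]


def overall_mean(scores):
    """Unweighted mean over every benchmark score (assumed aggregation)."""
    return sum(score for _, score in scores) / len(scores)


def mean_by_category(scores):
    """Unweighted mean score within each category."""
    buckets = defaultdict(list)
    for category, score in scores:
        buckets[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in buckets.items()}


if __name__ == "__main__":
    print(f"overall: {overall_mean(SCORES):.1f}")
    for category, mean in sorted(mean_by_category(SCORES).items()):
        print(f"{category}: {mean:.1f}")
```

The overall mean rounds to 54.8, matching the page. The per-category breakdown shows how much the headline number hides: the four math scores range from 84.0 down to 2.1.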

Similar models

GPT-4 (older v0314)

DeepSeek V3 0324

Qwen3 Next 80B A3B Instruct

xAI Grok 4 timeline

Grok 4: $3.00/M input · 256K context · 24 benchmarks

Grok 4 Fast (Sep 2025): $0.20/M input (-2.80) · 2.0M context (+1.7M) · 6 benchmarks
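
The parenthesized deltas in the timeline appear to be simple differences against the Grok 4 entry. A minimal sketch of that arithmetic follows, assuming decimal units (256K = 256,000 tokens, 2.0M = 2,000,000 tokens), which the page does not specify. The function names are hypothetical.

```python
def price_delta(new_price: float, old_price: float) -> float:
    """Change in input price (USD per 1M tokens), rounded to cents."""
    return round(new_price - old_price, 2)


def context_delta_millions(new_tokens: int, old_tokens: int) -> float:
    """Change in context window, in millions of tokens, to one decimal."""
    return round((new_tokens - old_tokens) / 1_000_000, 1)


if __name__ == "__main__":
    # Grok 4 Fast compared with Grok 4, using the figures listed above.
    print(price_delta(0.20, 3.00))                      # input price change
    print(context_delta_millions(2_000_000, 256_000))   # context change
```

With these inputs the differences come out to -2.80 and +1.7M, matching the values shown in the timeline.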