Grok 4 vs GPT-5.1
Side by side. Every metric. Every benchmark.
| Metric | Grok 4 | GPT-5.1 |
|---|---|---|
| Provider | xAI | OpenAI |
| Average score | 54.8 | 49.6 |
| Input price (USD per 1M tokens) | $3.00 | $1.25 |
| Output price (USD per 1M tokens) | $15.00 | $10.00 |
| Context window | 256K tokens | 400K tokens |
| Release date | 2025-07-09 | 2025-11-13 |
| License | Proprietary | Proprietary |
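Here is what the price gap means for a concrete workload, as a minimal Python sketch. It assumes the listed prices are USD per 1M tokens and ignores any discounts (for example, cached-input pricing). `PRICES` and `request_cost` are hypothetical names used only for illustration.

```python
# Prices copied from the table above; assumed to be USD per 1M tokens.
PRICES = {
    "Grok 4": {"input": 3.00, "output": 15.00},
    "GPT-5.1": {"input": 1.25, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload for one model.

    Hypothetical helper: multiplies token counts by the per-1M-token
    prices and ignores discounts, batching, or tiered pricing.
    """
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Example workload: 1M input tokens and 200K output tokens.
for model in PRICES:
    print(model, f"${request_cost(model, 1_000_000, 200_000):.2f}")
```

For that workload the sketch gives $6.00 for Grok 4 and $3.25 for GPT-5.1. Output tokens dominate Grok 4's bill here, so the ratio between the two models shifts with the input/output mix.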
Benchmark scores
17 benchmarks · higher score: Grok 4 on 5, GPT-5.1 on 12
| Benchmark | Category | Grok 4 | GPT-5.1 |
|---|---|---|---|
| APEX-Agents | agentic | 15.2 | 17.5 |
| ARC-AGI | reasoning | 66.7 | 72.8 |
| ARC-AGI-2 | reasoning | 16.0 | 17.6 |
| Chess Puzzles | knowledge | 28.0 | 32.0 |
| FrontierMath-2025-02-28-Private | math | 19.7 | 31.0 |
| FrontierMath-Tier-4-2025-07-01-Private | math | 2.1 | 12.5 |
| GPQA diamond | knowledge | 82.7 | 83.5 |
| HELM — GPQA | knowledge | 72.6 | 44.2 |
| HELM — IFEval | language | 94.9 | 93.5 |
| HELM — MMLU-Pro | knowledge | 85.1 | 57.9 |
| HELM — Omni-MATH | math | 60.3 | 46.4 |
| HELM — WildBench | reasoning | 79.7 | 86.3 |
| OTIS Mock AIME 2024-2025 | math | 84.0 | 88.6 |
| SimpleBench | reasoning | 52.6 | 43.8 |
| SimpleQA Verified | knowledge | 47.9 | 48.9 |
| Terminal Bench | coding | 27.2 | 47.6 |
| WeirdML | coding | 45.7 | 60.8 |
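You can reproduce the 5 vs. 12 win count in the summary line directly from the table. The following is a minimal Python sketch. It assumes a higher score is better on every benchmark, which holds for accuracy-style metrics and is assumed here for all 17. `SCORES` and `tally_wins` are hypothetical names.

```python
from collections import Counter

# (benchmark, category, Grok 4 score, GPT-5.1 score), copied from the table above.
SCORES = [
    ("APEX-Agents", "agentic", 15.2, 17.5),
    ("ARC-AGI", "reasoning", 66.7, 72.8),
    ("ARC-AGI-2", "reasoning", 16.0, 17.6),
    ("Chess Puzzles", "knowledge", 28.0, 32.0),
    ("FrontierMath-2025-02-28-Private", "math", 19.7, 31.0),
    ("FrontierMath-Tier-4-2025-07-01-Private", "math", 2.1, 12.5),
    ("GPQA diamond", "knowledge", 82.7, 83.5),
    ("HELM GPQA", "knowledge", 72.6, 44.2),
    ("HELM IFEval", "language", 94.9, 93.5),
    ("HELM MMLU-Pro", "knowledge", 85.1, 57.9),
    ("HELM Omni-MATH", "math", 60.3, 46.4),
    ("HELM WildBench", "reasoning", 79.7, 86.3),
    ("OTIS Mock AIME 2024-2025", "math", 84.0, 88.6),
    ("SimpleBench", "reasoning", 52.6, 43.8),
    ("SimpleQA Verified", "knowledge", 47.9, 48.9),
    ("Terminal Bench", "coding", 27.2, 47.6),
    ("WeirdML", "coding", 45.7, 60.8),
]


def tally_wins(rows):
    """Count which model scores higher, overall and per category.

    Assumes higher is better on every benchmark; equal scores count as "tie".
    """
    overall = Counter()
    by_category = {}
    for _name, category, grok, gpt in rows:
        if grok > gpt:
            winner = "Grok 4"
        elif gpt > grok:
            winner = "GPT-5.1"
        else:
            winner = "tie"
        overall[winner] += 1
        by_category.setdefault(category, Counter())[winner] += 1
    return overall, by_category


overall, by_category = tally_wins(SCORES)
print(dict(overall))  # {'GPT-5.1': 12, 'Grok 4': 5}
for category, counts in sorted(by_category.items()):
    print(category, dict(counts))
```

The per-category breakdown shows where each model's wins come from. All 5 of Grok 4's wins fall in the knowledge, math, reasoning and language categories, 4 of them on HELM benchmarks. GPT-5.1 wins both coding benchmarks.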