GPT-4o (2024-11-20)
Provider: OpenAI · Release date: 2024-11-20
Average score: 37.7%
Input price: $2.50 per 1M tokens
Output price: $10.00 per 1M tokens
Context window: 128K tokens (~64 books)
Type: multimodal
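To make the per-token prices concrete, here is a minimal sketch of estimating the cost of a single request. It assumes billing is linear in token counts at the listed rates and ignores any discounts or pricing tiers not shown on this page. The function name `estimate_cost` and the token counts in the example are hypothetical.

```python
# Prices taken from this page, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of one request at the listed prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.4f}")  # prints $0.0450
```

Output tokens cost four times as much as input tokens at these rates, so long generations dominate the bill faster than long prompts do.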
Tested on 28 benchmarks, with an average score of 37.7%. Top scores: ScienceQA (84.7%), HELM — WildBench (82.8%), Lech Mazur Writing (81.8%).
Benchmark scores
| Benchmark | Category | Score (%) |
|---|---|---|
| ScienceQA | knowledge | 84.7 |
| HELM — WildBench | reasoning | 82.8 |
| Lech Mazur Writing | knowledge | 81.8 |
| HELM — IFEval | language | 81.7 |
| MMLU | knowledge | 79.1 |
| Aider — Code Editing | coding | 71.4 |
| HELM — MMLU-Pro | knowledge | 71.3 |
| GeoBench | knowledge | 71.0 |
| VideoMME | multimodal | 62.5 |
| MATH level 5 | math | 53.3 |
| HELM — GPQA | knowledge | 52.0 |
| Balrog | knowledge | 32.3 |
| GPQA diamond | knowledge | 32.3 |
| SWE-Bench verified | coding | 31.0 |
| HELM — Omni-MATH | math | 29.3 |
| CadEval | coding | 26.0 |
| WeirdML | coding | 25.1 |
| Aider polyglot | coding | 23.1 |
| SWE-Bench Verified (Bash Only) | coding | 21.6 |
| Cybench | coding | 12.5 |
| VPCT | knowledge | 10.0 |
| The Agent Company | agentic | 8.6 |
| OTIS Mock AIME 2024-2025 | math | 6.3 |
| ARC-AGI | reasoning | 4.5 |
| SimpleBench | reasoning | 1.4 |
| FrontierMath-2025-02-28-Private | math | 0.3 |
| GSO-Bench | coding | 0.1 |
| ARC-AGI-2 | reasoning | 0.1 |
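The reported 37.7% average can be checked against the table. The sketch below assumes the average is a plain unweighted mean of the 28 listed scores. The page does not state how it is computed, but the numbers are consistent with that reading. The names `SCORES` and `average_score` are illustrative.

```python
from statistics import mean

# Scores copied from the benchmark table, in table order.
SCORES = [
    84.7, 82.8, 81.8, 81.7, 79.1, 71.4, 71.3, 71.0, 62.5, 53.3,
    52.0, 32.3, 32.3, 31.0, 29.3, 26.0, 25.1, 23.1, 21.6, 12.5,
    10.0, 8.6, 6.3, 4.5, 1.4, 0.3, 0.1, 0.1,
]


def average_score(scores):
    """Unweighted mean of the scores, rounded to one decimal place."""
    return round(mean(scores), 1)


print(len(SCORES), average_score(SCORES))  # prints 28 37.7
```

Because every benchmark counts equally, the near-zero results on the hardest benchmarks pull the average well below the model's scores on the more established ones.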
Similar models
| Provider | Average score (%) |
|---|---|
| Google DeepMind | 37.7 |
| Meta | 37.8 |
| Meta | 38.0 |
| Alibaba | 37.4 |