
GPT-5.1

By OpenAI · Released 2025-11-13

Average score: 49.6
Input price: $1.25 per 1M tokens
Output price: $10.00 per 1M tokens
Context window: 400K tokens
Type: multimodal
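The listed prices turn directly into a per-request cost. The sketch below uses only the two list prices shown above and does not model any other pricing tier. The helper names `estimate_cost` and `fits_context` are hypothetical. `fits_context` also assumes that input and output tokens share the single 400K-token window, which is an assumption and not something this page states.

```python
# Hypothetical helpers based on the list prices and context window shown above.
INPUT_PRICE_PER_M = 1.25    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens
CONTEXT_WINDOW = 400_000    # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list prices (no other tiers modeled)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """ASSUMPTION: input and output tokens share one CONTEXT_WINDOW budget."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(f"${estimate_cost(100_000, 10_000):.4f}")
    print(fits_context(390_000, 10_000))
```

For example, 100K input tokens plus 10K output tokens cost $0.125 + $0.100 = $0.225. Output tokens are priced eight times higher than input tokens, so long generations dominate the bill.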

Evaluated on 24 benchmarks with an average score of 49.6. Top results: Chatbot Arena Elo — Overall (1438.5 Elo), Chatbot Arena Elo — Coding (1338.8 Elo), HELM — IFEval (93.5%).

| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1438.5 (Elo) |
| Chatbot Arena Elo — Coding | arena | 1338.8 (Elo) |
| HELM — IFEval | language | 93.5 |
| OTIS Mock AIME 2024-2025 | math | 88.6 |
| HELM — WildBench | reasoning | 86.3 |
| GPQA diamond | knowledge | 83.5 |
| ARC-AGI | reasoning | 72.8 |
| SWE-Bench verified | coding | 68.0 |
| SWE-Bench Verified (Bash Only) | coding | 66.0 |
| WeirdML | coding | 60.8 |
| HELM — MMLU-Pro | knowledge | 57.9 |
| SimpleQA Verified | knowledge | 48.9 |
| Terminal Bench | coding | 47.6 |
| HELM — Omni-MATH | math | 46.4 |
| HELM — GPQA | knowledge | 44.2 |
| SimpleBench | reasoning | 43.8 |
| VPCT | knowledge | 38.0 |
| Chess Puzzles | knowledge | 32.0 |
| FrontierMath-2025-02-28-Private | math | 31.0 |
| HLE | knowledge | 19.8 |
| ARC-AGI-2 | reasoning | 17.6 |
| APEX-Agents | agentic | 17.5 |
| GSO-Bench | coding | 13.7 |
| FrontierMath-Tier-4-2025-07-01-Private | math | 12.5 |
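The 49.6 average cannot be a plain mean over all 24 rows, because the two Elo ratings (around 1,300 to 1,400) would push it far above 100. The sketch below recomputes the mean from the table. Excluding the two arena (Elo) rows gives a mean of the remaining 22 scores that rounds to 49.6. This suggests, though the page does not say so, that the Elo rows are left out of the average. The function names are hypothetical, and the per-category means are an extra illustration rather than figures the page reports.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, score) rows copied from the table above.
SCORES = [
    ("Chatbot Arena Elo — Overall", "arena", 1438.5),
    ("Chatbot Arena Elo — Coding", "arena", 1338.8),
    ("HELM — IFEval", "language", 93.5),
    ("OTIS Mock AIME 2024-2025", "math", 88.6),
    ("HELM — WildBench", "reasoning", 86.3),
    ("GPQA diamond", "knowledge", 83.5),
    ("ARC-AGI", "reasoning", 72.8),
    ("SWE-Bench verified", "coding", 68.0),
    ("SWE-Bench Verified (Bash Only)", "coding", 66.0),
    ("WeirdML", "coding", 60.8),
    ("HELM — MMLU-Pro", "knowledge", 57.9),
    ("SimpleQA Verified", "knowledge", 48.9),
    ("Terminal Bench", "coding", 47.6),
    ("HELM — Omni-MATH", "math", 46.4),
    ("HELM — GPQA", "knowledge", 44.2),
    ("SimpleBench", "reasoning", 43.8),
    ("VPCT", "knowledge", 38.0),
    ("Chess Puzzles", "knowledge", 32.0),
    ("FrontierMath-2025-02-28-Private", "math", 31.0),
    ("HLE", "knowledge", 19.8),
    ("ARC-AGI-2", "reasoning", 17.6),
    ("APEX-Agents", "agentic", 17.5),
    ("GSO-Bench", "coding", 13.7),
    ("FrontierMath-Tier-4-2025-07-01-Private", "math", 12.5),
]


def average_excluding_elo(rows):
    """Mean of all scores except the arena rows, which are Elo ratings."""
    return mean(score for _, category, score in rows if category != "arena")


def category_means(rows):
    """Mean score per category (the arena category holds Elo ratings)."""
    groups = defaultdict(list)
    for _, category, score in rows:
        groups[category].append(score)
    return {category: mean(values) for category, values in groups.items()}


if __name__ == "__main__":
    print(round(average_excluding_elo(SCORES), 1))
    for category, value in sorted(category_means(SCORES).items()):
        print(f"{category}: {value:.1f}")
```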