gpt-oss-120b

Open source

By OpenAI · Released 2025-08-05

Average score: 46.9
Input price: $0.04 per 1M tokens
Output price: $0.19 per 1M tokens
Context window: 131K tokens (~66 books)
Type: text
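
The listed per-token prices translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the example token counts are assumptions, and real billing may differ from the prices shown on this page.

```python
# Listed prices in USD per one million tokens, copied from the stats above.
INPUT_PRICE_PER_M = 0.04
OUTPUT_PRICE_PER_M = 0.19


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed prices.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 100,000-token prompt with a 2,000-token reply.
print(f"${estimate_cost(100_000, 2_000):.6f}")  # $0.004380
```

At these prices output tokens cost roughly five times as much as input tokens, so long generations dominate the bill well before long prompts do.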

Tested on 27 benchmarks, with a 46.9% average. Top scores: Chatbot Arena Elo — Overall (1353.8, an Elo rating rather than a percentage), OTIS Mock AIME 2024-2025 (88.9%), and HELM — WildBench (84.5%).

Benchmark                       Category    Score
Chatbot Arena Elo — Overall     arena      1353.8
OTIS Mock AIME 2024-2025        math         88.9
HELM — WildBench                reasoning    84.5
HELM — IFEval                   language     83.6
HELM — MMLU-Pro                 knowledge    79.5
Lech Mazur Writing              knowledge    77.3
LiveBench — Mathematics         math         68.9
HELM — Omni-MATH                math         68.8
HELM — GPQA                     knowledge    68.4
GPQA diamond                    knowledge    67.7
LiveBench — Coding              coding       60.2
LiveBench — If                  language     50.3
LiveBench — Language            language     48.6
WeirdML                         coding       48.2
LiveBench — Overall             knowledge    46.1
Fiction.LiveBench               knowledge    44.4
Aider polyglot                  coding       41.8
LiveBench — Reasoning           reasoning    39.2
LiveBench — Data Analysis       reasoning    38.8
SWE-Bench Verified (Bash Only)  coding       26.0
Chess Puzzles                   knowledge    20.0
Terminal Bench                  coding       18.7
LiveBench — Agentic Coding      coding       16.7
SimpleQA Verified               knowledge    13.9
Fortress                        safety        8.2
SimpleBench                     reasoning     6.5
APEX-Agents                     agentic       4.7
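
The page does not say how the 46.9 average is computed. As a sanity check, the sketch below recomputes it from the scores listed above. It suggests that the figure is the plain mean of the 26 percentage-scaled scores, with the Chatbot Arena Elo rating left out; that reading is an inference from the arithmetic, not something the source states. The names `ROWS`, `mean_score`, and `category_means` are illustrative.

```python
from collections import defaultdict

# (benchmark, category, score) rows copied from the table above.
ROWS = [
    ("Chatbot Arena Elo — Overall", "arena", 1353.8),
    ("OTIS Mock AIME 2024-2025", "math", 88.9),
    ("HELM — WildBench", "reasoning", 84.5),
    ("HELM — IFEval", "language", 83.6),
    ("HELM — MMLU-Pro", "knowledge", 79.5),
    ("Lech Mazur Writing", "knowledge", 77.3),
    ("LiveBench — Mathematics", "math", 68.9),
    ("HELM — Omni-MATH", "math", 68.8),
    ("HELM — GPQA", "knowledge", 68.4),
    ("GPQA diamond", "knowledge", 67.7),
    ("LiveBench — Coding", "coding", 60.2),
    ("LiveBench — If", "language", 50.3),
    ("LiveBench — Language", "language", 48.6),
    ("WeirdML", "coding", 48.2),
    ("LiveBench — Overall", "knowledge", 46.1),
    ("Fiction.LiveBench", "knowledge", 44.4),
    ("Aider polyglot", "coding", 41.8),
    ("LiveBench — Reasoning", "reasoning", 39.2),
    ("LiveBench — Data Analysis", "reasoning", 38.8),
    ("SWE-Bench Verified (Bash Only)", "coding", 26.0),
    ("Chess Puzzles", "knowledge", 20.0),
    ("Terminal Bench", "coding", 18.7),
    ("LiveBench — Agentic Coding", "coding", 16.7),
    ("SimpleQA Verified", "knowledge", 13.9),
    ("Fortress", "safety", 8.2),
    ("SimpleBench", "reasoning", 6.5),
    ("APEX-Agents", "agentic", 4.7),
]


def mean_score(rows):
    """Arithmetic mean of the score column, rounded to one decimal."""
    return round(sum(score for _, _, score in rows) / len(rows), 1)


def category_means(rows):
    """Mean score per category, rounded to one decimal."""
    groups = defaultdict(list)
    for _, category, score in rows:
        groups[category].append(score)
    return {cat: round(sum(v) / len(v), 1) for cat, v in groups.items()}


# Assumption: the Elo rating is on a different scale and is excluded.
percent_rows = [row for row in ROWS if row[1] != "arena"]

print(len(ROWS), mean_score(ROWS))                  # 27 95.3
print(len(percent_rows), mean_score(percent_rows))  # 26 46.9
print(category_means(percent_rows))
```

Including the Elo rating would push the mean to about 95.3, which does not match the reported figure, so excluding it is the only reading consistent with the numbers shown.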