
Claude 3.7 Sonnet

By Anthropic · Released 2025-02-24

Average score: 47.7
Input price: $3.00/1M tokens
Output price: $15.00/1M tokens
Context window: 200K tokens (~100 books)
Type: multimodal
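
The listed prices can be turned into a rough per-request estimate. The sketch below assumes the prices are in USD per one million tokens and that cost scales linearly with token counts. The helper `estimate_cost` and the example token counts are hypothetical and are not part of the listing.

```python
# Prices from the listing above, assumed to be USD per 1M tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear pricing with no discounts or surcharges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 100,000-token prompt with a 2,000-token reply.
cost = estimate_cost(100_000, 2_000)
print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens are priced at five times the input rate, long generations affect the total more than long prompts of the same length.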

Tested on 26 benchmarks, with an average score of 47.7%. Top scores: MATH level 5 (91.2%), HELM — IFEval (83.4%), Fiction.LiveBench (83.3%).

| Benchmark | Category | Score |
| --- | --- | --- |
| MATH level 5 | math | 91.2 |
| HELM — IFEval | language | 83.4 |
| Fiction.LiveBench | knowledge | 83.3 |
| HELM — WildBench | reasoning | 81.4 |
| Lech Mazur Writing | knowledge | 81.1 |
| HELM — MMLU-Pro | knowledge | 78.4 |
| GPQA diamond | knowledge | 73.0 |
| GeoBench | knowledge | 68.0 |
| Aider polyglot | coding | 64.9 |
| SWE-Bench verified | coding | 61.0 |
| HELM — GPQA | knowledge | 60.8 |
| OTIS Mock AIME 2024-2025 | math | 57.7 |
| CadEval | coding | 54.0 |
| SWE-Bench Verified (Bash Only) | coding | 52.8 |
| DeepResearch Bench | knowledge | 43.6 |
| OSWorld | agentic | 35.8 |
| SimpleBench | reasoning | 35.7 |
| HELM — Omni-MATH | math | 33.0 |
| The Agent Company | agentic | 30.9 |
| ARC-AGI | reasoning | 28.6 |
| Cybench | coding | 20.0 |
| VPCT | knowledge | 8.5 |
| FrontierMath-2025-02-28-Private | math | 4.1 |
| GSO-Bench | coding | 3.8 |
| HLE | knowledge | 3.4 |
| ARC-AGI-2 | reasoning | 0.9 |
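
The 47.7 average quoted above can be reproduced from the listed scores. The sketch below assumes the average is a plain unweighted mean over all 26 benchmarks. The per-category means it also prints are an illustrative addition and are not figures reported by the leaderboard.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, score) rows copied from the table above.
SCORES = [
    ("MATH level 5", "math", 91.2),
    ("HELM — IFEval", "language", 83.4),
    ("Fiction.LiveBench", "knowledge", 83.3),
    ("HELM — WildBench", "reasoning", 81.4),
    ("Lech Mazur Writing", "knowledge", 81.1),
    ("HELM — MMLU-Pro", "knowledge", 78.4),
    ("GPQA diamond", "knowledge", 73.0),
    ("GeoBench", "knowledge", 68.0),
    ("Aider polyglot", "coding", 64.9),
    ("SWE-Bench verified", "coding", 61.0),
    ("HELM — GPQA", "knowledge", 60.8),
    ("OTIS Mock AIME 2024-2025", "math", 57.7),
    ("CadEval", "coding", 54.0),
    ("SWE-Bench Verified (Bash Only)", "coding", 52.8),
    ("DeepResearch Bench", "knowledge", 43.6),
    ("OSWorld", "agentic", 35.8),
    ("SimpleBench", "reasoning", 35.7),
    ("HELM — Omni-MATH", "math", 33.0),
    ("The Agent Company", "agentic", 30.9),
    ("ARC-AGI", "reasoning", 28.6),
    ("Cybench", "coding", 20.0),
    ("VPCT", "knowledge", 8.5),
    ("FrontierMath-2025-02-28-Private", "math", 4.1),
    ("GSO-Bench", "coding", 3.8),
    ("HLE", "knowledge", 3.4),
    ("ARC-AGI-2", "reasoning", 0.9),
]

# Unweighted mean over every benchmark (assumed to match the site's method).
overall = mean(score for _, _, score in SCORES)

# Group scores by category, then average each group (illustrative only).
by_category = defaultdict(list)
for _, category, score in SCORES:
    by_category[category].append(score)
category_means = {cat: mean(vals) for cat, vals in by_category.items()}

print(f"{len(SCORES)} benchmarks, average {overall:.1f}")
for cat, avg in sorted(category_means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:<10} {avg:5.1f}  (n={len(by_category[cat])})")
```

Category sizes are uneven (the language category has a single benchmark), so the per-category means are best read as a rough guide rather than a ranking.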