
Claude 3.7 Sonnet

by Anthropic · Released Feb 2025

Multimodal · Average score: 48.9 · Rank #113 · Better than 52% of all models
Context: 200K tokens (~150,000 words)
Input $/1M: $3.00
Output $/1M: $15.00
Type: multimodal
License: Proprietary
Benchmarks: 26 tested
About

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step thinking from the same model.
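
That hybrid design is exposed directly in the Anthropic API. A minimal sketch using the official Anthropic Python SDK, assuming the dated model ID claude-3-7-sonnet-20250219; the prompts and token budgets here are arbitrary:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Rapid mode: a standard request with no extended thinking.
    fast = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize the Collatz conjecture."}],
    )

    # Extended thinking mode: reserve part of the response budget for
    # step-by-step reasoning; budget_tokens must be below max_tokens.
    deliberate = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    )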

Tested on 26 benchmarks with 47.7% average. Top scores: MATH level 5 (91.2%), HELM — IFEval (83.4%), Fiction.LiveBench (83.3%).

Looking for similar performance at lower cost?
R1 scores 48.0 (98% as good) at $0.70/1M input · 77% cheaper
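
The callout's percentages follow from the figures listed on this page; a quick check of the arithmetic:

    # Relative quality and input-cost savings for the R1 comparison above.
    claude_score, r1_score = 48.9, 48.0
    claude_input, r1_input = 3.00, 0.70  # $ per 1M input tokens

    quality = r1_score / claude_score                    # ~0.982 -> "98% as good"
    savings = (claude_input - r1_input) / claude_input   # ~0.767 -> "77% cheaper"
    print(f"{quality:.0%} as good, {savings:.0%} cheaper")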
Capabilities
coding: 42.7 (#92 globally)
reasoning: 36.6 (#73 globally)
math: 46.5 (#86 globally)
knowledge: 55.6 (#75 globally)
agentic: 33.3 (#14 globally)
language: 83.4 (#40 globally)
Benchmark Scores
Tested on 26 benchmarks · Ranked across 6 categories
[Score distribution chart: all 233 models on a 0-100 scale, with a marker at Claude 3.7 Sonnet's position]
Aider polyglot · 64.9
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

SWE-bench Verified · 61.0
Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.

CadEval · 54.0
Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.

HELM — WildBench · 81.4
Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

SimpleBench · 35.7
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

ARC-AGI · 28.6
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

MATH level 5 · 91.2
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

OTIS Mock AIME 2024-2025 · 57.7
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.

HELM — Omni-MATH · 33.0
Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
Links
Documentation
Community
BenchGecko API: claude-3-7-sonnet
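
BenchGecko's API itself is not documented on this page, so the following is a purely hypothetical sketch of a lookup by the slug above; the domain, endpoint shape, and field names are assumptions, not the actual API:

    import requests

    # Hypothetical endpoint shape; the real BenchGecko API may differ.
    url = "https://api.benchgecko.example/v1/models/claude-3-7-sonnet"
    data = requests.get(url, timeout=10).json()
    print(data["avg_score"], data["rank"])  # assumed response fields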
Specifications
  • Type: multimodal
  • Context: 200K tokens (~150,000 words)
  • Released: Feb 2025
  • License: Proprietary
  • Status: Active
  • Cost / Message: ~$0.021
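
The page does not state how the ~$0.021 per-message figure is derived; it is consistent with, for example, a 1,000-token prompt and a 1,200-token reply at the listed rates (those token counts are an assumption):

    # One plausible derivation of ~$0.021 per message (token counts assumed).
    input_tokens, output_tokens = 1_000, 1_200
    cost = input_tokens / 1e6 * 3.00 + output_tokens / 1e6 * 15.00  # $/message
    print(f"${cost:.3f}")  # -> $0.021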
Available On
Anthropic · $3.00/1M input
Claude 3.7 Sonnet is a proprietary multimodal AI model by Anthropic, released in February 2025. It has an average benchmark score of 48.9. Context window: 200K tokens.