
Claude 3.5 Sonnet

by Anthropic · Released Jun 2024

44.6
avg score
Rank #125
Better than 46% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Proprietary
Benchmarks: 25 tested
Data updated today
About

Tested on 25 benchmarks with a 42.3% average. Top scores: Chatbot Arena Elo — Overall (1371.4 Elo), HELM — IFEval (85.6%), Aider — Code Editing (84.2%).
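The benchmarks mix incompatible scales (an Elo rating near 1371 sits alongside percentage scores), so averaging them presumably requires normalizing each score to a common range first. A minimal sketch, assuming per-benchmark min-max normalization; the site's actual aggregation method is not stated, and the Elo range below is purely illustrative:

```python
def normalize(score, lo, hi):
    """Min-max normalize a raw benchmark score onto a 0-100 scale."""
    return 100.0 * (score - lo) / (hi - lo)

# Hypothetical per-benchmark ranges: percentage benchmarks are already 0-100,
# while Chatbot Arena Elo needs its own bounds (800-1500 here is an assumption).
scores = [
    (1371.4, 800.0, 1500.0),   # Chatbot Arena Elo -- Overall
    (85.6, 0.0, 100.0),        # HELM -- IFEval
    (84.2, 0.0, 100.0),        # Aider -- Code Editing
]

normalized = [normalize(s, lo, hi) for s, lo, hi in scores]
average = sum(normalized) / len(normalized)
print(round(average, 1))  # -> 83.8 for these three top scores
```

With all 25 benchmarks (including the low math and safety scores) the same procedure would pull the average down toward the reported figure.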

Capabilities
coding: 39.5 (#97 globally)
reasoning: 46.1 (#53 globally)
math: 17.4 (#175 globally)
knowledge: 61.4 (#41 globally)
agentic: 24.0 (#20 globally)
multimodal: 46.7 (#8 globally)
safety: 13.0 (#5 globally)
language: 85.6 (#30 globally)
Benchmark Scores
Tested on 25 benchmarks · Ranked across 9 categories
Score Distribution (all 233 models)
Aider — Code Editing

Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.

84.2
Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

51.6
CadEval

Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.

48.0
HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

79.2
SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

13.0
MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

51.7
HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

27.6
OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

6.4
Legend: Excellent (85+) · Good (70–85) · Average (50–70) · Below (<50)
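The legend's tiers are simple score thresholds. A minimal sketch of that bucketing, using the labels and boundaries from the legend (the function name is illustrative):

```python
def tier(score):
    """Map a 0-100 benchmark score to the page's rating tiers."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"

print(tier(85.6))  # HELM -- IFEval: Excellent
print(tier(51.7))  # MATH level 5: Average
print(tier(13.0))  # SimpleBench: Below
```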
Links
Documentation
Community
BenchGecko API
claude-3-5-sonnet
Specifications
  • Type: text
  • Context: N/A
  • Released: Jun 2024
  • License: Proprietary
  • Status: benchmark-only
Available On
Anthropic — TBD
Claude 3.5 Sonnet is a proprietary text AI model by Anthropic, released in June 2024. It has an average benchmark score of 44.6.