Tested on 25 benchmarks with a 42.3% average. Top scores: Chatbot Arena Elo — Overall (1371.4), HELM — IFEval (85.6%), Aider — Code Editing (84.2%).
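Note that the Arena figure is an Elo-style rating, not a percentage. As a point of reference only, below is a minimal Python sketch of the standard Elo expected-score and update rule; Chatbot Arena's actual leaderboard is fit statistically over many pairwise battles, so this is an illustrative assumption of how such a rating behaves, not the site's method.

```python
# Minimal sketch of the standard Elo update rule. This is NOT Chatbot Arena's
# exact scoring pipeline; it only illustrates how a rating like 1371.4 moves
# with head-to-head outcomes rather than being a percentage score.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one comparison; score_a is 1 (A wins), 0.5 (tie), or 0 (A loses)."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Example: a 1371.4-rated model wins one comparison against a 1300-rated model.
print(elo_update(1371.4, 1300.0, score_a=1.0))
```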
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.
Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only