Evaluated on 6 benchmarks with a 28.3% average score. Top scores: MMLU (67.9%), Winogrande (50.2%), GPQA Diamond (20.8%).
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Massive Multitask Language Understanding. 57 subjects spanning STEM, the humanities, and the social sciences. One of the most widely cited knowledge benchmarks.
Commonsense coreference resolution. Tests understanding of pronoun references in ambiguous sentences.
Graduate-level science questions written by PhD experts. The Diamond subset contains the highest-quality questions, those that experts answer correctly but skilled non-experts miss, testing deep understanding.
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only