Claude Sonnet 4 significantly improves on its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with greater precision and controllability. It achieves state-of-the-art performance on SWE-bench (72.7%),...
Tested on 27 benchmarks with a 44.6% average. Top scores: MASK (95.3%), OpenCompass IFEval (88.3%), MATH Level 5 (84.4%).
Closest alternative: gpt-oss-120b scores 43.7 (~100% as good) at $0.04/1M input tokens · 99% cheaper
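A minimal sketch of where the "99% cheaper" figure plausibly comes from, assuming Claude Sonnet 4 list pricing of $3 per 1M input tokens (the Sonnet 4 price is not stated on this page and is an assumption here):

```python
# Rough cost comparison on input-token pricing alone.
sonnet4_input = 3.00   # $ per 1M input tokens (assumed Sonnet 4 list price)
gpt_oss_input = 0.04   # $ per 1M input tokens (from this page)

savings = 1 - gpt_oss_input / sonnet4_input  # fraction saved per input token
print(f"{savings:.0%}")  # → 99%
```

Output-token pricing would shift the exact ratio, but at this gap the rounded figure stays at roughly 99%.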
SWE-bench Verified solved using only bash commands, no specialized frameworks. Tests raw terminal-based problem solving.
Aider's multi-language code-editing benchmark. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
OpenCompass LiveCodeBench v6. Fresh competitive programming problems that evaluate code generation without memorization.
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
ARC-AGI-2, the harder sequel to ARC. More complex abstract reasoning patterns that test generalization beyond training data.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
- Type: multimodal
- Context: 1.0M tokens (~750,000 words)
- Released: May 2025
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.021
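The ~$0.021 per-message figure above can be reproduced as a back-of-the-envelope estimate. This sketch assumes Claude Sonnet 4 list pricing of $3 per 1M input tokens and $15 per 1M output tokens, and a hypothetical "typical" message of ~2,000 input and ~1,000 output tokens; none of these inputs appear on this page, so treat them as illustrative assumptions:

```python
# Hedged sketch: deriving an approximate per-message cost from token prices.
INPUT_PRICE_PER_TOKEN = 3 / 1_000_000    # assumed: $3 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 15 / 1_000_000  # assumed: $15 per 1M output tokens

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single message."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# Hypothetical typical message: ~2,000 tokens in, ~1,000 tokens out.
print(round(message_cost(2_000, 1_000), 3))  # → 0.021
```

Because output tokens cost 5x input tokens under these assumptions, the output side dominates the estimate ($0.015 of the $0.021).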