Claude 3.7 Sonnet (thinking)
by Anthropic · Released 2025-02-24
42.1
Average score
$3.00/1M
Input price
$15.00/1M
Output price
200K tokens (~150,000 words)
Context window
multimodal
Type
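The listed prices are quoted per one million tokens, so a per-request cost can be estimated from token counts. The sketch below assumes billing is strictly linear in input and output tokens. The helper name `estimate_cost` is hypothetical, and how thinking tokens are counted is not stated in this listing.

```python
# Prices copied from the listing above, in USD per one million tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: linear cost estimate in USD for one request.

    Assumes no discounts, caching, or batch pricing; those are not
    described in the listing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A prompt that fills the full 200K context plus a 10K-token reply.
    print(f"${estimate_cost(200_000, 10_000):.2f}")
```

Under these assumptions, filling the whole context window costs far more on the input side than a typical reply does on the output side, even though the per-token output price is five times higher.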
Evaluated on 20 benchmarks, with an average score of 42.1%. Top scores: MATH level 5 (91.2%), Fiction.LiveBench (83.3%), and Lech Mazur Writing (81.1%).
Benchmark scores
| Benchmark | Category | Score (%) |
|---|---|---|
| MATH level 5 | math | 91.2 |
| Fiction.LiveBench | knowledge | 83.3 |
| Lech Mazur Writing | knowledge | 81.1 |
| GPQA diamond | knowledge | 73.0 |
| GeoBench | knowledge | 68.0 |
| Aider polyglot | coding | 64.9 |
| OTIS Mock AIME 2024-2025 | math | 57.7 |
| CadEval | coding | 54.0 |
| SWE-Bench Verified (Bash Only) | coding | 52.8 |
| DeepResearch Bench | knowledge | 43.6 |
| OSWorld | agentic | 35.8 |
| SimpleBench | reasoning | 35.7 |
| The Agent Company | agentic | 30.9 |
| ARC-AGI | reasoning | 28.6 |
| Cybench | coding | 20.0 |
| VPCT | knowledge | 8.5 |
| FrontierMath-2025-02-28-Private | math | 4.1 |
| GSO-Bench | coding | 3.8 |
| HLE | knowledge | 3.4 |
| ARC-AGI-2 | reasoning | 0.9 |
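The headline average can be checked against the table. The sketch below assumes the average is an unweighted arithmetic mean over all 20 benchmarks. The listing does not state the method, but the numbers are consistent with it. The per-category breakdown is an illustration and is not a figure reported by the source.

```python
from collections import defaultdict

# (benchmark, category, score) rows copied from the table above.
SCORES = [
    ("MATH level 5", "math", 91.2),
    ("Fiction.LiveBench", "knowledge", 83.3),
    ("Lech Mazur Writing", "knowledge", 81.1),
    ("GPQA diamond", "knowledge", 73.0),
    ("GeoBench", "knowledge", 68.0),
    ("Aider polyglot", "coding", 64.9),
    ("OTIS Mock AIME 2024-2025", "math", 57.7),
    ("CadEval", "coding", 54.0),
    ("SWE-Bench Verified (Bash Only)", "coding", 52.8),
    ("DeepResearch Bench", "knowledge", 43.6),
    ("OSWorld", "agentic", 35.8),
    ("SimpleBench", "reasoning", 35.7),
    ("The Agent Company", "agentic", 30.9),
    ("ARC-AGI", "reasoning", 28.6),
    ("Cybench", "coding", 20.0),
    ("VPCT", "knowledge", 8.5),
    ("FrontierMath-2025-02-28-Private", "math", 4.1),
    ("GSO-Bench", "coding", 3.8),
    ("HLE", "knowledge", 3.4),
    ("ARC-AGI-2", "reasoning", 0.9),
]


def mean(values):
    """Unweighted arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Overall average across every benchmark, each weighted equally.
overall = mean([score for _, _, score in SCORES])

# Group scores by category, then average within each group.
by_category = defaultdict(list)
for _, category, score in SCORES:
    by_category[category].append(score)
category_means = {cat: mean(vals) for cat, vals in by_category.items()}

if __name__ == "__main__":
    print(f"{len(SCORES)} benchmarks, overall mean {overall:.1f}")
    for cat, value in sorted(category_means.items()):
        print(f"{cat}: {value:.1f} (n={len(by_category[cat])})")
```

Because every benchmark carries equal weight, the near-zero results on ARC-AGI-2, HLE, and GSO-Bench pull the headline figure well below the model's scores on its strongest benchmarks.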
Similar models
Anthropic
42.1
Google DeepMind
42.2
Google DeepMind
42.2
Google DeepMind
42.2