Better than 15% of all models
- Context: N/A
- Input $/1M: TBD
- Output $/1M: TBD
- Type: text
- License: Proprietary
- Benchmarks: 4 tested
About
Tested on 4 benchmarks with a 21.0% average. Top scores: MMLU (64.7%), GPQA diamond (10.6%), WeirdML (7.1%).
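For reference, the 21.0% average appears to be the unweighted mean of the four benchmark scores listed below, truncated to one decimal place. A minimal Python sketch (the variable names are illustrative, not part of any BenchGecko tooling):

    import math

    # The four benchmark scores reported on this page.
    scores = {
        "MMLU": 64.7,
        "GPQA diamond": 10.6,
        "WeirdML": 7.1,
        "OTIS Mock AIME 2024-2025": 1.9,
    }
    mean = sum(scores.values()) / len(scores)  # 21.075
    print(math.floor(mean * 10) / 10)          # 21.0, matching the page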
Capabilities
- coding: 7.1 (#140 globally)
- math: 1.9 (#206 globally)
- knowledge: 37.6 (#155 globally)
Benchmark Scores
Tested on 4 benchmarks · Ranked across 3 categories
[Chart: score distribution across all 233 models, with this model's position marked]
coding
- WeirdML: 7.1. Unusual and adversarial machine learning challenges; tests robustness of reasoning about edge cases in ML systems.

math
- OTIS Mock AIME 2024-2025: 1.9. Mock AIME (American Invitational Mathematics Exam) problems from OTIS; tests mathematical competition performance.

knowledge
- MMLU: 64.7. Massive Multitask Language Understanding: 57 subjects from STEM, the humanities, and the social sciences. The most widely cited knowledge benchmark.
- GPQA diamond: 10.6. Graduate-level science questions written by PhD experts. The diamond subset contains questions that experts answer correctly but skilled non-experts do not, testing deep understanding.
Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
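Applied in code, the bands above amount to simple thresholding; a short sketch (the function name is illustrative):

    def score_band(score: float) -> str:
        """Map a 0-100 benchmark score onto the legend's bands."""
        if score >= 85:
            return "Excellent"
        if score >= 70:
            return "Good"
        if score >= 50:
            return "Average"
        return "Below"

    print(score_band(64.7))  # "Average" (MMLU)
    print(score_band(21.0))  # "Below" (the overall average)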
Model ID: claude-2-1
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only
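Although this page lists the model as benchmark-only with pricing TBD, Claude 2.1 was served through Anthropic's API under the id claude-2.1 (with a dot, unlike the claude-2-1 slug used here). A minimal sketch using the official anthropic Python SDK, assuming the model id is still available to your account (Anthropic has deprecated its older Claude 2.x models):

    import anthropic

    # Reads ANTHROPIC_API_KEY from the environment.
    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-2.1",  # may be rejected if the model has been retired
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello, Claude."}],
    )
    print(message.content[0].text)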
Frequently Asked Questions
Claude 2.1 is a proprietary text AI model by Anthropic, released in January 2024. Across the 4 benchmarks tested, it has an average score of 21.0%.