
DeepSeek V3

by DeepSeek · Released Dec 2024

Open Source
58.3
avg score
Rank #77
Better than 67% of all models
Context
164K tokens (~82 books)
Input $/1M
$0.32
Output $/1M
$0.89
Type
text
License
Open Source
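The listed per-million-token prices make per-request cost simple to estimate. A minimal sketch, assuming a hypothetical "typical" message of roughly 1K input and 2K output tokens (an assumption, chosen because it reproduces the card's ~$0.002 cost-per-message figure):

```python
# Sketch: estimate DeepSeek V3 API cost from the per-million-token prices
# listed on this card. Message sizes below are illustrative assumptions.
INPUT_PRICE_PER_M = 0.32   # $ per 1M input tokens (from the card)
OUTPUT_PRICE_PER_M = 0.89  # $ per 1M output tokens (from the card)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical message: ~1K tokens in, ~2K tokens out.
print(round(estimate_cost(1_000, 2_000), 4))  # 0.0021
```

At these prices, output tokens dominate: each output token costs nearly three times as much as each input token.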
Benchmarks
22 tested
About

DeepSeek-V3 is the latest model from the DeepSeek team, building on the instruction-following and coding abilities of previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...

Tested on 22 benchmarks with a 59.0% average. Top scores: Chatbot Arena Elo — Overall (1358.2), ARC AI2 (93.7%), HellaSwag (85.2%).

Looking for similar performance at lower cost?
Qwen3 Next 80B A3B Thinking scores 57.5 (99% as good) at $0.10/1M input · 70% cheaper
Capabilities
coding
42.2
#93 globally
reasoning
56.4
#39 globally
math
30.7
#131 globally
knowledge
70.9
#15 globally
language
83.2
#41 globally
Benchmark Scores
Tested on 22 benchmarks · Ranked across 6 categories
Score Distribution (all 233 models): histogram of average scores over 0–100, with this model's position marked.
Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

48.4
WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

36.1
BBH

BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

83.3
HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

83.1
SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

2.7
MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

64.8
HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

40.3
OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

15.8
Legend: Excellent (85+) · Good (70–85) · Average (50–70) · Below (<50)
Links
Documentation
Community
BenchGecko API
deepseek-chat
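The `deepseek-chat` identifier above is the model name used when calling DeepSeek's API, which follows the OpenAI chat-completions request shape. A minimal sketch; the endpoint path and auth header are assumptions to verify against the official documentation:

```python
# Sketch: calling the deepseek-chat model via DeepSeek's OpenAI-compatible
# chat-completions API. Endpoint URL and auth scheme are assumptions here.
import json
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"  # assumed endpoint

def build_request(prompt: str, model: str = "deepseek-chat") -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def send(api_key: str, body: dict) -> dict:
    """POST the request; needs a valid API key, so it is not run here."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_request("Summarize this model's strengths in one sentence.")
print(body["model"])  # deepseek-chat
```

Only the request-building step runs locally; `send` would transmit it once supplied with a real API key.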
Specifications
  • Type: text
  • Context: 164K tokens (~82 books)
  • Released: Dec 2024
  • License: Open Source
  • Status: Active
  • Cost / Message: ~$0.002
Available On
DeepSeek ($0.32 / 1M input tokens)