
DeepSeek V3

Open source

By DeepSeek · Released 2024-12-26

Average score: 59.0
Input price: $0.32 per 1M tokens
Output price: $0.89 per 1M tokens
Context window: 164K tokens (~82 books)
Type: text
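
As a rough illustration of how the listed input and output prices translate into a per-request cost, here is a minimal sketch. It assumes billing is strictly linear in token count, with no caching discounts, tiers, or minimum charges, which the page does not confirm. The function name `estimate_cost` and the example token counts are hypothetical.

```python
# Listed prices in USD per 1M tokens (taken from this page).
INPUT_PRICE_PER_M = 0.32
OUTPUT_PRICE_PER_M = 0.89


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes purely linear pricing; real billing may differ.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 100K-token prompt with a 2K-token reply.
cost = estimate_cost(input_tokens=100_000, output_tokens=2_000)
print(f"${cost:.4f}")  # $0.0338
```

At these prices the prompt dominates the cost: 100K input tokens account for about $0.032 of the total, while the 2K output tokens add under $0.002.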

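Tested on 22 benchmarks. The 59.0% average matches the mean of the 21 percentage-scored benchmarks; the Chatbot Arena result is an Elo rating, not a percentage. Top scores: Chatbot Arena Elo — Overall (1358.2 Elo), ARC AI2 (93.7%), HellaSwag (85.2%).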

Benchmark                        Category   Score (% unless noted)
Chatbot Arena Elo — Overall      arena      1358.2 (Elo rating)
ARC AI2                          knowledge  93.7
HellaSwag                        knowledge  85.2
BBH                              reasoning  83.3
HELM — IFEval                    language   83.2
HELM — WildBench                 reasoning  83.1
MMLU                             knowledge  82.9
TriviaQA                         knowledge  82.9
Lech Mazur Writing               knowledge  77.0
HELM — MMLU-Pro                  knowledge  72.3
Winogrande                       knowledge  70.4
PIQA                             knowledge  69.4
MATH level 5                     math       64.8
HELM — GPQA                      knowledge  53.8
Fiction.LiveBench                knowledge  50.0
Aider polyglot                   coding     48.4
GPQA diamond                     knowledge  42.0
HELM — Omni-MATH                 math       40.3
WeirdML                          coding     36.1
OTIS Mock AIME 2024-2025         math       15.8
SimpleBench                      reasoning  2.7
FrontierMath-2025-02-28-Private  math       1.7
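
The 59.0 average reported on this page can be checked against the scores listed above. The sketch below assumes the average is a plain unweighted mean and that the Chatbot Arena Elo rating is left out because it is not on a 0-100 scale. The page does not state its method, so this is a consistency check rather than a description of how the site computes it.

```python
from statistics import mean

# Percentage scores from the table, in table order (ARC AI2 through FrontierMath).
PERCENT_SCORES = [
    93.7, 85.2, 83.3, 83.2, 83.1, 82.9, 82.9, 77.0, 72.3, 70.4, 69.4,
    64.8, 53.8, 50.0, 48.4, 42.0, 40.3, 36.1, 15.8, 2.7, 1.7,
]
# Chatbot Arena Elo rating, kept separate (assumed excluded from the average).
ARENA_ELO = 1358.2

avg_without_elo = mean(PERCENT_SCORES)
avg_with_elo = mean(PERCENT_SCORES + [ARENA_ELO])

print(len(PERCENT_SCORES), round(avg_without_elo, 1))  # 21 59.0
print(round(avg_with_elo, 1))                          # 118.1
```

Only the mean without the Elo rating reproduces the published 59.0; including it gives a value above 100, which would be meaningless as a percentage.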