Gemini 1.5 Pro (Feb 2024)

By Google DeepMind · Released 2024-01-01

Average score: 41.3
Input price: N/A
Output price: N/A
Context window: N/A
Type: text

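Tested on 20 benchmarks, with an average score of 41.3%. This figure matches the mean of the 19 percentage-scale scores, so the average appears to exclude the Elo rating. Top scores: Chatbot Arena Elo - Overall (1322.5, an Elo rating rather than a percentage), HELM - IFEval (83.7%), HELM - WildBench (81.3%).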

| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo - Overall | arena | 1322.5 (Elo) |
| HELM - IFEval | language | 83.7 |
| HELM - WildBench | reasoning | 81.3 |
| BBH | reasoning | 78.7 |
| MMLU | knowledge | 76.9 |
| HELM - MMLU-Pro | knowledge | 73.7 |
| VideoMME | multimodal | 66.7 |
| Aider - Code Editing | coding | 57.1 |
| HELM - GPQA | knowledge | 53.4 |
| MATH level 5 | math | 40.8 |
| HELM - Omni-MATH | math | 36.4 |
| CadEval | coding | 34.0 |
| GPQA diamond | knowledge | 27.8 |
| WeirdML | coding | 22.2 |
| Balrog | knowledge | 21.0 |
| SimpleBench | reasoning | 12.5 |
| Cybench | coding | 7.5 |
| OTIS Mock AIME 2024-2025 | math | 6.7 |
| The Agent Company | agentic | 3.4 |
| ARC-AGI-2 | reasoning | 0.8 |
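
The page does not say how the 41.3 average is computed. As a rough check, the sketch below recomputes it from the listed scores under one assumption: the Chatbot Arena Elo rating is left out because it is not on a 0-100 scale. The variable names are illustrative and the exclusion rule is inferred from the numbers, not documented by the leaderboard.

```python
# Scores transcribed from the benchmark list on this page.
scores = {
    "Chatbot Arena Elo - Overall": 1322.5,  # Elo rating, not a percentage
    "HELM - IFEval": 83.7,
    "HELM - WildBench": 81.3,
    "BBH": 78.7,
    "MMLU": 76.9,
    "HELM - MMLU-Pro": 73.7,
    "VideoMME": 66.7,
    "Aider - Code Editing": 57.1,
    "HELM - GPQA": 53.4,
    "MATH level 5": 40.8,
    "HELM - Omni-MATH": 36.4,
    "CadEval": 34.0,
    "GPQA diamond": 27.8,
    "WeirdML": 22.2,
    "Balrog": 21.0,
    "SimpleBench": 12.5,
    "Cybench": 7.5,
    "OTIS Mock AIME 2024-2025": 6.7,
    "The Agent Company": 3.4,
    "ARC-AGI-2": 0.8,
}

# Assumption: the Elo entry is excluded from the reported average.
ELO_KEY = "Chatbot Arena Elo - Overall"
percent_scores = [value for name, value in scores.items() if name != ELO_KEY]

# Mean of the remaining percentage-scale scores.
average = sum(percent_scores) / len(percent_scores)

# Mean over all 20 entries, for comparison with the reported figure.
average_with_elo = sum(scores.values()) / len(scores)

print(f"benchmarks listed: {len(scores)}")
print(f"percentage-scale benchmarks: {len(percent_scores)}")
print(f"average without Elo: {average:.1f}")
print(f"average with Elo: {average_with_elo:.1f}")
```

Averaging the 19 percentage-scale scores gives 41.3, which agrees with the reported figure. Including the Elo rating pushes the mean above 100, so it is unlikely to be part of the published average.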