Gemini 2.5 Flash
Developer Google DeepMind · Release date 2025-06-17
40.0
Average score
$0.30/1M
Input price
$2.50/1M
Output price
1.0M tokens (~524 books)
Context window
multimodal
Type
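The listed prices imply a simple per-request cost calculation: input and output tokens are billed separately at their per-million rates. A minimal sketch (the token counts are hypothetical example values, not from the source):

```python
# Per-request cost estimate for Gemini 2.5 Flash at the listed rates:
# $0.30 per 1M input tokens, $2.50 per 1M output tokens.
INPUT_PRICE_PER_M = 0.30   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 2.50  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 10k-token prompt producing a 2k-token response
cost = request_cost(10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0080
```

Note the asymmetry: output tokens cost over 8x more than input tokens, so long completions dominate the bill even for prompt-heavy workloads.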
Tested on 25 benchmarks with a 40.0% average. Top scores: Chatbot Arena Elo — Overall (1411.0 Elo), HELM — IFEval (89.8%), HELM — WildBench (81.7%).
Benchmark Scores
| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1411.0 |
| HELM — IFEval | language | 89.8 |
| HELM — WildBench | reasoning | 81.7 |
| Lech Mazur Writing | knowledge | 76.5 |
| OTIS Mock AIME 2024-2025 | math | 73.0 |
| GeoBench | knowledge | 73.0 |
| HELM — MMLU-Pro | knowledge | 63.9 |
| Fiction.LiveBench | knowledge | 47.2 |
| Aider polyglot | coding | 47.1 |
| The Agent Company | agentic | 41.1 |
| WeirdML | coding | 41.0 |
| AudioMultiChallenge | knowledge | 40.0 |
| AudioMultiChallenge — Text Output | knowledge | 40.0 |
| HELM — GPQA | knowledge | 39.0 |
| HELM — Omni-MATH | math | 38.4 |
| Balrog | knowledge | 33.5 |
| ARC-AGI | reasoning | 32.3 |
| SimpleBench | reasoning | 29.4 |
| DeepResearch Bench | knowledge | 29.2 |
| Terminal Bench | coding | 17.1 |
| HLE | knowledge | 7.7 |
| VPCT | knowledge | 7.0 |
| FrontierMath-2025-02-28-Private | math | 4.8 |
| FrontierMath-Tier-4-2025-07-01-Private | math | 4.2 |
| ARC-AGI-2 | reasoning | 2.5 |