Gemini 1.5 Flash (May 2024)
by Google DeepMind · Published 2024-01-01
Average score: 47.4
Input price: N/A
Output price: N/A
Context window: N/A
Type: text
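Tested on 17 benchmarks with a 47.4% average. Top scores: Chatbot Arena Elo — Overall (1285.1, an Elo rating rather than a percentage), HELM — IFEval (83.1%), GSM8K (82.4%). The average is consistent with the mean of the 16 percentage-scale scores, excluding the Elo rating.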
Benchmark results
| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1285.1 |
| HELM — IFEval | language | 83.1 |
| GSM8K | math | 82.4 |
| HELM — WildBench | reasoning | 79.2 |
| GeoBench | knowledge | 76.0 |
| PIQA | knowledge | 75.0 |
| MMLU | knowledge | 70.5 |
| HELM — MMLU-Pro | knowledge | 67.8 |
| VideoMME | multimodal | 60.4 |
| HELM — GPQA | knowledge | 43.7 |
| HELM — Omni-MATH | math | 30.5 |
| MATH level 5 | math | 25.1 |
| WeirdML | coding | 24.9 |
| GPQA diamond | knowledge | 20.5 |
| Balrog | knowledge | 14.6 |
| OTIS Mock AIME 2024-2025 | math | 3.8 |
| FrontierMath-2025-02-28-Private | math | 0.1 |
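
The reported average can be checked against the table with a few lines of Python. This is a minimal sketch, not the site's actual aggregation method: the variable names are illustrative, and treating the Chatbot Arena Elo rating as excluded from the average is an assumption based on its different scale.

```python
# Scores copied from the table above. The Elo rating is kept separate
# because it is not on a 0-100 scale (assumption: the site excludes it).
pct_scores = {
    "HELM — IFEval": 83.1,
    "GSM8K": 82.4,
    "HELM — WildBench": 79.2,
    "GeoBench": 76.0,
    "PIQA": 75.0,
    "MMLU": 70.5,
    "HELM — MMLU-Pro": 67.8,
    "VideoMME": 60.4,
    "HELM — GPQA": 43.7,
    "HELM — Omni-MATH": 30.5,
    "MATH level 5": 25.1,
    "WeirdML": 24.9,
    "GPQA diamond": 20.5,
    "Balrog": 14.6,
    "OTIS Mock AIME 2024-2025": 3.8,
    "FrontierMath-2025-02-28-Private": 0.1,
}
arena_elo = 1285.1

# Mean over the percentage-scale benchmarks only.
mean_pct = sum(pct_scores.values()) / len(pct_scores)

# Mean over all 17 entries, Elo included, for comparison.
mean_all = (sum(pct_scores.values()) + arena_elo) / (len(pct_scores) + 1)

print(f"percentage-scale benchmarks: {len(pct_scores)}")
print(f"mean without Elo: {mean_pct:.2f}")
print(f"mean with Elo:    {mean_all:.2f}")
```

The mean of the 16 percentage-scale scores comes to about 47.35, which is consistent with the reported 47.4, while including the Elo value gives roughly 120.2. This suggests, but does not confirm, that the Elo rating is left out of the headline average.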