Qwen2.5 72B Instruct
Open source · Developer: Alibaba Qwen · Release date: 2024-09-19
Average score: 53.2
Input price: $0.12 per 1M tokens
Output price: $0.39 per 1M tokens
Context window: 33K tokens
Type: text
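As a rough illustration of what the listed prices mean per request, the sketch below multiplies token counts by the per-million rates shown on this card. The function name `estimate_cost_usd` and the example token counts are hypothetical, and actual billing may differ by provider.

```python
# Listed prices from this card, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.12
OUTPUT_PRICE_PER_M = 0.39


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 20,000 prompt tokens and 1,000 completion tokens.
cost = estimate_cost_usd(20_000, 1_000)
print(f"${cost:.6f}")  # prints $0.002790
```

At these rates, output tokens cost a little over three times as much as input tokens, so long completions dominate the bill sooner than long prompts do.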
Tested on 24 benchmarks, with a 53.2% average. Top scores: Chatbot Arena Elo — Overall (1302.3, an Elo rating rather than a percentage), ARC AI2 (92.7%), IFEval (86.4%).
Benchmark scores
| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1302.3 |
| ARC AI2 | knowledge | 92.7 |
| IFEval | language | 86.4 |
| CMMLU | knowledge | 85.7 |
| MMLU | knowledge | 80.4 |
| HellaSwag | knowledge | 79.7 |
| BBH | reasoning | 73.1 |
| TriviaQA | knowledge | 71.9 |
| Aider — Code Editing | coding | 65.4 |
| PIQA | knowledge | 65.2 |
| VideoMME | multimodal | 64.7 |
| Winogrande | knowledge | 64.6 |
| MATH level 5 | math | 63.2 |
| GeoBench | knowledge | 62.0 |
| BBH (HuggingFace) | general | 61.9 |
| MATH Level 5 | math | 59.8 |
| MMLU-PRO | knowledge | 51.4 |
| GPQA diamond | knowledge | 32.2 |
| GPQA | knowledge | 16.7 |
| Balrog | knowledge | 16.2 |
| MUSR | reasoning | 11.7 |
| OTIS Mock AIME 2024-2025 | math | 8.0 |
| The Agent Company | agentic | 5.7 |
| OSWorld | agentic | 5.0 |
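The 53.2 average matches the mean of the 23 percentage-scale scores in the table above, with the Chatbot Arena Elo rating left out. The sketch below checks that arithmetic. Excluding the `arena` category is an assumption inferred from the numbers, not a documented rule of the page.

```python
from statistics import mean

# (benchmark, category, score) rows copied from the table above.
ROWS = [
    ("Chatbot Arena Elo — Overall", "arena", 1302.3),
    ("ARC AI2", "knowledge", 92.7),
    ("IFEval", "language", 86.4),
    ("CMMLU", "knowledge", 85.7),
    ("MMLU", "knowledge", 80.4),
    ("HellaSwag", "knowledge", 79.7),
    ("BBH", "reasoning", 73.1),
    ("TriviaQA", "knowledge", 71.9),
    ("Aider — Code Editing", "coding", 65.4),
    ("PIQA", "knowledge", 65.2),
    ("VideoMME", "multimodal", 64.7),
    ("Winogrande", "knowledge", 64.6),
    ("MATH level 5", "math", 63.2),
    ("GeoBench", "knowledge", 62.0),
    ("BBH (HuggingFace)", "general", 61.9),
    ("MATH Level 5", "math", 59.8),
    ("MMLU-PRO", "knowledge", 51.4),
    ("GPQA diamond", "knowledge", 32.2),
    ("GPQA", "knowledge", 16.7),
    ("Balrog", "knowledge", 16.2),
    ("MUSR", "reasoning", 11.7),
    ("OTIS Mock AIME 2024-2025", "math", 8.0),
    ("The Agent Company", "agentic", 5.7),
    ("OSWorld", "agentic", 5.0),
]

# Assumption: the Elo rating is on a different scale, so it is excluded.
percent_scores = [score for _, category, score in ROWS if category != "arena"]
all_scores = [score for _, _, score in ROWS]

avg_without_elo = round(mean(percent_scores), 1)
avg_with_elo = round(mean(all_scores), 1)

print(len(percent_scores), avg_without_elo)  # prints 23 53.2
print(len(all_scores), avg_with_elo)         # prints 24 105.2
```

Including the Elo rating would push the mean above 100, so the headline figure is better read as an average over the 23 percentage-scale benchmarks.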
Similar models
DeepSeek: 53.2
OpenAI: 53.2
moonshotai: 53.3
Alibaba Qwen: 53.1