Qwen3 235B A22B Instruct 2507
Open source, from Alibaba Qwen · Released 2025-07-21
48.5
Average score
$0.07/1M tokens
Input price
$0.10/1M tokens
Output price
262K tokens (~131 books)
Context window
text
Type
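As a rough illustration of what the listed prices mean in practice, the sketch below estimates the cost of a single request. It assumes simple linear per-token billing at the rates shown above; the function name `estimate_cost_usd` and the example workload are hypothetical, and real providers may apply caching discounts, minimums, or tiered pricing that this does not model.

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD, assuming linear per-token billing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 200,000 prompt tokens and 50,000 completion tokens.
cost = estimate_cost_usd(200_000, 50_000)
print(f"${cost:.4f}")
```

Under these assumptions the example workload comes to roughly $0.019, with the input side accounting for about three quarters of the total.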
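Tested on 20 benchmarks, with an average score of 48.5. Top scores: Chatbot Arena Elo — Overall (1422.6, an Elo rating rather than a percentage), OpenCompass — IFEval (88.3%), OpenCompass — MMLU-Pro (79.2%). The 48.5 average matches the mean of the 19 percentage-scored benchmarks, which suggests the Elo rating is left out of it.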
Benchmark scores
| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1422.6 (Elo) |
| OpenCompass — IFEval | language | 88.3 |
| OpenCompass — MMLU-Pro | knowledge | 79.2 |
| OpenCompass — GPQA-Diamond | knowledge | 75.5 |
| LiveBench — Coding | coding | 69.6 |
| OpenCompass — AIME2025 | math | 69.5 |
| LiveBench — Mathematics | math | 68.0 |
| LiveBench — Language | language | 66.1 |
| Aider polyglot | coding | 59.6 |
| LiveBench — Reasoning | reasoning | 58.4 |
| Fiction.LiveBench | knowledge | 52.9 |
| LiveBench — Overall | knowledge | 48.8 |
| LiveBench — Data Analysis | reasoning | 44.7 |
| OpenCompass — LiveCodeBenchV6 | coding | 43.0 |
| WeirdML | coding | 38.7 |
| LiveBench — IF | language | 21.7 |
| LiveBench — Agentic Coding | coding | 13.3 |
| OpenCompass — HLE | knowledge | 12.3 |
| ARC-AGI | reasoning | 11.0 |
| ARC-AGI-2 | reasoning | 1.3 |
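The page does not say how the 48.5 average is derived. One plausible reading, sketched below, is an unweighted mean over the rows scored on a 0 to 100 scale, with the Elo rating excluded. The `SCORES` layout and the "contains Elo" filter are assumptions made for this illustration, not the site's documented method.

```python
from statistics import mean

# Scores copied from the table above as (suite, task, score).
# Splitting each name into suite and task is an illustrative choice.
SCORES = [
    ("Chatbot Arena Elo", "Overall", 1422.6),
    ("OpenCompass", "IFEval", 88.3),
    ("OpenCompass", "MMLU-Pro", 79.2),
    ("OpenCompass", "GPQA-Diamond", 75.5),
    ("LiveBench", "Coding", 69.6),
    ("OpenCompass", "AIME2025", 69.5),
    ("LiveBench", "Mathematics", 68.0),
    ("LiveBench", "Language", 66.1),
    ("Aider polyglot", None, 59.6),
    ("LiveBench", "Reasoning", 58.4),
    ("Fiction.LiveBench", None, 52.9),
    ("LiveBench", "Overall", 48.8),
    ("LiveBench", "Data Analysis", 44.7),
    ("OpenCompass", "LiveCodeBenchV6", 43.0),
    ("WeirdML", None, 38.7),
    ("LiveBench", "If", 21.7),
    ("LiveBench", "Agentic Coding", 13.3),
    ("OpenCompass", "HLE", 12.3),
    ("ARC-AGI", None, 11.0),
    ("ARC-AGI-2", None, 1.3),
]

# Assumption: any suite with "Elo" in its name reports a rating, not a percentage.
percent_scores = [score for suite, _task, score in SCORES if "Elo" not in suite]
average = round(mean(percent_scores), 1)

print(len(SCORES), len(percent_scores), average)  # 20 19 48.5
```

Including the Elo rating in the mean would push the result well above 100, so the reported figure is only consistent with the 19-row reading.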
Similar models
Qwen2.5 72B Instruct Abliterated · HuiHui AI
48.1
Google DeepMind
48.0
Google DeepMind
49.1
Stable Beluga 2 · Unknown
47.8