Qwen3 235B A22B Instruct 2507
Open Source by Alibaba Qwen · Released 2025-07-21
Average score: 48.5
Input price: $0.07/1M tokens
Output price: $0.10/1M tokens
Context window: 262K tokens (~131 books)
Type: text
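The listed input and output prices translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, the token counts are made-up example values, and it assumes the listed prices apply uniformly per token with no other fees.

```python
# Prices copied from the listing, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.10


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Assumed example workload: a 200,000-token prompt with a 2,000-token reply.
cost = estimate_cost(200_000, 2_000)
print(f"${cost:.4f}")
```

Under these assumptions, even a prompt that fills most of the context window costs on the order of a cent or two.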
Tested on 20 benchmarks, with a 48.5% average. Top scores: Chatbot Arena Elo — Overall (1422.6, an Elo rating rather than a percentage), OpenCompass — IFEval (88.3%), OpenCompass — MMLU-Pro (79.2%).
Benchmark Scores
| Benchmark | Category | Score (% unless noted) |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1422.6 (Elo rating) |
| OpenCompass — IFEval | language | 88.3 |
| OpenCompass — MMLU-Pro | knowledge | 79.2 |
| OpenCompass — GPQA-Diamond | knowledge | 75.5 |
| LiveBench — Coding | coding | 69.6 |
| OpenCompass — AIME2025 | math | 69.5 |
| LiveBench — Mathematics | math | 68.0 |
| LiveBench — Language | language | 66.1 |
| Aider polyglot | coding | 59.6 |
| LiveBench — Reasoning | reasoning | 58.4 |
| Fiction.LiveBench | knowledge | 52.9 |
| LiveBench — Overall | knowledge | 48.8 |
| LiveBench — Data Analysis | reasoning | 44.7 |
| OpenCompass — LiveCodeBenchV6 | coding | 43.0 |
| WeirdML | coding | 38.7 |
| LiveBench — IF | language | 21.7 |
| LiveBench — Agentic Coding | coding | 13.3 |
| OpenCompass — HLE | knowledge | 12.3 |
| ARC-AGI | reasoning | 11.0 |
| ARC-AGI-2 | reasoning | 1.3 |
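The page does not say how the 48.5 average is derived. One reading that fits the table is an unweighted mean of the 19 percentage-scaled scores, leaving out the Arena Elo rating because it is on a different scale. The sketch below checks that reading; the exclusion rule is an assumption, not a documented method.

```python
# Scores copied from the benchmark table.
scores = {
    "Chatbot Arena Elo — Overall": 1422.6,  # Elo rating, not a percentage
    "OpenCompass — IFEval": 88.3,
    "OpenCompass — MMLU-Pro": 79.2,
    "OpenCompass — GPQA-Diamond": 75.5,
    "LiveBench — Coding": 69.6,
    "OpenCompass — AIME2025": 69.5,
    "LiveBench — Mathematics": 68.0,
    "LiveBench — Language": 66.1,
    "Aider polyglot": 59.6,
    "LiveBench — Reasoning": 58.4,
    "Fiction.LiveBench": 52.9,
    "LiveBench — Overall": 48.8,
    "LiveBench — Data Analysis": 44.7,
    "OpenCompass — LiveCodeBenchV6": 43.0,
    "WeirdML": 38.7,
    "LiveBench — IF": 21.7,
    "LiveBench — Agentic Coding": 13.3,
    "OpenCompass — HLE": 12.3,
    "ARC-AGI": 11.0,
    "ARC-AGI-2": 1.3,
}

# Assumption: the Elo rating is excluded because it is not on a 0-100 scale.
ELO_BENCHMARK = "Chatbot Arena Elo — Overall"
percent_scores = [v for name, v in scores.items() if name != ELO_BENCHMARK]

mean_percent = sum(percent_scores) / len(percent_scores)
mean_with_elo = sum(scores.values()) / len(scores)

print(len(percent_scores), round(mean_percent, 1))
print(round(mean_with_elo, 1))
```

Including the Elo rating would push the mean well above 100, so the exclusion reading is the one consistent with the reported figure.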
Similar Models
Qwen2.5 72B Instruct Abliterated · HuiHui AI · 48.1
Google DeepMind · 48.0
Google DeepMind · 49.1
Stable Beluga 2 · Unknown · 47.8