Qwen3-8B is a dense 8.2B-parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...
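In thinking mode, Qwen3 models place their reasoning inside a `<think>...</think>` block ahead of the final reply. Below is a minimal sketch of separating the two in a raw completion string. It assumes a single such block at the start of the output. `split_thinking` is a hypothetical helper written for illustration and is not part of any Qwen or Hugging Face API.

```python
import re

# Assumption: the completion wraps reasoning in one <think>...</think> block
# placed before the final answer, as Qwen3 does in thinking mode.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when there is no think block."""
    match = THINK_BLOCK.search(completion)
    if match is None:
        # No think block at all: treat the whole completion as the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is the user-facing answer.
    answer = completion[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(reasoning)  # 2 + 2 = 4
print(answer)     # The answer is 4.
```

An empty block such as `<think>\n\n</think>` is handled the same way and yields an empty reasoning string.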
Evaluated on 6 benchmarks, with an average score of 56.5%. Top scores (all OpenCompass evaluations): IFEval 85.6%, MMLU-Pro 72.1%, and AIME 2025 66.2%.
OpenCompass evaluation of LiveCodeBench v6. Uses recently published competitive programming problems to evaluate code generation, so scores are less likely to reflect memorized training data.
OpenCompass evaluation of AIME 2025. Tests mathematical reasoning on recent competition problems.
OpenCompass evaluation of MMLU-Pro. A harder version of MMLU with more answer choices per question (up to ten instead of four).
OpenCompass evaluation of GPQA Diamond. PhD-level science questions drawn from GPQA's hardest subset.
OpenCompass evaluation of Humanity's Last Exam. Expert-level questions spanning many academic disciplines.
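The summary gives the six-benchmark average and the three top scores but not the remaining three. By elimination those are LiveCodeBench v6, GPQA Diamond, and Humanity's Last Exam, and their combined total follows from simple arithmetic. The sketch below works that out. It assumes the reported figures are exact, whereas they are rounded to 0.1 points. The inferred total is therefore only good to about ±0.3 points.

```python
# Figures copied from the summary above. Individual scores for the other
# three benchmarks are not listed, so only their combined total is inferred.
AVERAGE = 56.5        # reported mean over all benchmarks, in % (rounded)
N_BENCHMARKS = 6
TOP_SCORES = {"IFEval": 85.6, "MMLU-Pro": 72.1, "AIME 2025": 66.2}

total = AVERAGE * N_BENCHMARKS                       # 339.0
remaining_total = total - sum(TOP_SCORES.values())   # 339.0 - 223.9
remaining_mean = remaining_total / (N_BENCHMARKS - len(TOP_SCORES))

print(f"{remaining_total:.1f} {remaining_mean:.1f}")  # 115.1 38.4
```

So the three unlisted benchmarks average roughly 38%, well below the top three.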
- Type: text
- Context: 41K tokens (~20 books)
- Released: Apr 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001
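The context and cost figures above can be used for rough capacity and budget checks. This is a minimal sketch. It assumes the "41K tokens" entry means about 41,000 tokens and that "~$0.001" is a flat average price per message. The helpers `fits_in_context` and `estimate_cost_usd` are hypothetical names used for illustration only. In a causal language model, the prompt and the generated tokens share the same context window, so both count against the limit.

```python
# Assumptions: approximations of the listed "41K tokens" context and the
# "~$0.001" average per-message cost. Real limits and prices vary by provider.
CONTEXT_TOKENS = 41_000
COST_PER_MESSAGE_USD = 0.001


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Prompt and generated tokens share one window in a causal LM."""
    return prompt_tokens + max_new_tokens <= context_tokens


def estimate_cost_usd(messages: int,
                      cost_per_message: float = COST_PER_MESSAGE_USD) -> float:
    """Rough spend for a number of messages at a flat per-message price."""
    if messages < 0:
        raise ValueError("messages must be non-negative")
    return messages * cost_per_message


print(fits_in_context(30_000, 8_000))       # True
print(fits_in_context(38_000, 8_000))       # False
print(f"${estimate_cost_usd(10_000):.2f}")  # $10.00
```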