
Qwen3 4B Thinking 2507

by Alibaba · Released Aug 2025

Open Source
Avg score: 48.4
Rank #112
Better than 52% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text-generation
License: Open Source
Benchmarks: 6 tested
Data updated today
About

A Qwen text-generation model with 1,217K downloads on Hugging Face.

Tested on 6 benchmarks with 60.6% average. Top scores: OpenCompass — IFEval (88.5%), OpenCompass — AIME2025 (80.0%), OpenCompass — MMLU-Pro (72.8%).
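
The 60.6% figure is consistent with an unweighted mean of the six benchmark scores listed on this page. The sketch below reproduces that arithmetic; the unweighted mean is an assumption that happens to match, since the page does not state its formula. The 48.4 "avg score" shown at the top is a separate figure whose calculation is also not stated here. The names `scores` and `mean_score` are illustrative only.

```python
# Benchmark scores (percent) as listed on this page.
scores = {
    "IFEval": 88.5,
    "AIME2025": 80.0,
    "MMLU-Pro": 72.8,
    "GPQA-Diamond": 64.7,
    "LiveCodeBenchV6": 51.6,
    "HLE": 6.0,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to one decimal place.

    Assumption: the page averages the benchmarks without weights.
    """
    values = list(values)
    return round(sum(values) / len(values), 1)


benchmark_average = mean_score(scores.values())
print(benchmark_average)
```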

Capabilities
coding: 51.6 (#62 globally)
math: 80.0 (#19 globally)
knowledge: 47.8 (#110 globally)
language: 88.5 (#18 globally)
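
The category scores appear to be per-category means of the benchmark scores. The sketch below assumes a grouping that the page does not state: LiveCodeBenchV6 under coding, AIME2025 under math, IFEval under language, and MMLU-Pro, GPQA-Diamond, and HLE under knowledge. Under that hypothetical grouping the numbers match the four values above.

```python
from statistics import mean

# Hypothetical grouping: the page does not say which benchmarks
# feed each capability category. Scores are taken from this page.
category_benchmarks = {
    "coding": [51.6],                 # LiveCodeBenchV6
    "math": [80.0],                   # AIME2025
    "knowledge": [72.8, 64.7, 6.0],   # MMLU-Pro, GPQA-Diamond, HLE
    "language": [88.5],               # IFEval
}

# Unweighted mean per category, rounded to one decimal place.
category_scores = {
    name: round(mean(values), 1)
    for name, values in category_benchmarks.items()
}

for name, value in category_scores.items():
    print(f"{name}: {value}")
```

Note how the single low HLE score (6.0) pulls the knowledge category well below the MMLU-Pro and GPQA-Diamond results on their own.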
Benchmark Scores
Tested on 6 benchmarks · Ranked across 4 categories
[Chart: score distribution across all 231 models, on a 0 to 100 axis, with this model's position marked]
OpenCompass — LiveCodeBenchV6

OpenCompass LiveCodeBench v6. Fresh competitive programming problems that evaluate code generation without memorization.

Score: 51.6

OpenCompass — AIME2025

OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.

Score: 80.0

OpenCompass — MMLU-Pro

OpenCompass MMLU-Pro evaluation. A harder knowledge test with more answer choices.

Score: 72.8

OpenCompass — GPQA-Diamond

OpenCompass evaluation of GPQA Diamond. PhD-level science questions from the hardest subset.

Score: 64.7

OpenCompass — HLE

OpenCompass evaluation of Humanity's Last Exam. Expert-level cross-discipline knowledge test.

Score: 6.0

Excellent (85+) Good (70-85) Average (50-70) Below (<50)
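
The legend above can be read as a simple threshold lookup. The sketch below is one way to express it; the legend leaves the shared boundaries ambiguous (85 and 70 each appear in two bands), so treating each lower bound as inclusive is an assumption, and `score_band` is an illustrative name rather than part of any BenchGecko API.

```python
def score_band(score: float) -> str:
    """Map a 0-100 score to the legend's band name.

    Assumption: lower bounds are inclusive, so 85.0 is "Excellent",
    70.0 is "Good", and 50.0 is "Average".
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores listed on this page.
page_scores = {
    "IFEval": 88.5,
    "AIME2025": 80.0,
    "MMLU-Pro": 72.8,
    "GPQA-Diamond": 64.7,
    "LiveCodeBenchV6": 51.6,
    "HLE": 6.0,
}

bands = {name: score_band(value) for name, value in page_scores.items()}
for name, band in bands.items():
    print(f"{name}: {band}")
```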
Links
BenchGecko API: qwen-qwen3-4b-thinking-2507
Specifications
  • Type: text-generation
  • Context: N/A
  • Released: Aug 2025
  • License: Open Source
  • Status: Active
Available On
Alibaba · TBD
Qwen3 4B Thinking 2507 is an open-source text-generation AI model by Alibaba, released in August 2025. It has an average benchmark score of 48.4.