Avg score: 50.1
Rank: #108 (better than 54% of all models)
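The "better than" percentage follows directly from the rank: with 233 models ranked (per the score distribution below), rank #108 leaves 125 models below it. A minimal sketch of that arithmetic (the function name is ours):

```python
def better_than_pct(rank: int, total: int) -> int:
    """Percent of ranked models strictly below the given rank, rounded."""
    return round((total - rank) / total * 100)

# Rank #108 of 233 models: 125 models rank lower.
print(better_than_pct(108, 233))  # -> 54
```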
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Open Source
Benchmarks: 7 tested
About
Tested on 7 benchmarks with a 60.7% average. Top scores: ARC AI2 (79.2%), LAMBADA (71.1%), GSM8K (61.3%).
Capabilities
- Reasoning: 40.0 (#66 globally)
- Math: 61.3 (#46 globally)
- Knowledge: 64.8 (#32 globally)
Benchmark Scores
Tested on 7 benchmarks · Ranked across 3 categories
Score Distribution (all 233 models): distribution chart (scores 0–100) with this model's position marked.
Reasoning
BBH
40.0—BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.
Math
GSM8K
61.3—Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
Knowledge
ARC AI2
79.2—AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. Easy and Challenge sets test different difficulty levels.
LAMBADA
71.1—Language modeling benchmark testing ability to predict the last word of passages requiring long-range context understanding.
PIQA
59.8—Physical Interaction QA. Tests understanding of everyday physical interactions and commonsense physics.
Legend: Excellent (85+) · Good (70–85) · Average (50–70) · Below (<50)
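The legend's tiers can be applied mechanically to the scores above; a minimal sketch (function name is ours; thresholds come from the legend, with boundary values assumed to fall into the higher tier):

```python
def score_tier(score: float) -> str:
    """Map a benchmark score to the card's tier legend:
    Excellent (85+), Good (70-85), Average (50-70), Below (<50).
    Boundary values go to the higher tier (an assumption)."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"

# Scores from this card:
for name, score in [("BBH", 40.0), ("GSM8K", 61.3), ("ARC AI2", 79.2),
                    ("LAMBADA", 71.1), ("PIQA", 59.8)]:
    print(f"{name}: {score_tier(score)}")
```

On this model's scores, ARC AI2 and LAMBADA land in the Good band, GSM8K and PIQA in Average, and BBH in Below.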
qwen-14b
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Open Source
- Status: benchmark-only
Frequently Asked Questions
What is Qwen-14B?
Qwen-14B is an open-source text AI model by Alibaba Qwen, released in January 2024. It has an average benchmark score of 50.1.