Qwen2.5-Coder-7B-Instruct is a 7B-parameter, instruction-tuned language model optimized for code-related tasks such as code generation, reasoning, and bug fixing. Based on the Qwen2.5 architecture, it incorporates enhancements like RoPE,...
Tested on 12 benchmarks, with an average score of 44.4%. Top scores: GSM8K (86.7%), HellaSwag (69.1%), IFEval (61.0%).
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
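To make a percentage such as the grade school math score more concrete, here is a minimal sketch of exact-match scoring on the final numeric answer. It assumes the GSM8K convention of ending each reference answer with `#### <number>`; the helper names `extract_final_number` and `exact_match_accuracy` are hypothetical, and this page does not state which evaluation harness or answer-extraction rule produced the reported scores.

```python
import re
from typing import List, Optional


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in the text as a normalized string, or None."""
    # If a '####' marker is present (GSM8K reference style), look only after it.
    tail = text.split("####")[-1]
    # Integers or decimals, optionally negative, with optional thousands commas.
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", tail)
    if not matches:
        return None
    number = matches[-1].replace(",", "")
    # Normalize trailing zeros so "18.0" and "18" compare equal.
    if "." in number:
        number = number.rstrip("0").rstrip(".")
    return number


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions whose final number equals the reference's."""
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_final_number(pred)
        if answer is not None and answer == extract_final_number(ref):
            correct += 1
    return 100.0 * correct / len(references)


# Illustrative inputs only; these are not real benchmark items or model outputs.
predictions = [
    "She sells 9 eggs at 2 dollars each, so 9 * 2 = 18. The answer is 18.",
    "I think the total is 1,250.0",
    "No idea",
]
references = ["9 * 2 = 18 #### 18", "#### 1250", "#### 7"]
accuracy = exact_match_accuracy(predictions, references)
```

Real harnesses differ in prompting (for example, few-shot versus zero-shot) and in how strictly they parse the answer, so scores computed this way are not guaranteed to match the figures listed here.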
- Type: text
- Context: 33K tokens (~16 books)
- Released: Apr 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000