Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements over CodeQwen1.5:
- Significant improvements in **code generation**, **code reasoning**...
Tested on 14 benchmarks with an average score of 53.1%. Top scores: Chatbot Arena Elo Overall (1269.9), GSM8K (91.1%), HellaSwag (77.3%).
MiniMax M2 scores 72.4 (102% as good) at $0.26/1M input · 61% cheaper
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
- Type: text
- Context: 33K tokens (~16 books)
- Released: Nov 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.002