This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-
Tested on 6 benchmarks, with an average score of 27.9%. Top scores: IFEval (56.3%), BBH (HuggingFace) (35.5%), MMLU-PRO (31.4%).
Qwen3 32B scores 51.7 (101% as good) at $0.08 per 1M input tokens · 97% cheaper
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
HuggingFace evaluation of GPQA (Graduate-Level Google-Proof Q&A). PhD-level science questions that cannot be easily searched.
- Type: text
- Context: 16K tokens (~8 books)
- Released: Oct 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.011