A 12B-parameter model with a 128k-token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, and other languages.
Tested on 5 benchmarks, averaging 37.2%. Top scores: GSM8K (84.2%), PIQA (67.0%), Balrog (17.6%).
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
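The multi-step arithmetic that GSM8K targets can be illustrated with a small sketch. The problem below is a hypothetical GSM8K-style example, not taken from the actual dataset:

```python
# Hypothetical GSM8K-style word problem (illustrative, not from the dataset):
# "A baker makes 3 trays of 12 muffins each, then sells 8 muffins.
#  How many muffins are left?"

def solve():
    trays = 3
    muffins_per_tray = 12
    sold = 8
    baked = trays * muffins_per_tray   # step 1: 3 * 12 = 36
    remaining = baked - sold           # step 2: 36 - 8 = 28
    return remaining

print(solve())  # → 28
```

A model is scored on whether its final answer matches the reference; chaining two or three such steps correctly is what the benchmark measures.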
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Physical Interaction: Question Answering. Tests understanding of everyday physical interactions and commonsense physics.
Benchmarking Agentic LLM and VLM Reasoning On Games. Tests strategic and logical reasoning through game scenarios.
Graduate-level science questions written by PhD experts. The Diamond subset contains the highest-quality questions, those that domain experts answer correctly but skilled non-experts do not, testing deep understanding.
- Type: text
- Context: 131K tokens (~66 books)
- Released: Jul 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000