Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large), released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411). It provides a significant upgrade over the previous [Mistral Large 24.07](/mistralai/mistral-large-2407), with notable...
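Models listed this way are typically reachable through an OpenAI-compatible chat completions API. As a minimal sketch, the request body below mirrors that convention; the `mistralai/mistral-large-2411` slug is an assumption inferred from the URL paths above, not an identifier confirmed by this page.

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions
# endpoint. The model slug follows the link paths used on this page
# and is an assumption, not a documented identifier.
def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    return {
        "model": "mistralai/mistral-large-2411",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize the 2411 release in one sentence.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to whatever chat-completions endpoint the hosting provider exposes, with authentication headers added as that provider requires.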
Tested on 11 benchmarks with a 45.8% average. Top scores: Chatbot Arena Elo — Overall (1304.7), HELM — IFEval (87.6%), HELM — WildBench (80.1%).
Gemma 3 27B (free) scores 35.0 (98% as good) at $0.00/1M input tokens · 100% cheaper
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
- Type: text
- Context: 131K tokens (~66 books)
- Released: Nov 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.010
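A per-message figure like "~$0.010" is usually derived from per-token pricing. The sketch below shows that arithmetic; the per-million-token rates and token counts are illustrative assumptions, not the provider's published prices.

```python
# Sketch of how a per-message cost estimate is computed from
# per-token pricing. All prices and token counts here are
# illustrative assumptions, not published rates.
def cost_per_message(input_tokens: int, output_tokens: int,
                     input_price_per_m: float,
                     output_price_per_m: float) -> float:
    """Cost in dollars for one message, given $/1M-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: a 1,000-token prompt and a 1,333-token reply at an
# assumed $2/1M input and $6/1M output lands near one cent.
estimate = cost_per_message(1_000, 1_333, 2.0, 6.0)
print(f"~${estimate:.3f}")  # → ~$0.010
```

Actual costs scale with prompt and completion length, so a "cost per message" is only meaningful relative to some assumed average usage.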