DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...
Tested on 5 benchmarks with a 51.1% average. Top scores: Chatbot Arena Elo, Overall (1417.9), Lech Mazur Writing (85.2%), Fiction.LiveBench (52.8%).
Phi 4 scores 54.2 (101% as good) at $0.07 per 1M input tokens, 57% cheaper.
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Writing quality evaluation by Lech Mazur. Tests prose quality, coherence, and stylistic ability.
LiveBench fiction analysis. Tests literary comprehension and creative text understanding.
- Type: text
- Context: 33K tokens (~16 books)
- Released: Aug 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001
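
The per-message cost and the "57% cheaper" comparison above come down to simple per-token arithmetic. The following is a rough sketch of that arithmetic, not the listing's actual method. It assumes the 57% figure compares input prices like for like, and it uses a hypothetical message size of 2,000 input tokens. The function names are illustrative only.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of processing `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million_usd


def implied_baseline_price(cheaper_price: float, fraction_cheaper: float) -> float:
    """Price of the baseline if `cheaper_price` is `fraction_cheaper` below it."""
    if not 0 <= fraction_cheaper < 1:
        raise ValueError("fraction_cheaper must be in [0, 1)")
    return cheaper_price / (1 - fraction_cheaper)


PHI4_INPUT_PRICE = 0.07  # USD per 1M input tokens, as stated in the listing
FRACTION_CHEAPER = 0.57  # "57% cheaper", as stated in the listing

# Assumption: the 57% figure refers to input price only.
baseline = implied_baseline_price(PHI4_INPUT_PRICE, FRACTION_CHEAPER)

# Assumption: a hypothetical message of 2,000 input tokens; output tokens are ignored.
example_tokens = 2_000

print(f"implied input price: ${baseline:.3f} per 1M tokens")
print(f"input cost for {example_tokens} tokens: ${cost_usd(example_tokens, baseline):.6f}")
```

Real per-message costs also depend on output tokens and on the provider's actual price sheet, neither of which is given here, so the result is only an order-of-magnitude check.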