DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...
Tested on 5 benchmarks with a 51.1% average. Top scores: Chatbot Arena Elo, Overall (1417.4), Lech Mazur Writing (85.2%), Fiction.LiveBench (52.8%).
Alternative: Qwen3 32B scores 51.7 on average (roughly on par) at $0.08 per 1M input tokens, about 62% cheaper.
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Writing quality evaluation by Lech Mazur. Tests prose quality, coherence, and stylistic ability.
LiveBench fiction analysis. Tests literary comprehension and creative text understanding.
- Type: text
- Context: 164K tokens (~82 books)
- Released: Aug 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001
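
As a rough illustration of how a per-message cost figure like the one above can be derived from per-token pricing, here is a minimal sketch. The function names, the token counts, and the example prices are hypothetical and not taken from this page; the only sourced numbers are the $0.08 per 1M input price and the 62% figure quoted for Qwen3 32B, and treating that 62% as applying to input price alone is an assumption.

```python
def cost_per_message(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Return the USD cost of one message, given prices per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def implied_baseline_price(cheaper_price, fraction_cheaper):
    """Back out the baseline price when another price is `fraction_cheaper` below it."""
    return cheaper_price / (1.0 - fraction_cheaper)


# Assumption: "62% cheaper" refers to the input price only.
baseline_input = implied_baseline_price(0.08, 0.62)

# Hypothetical message size and output price, for illustration only.
example_cost = cost_per_message(
    input_tokens=1_000,
    output_tokens=500,
    input_price_per_m=baseline_input,
    output_price_per_m=1.00,  # hypothetical, not from this page
)
print(f"implied input price: ${baseline_input:.3f} per 1M tokens")
print(f"example message cost: ${example_cost:.5f}")
```

The actual per-message estimate on this page depends on the message size and output pricing the listing assumes, neither of which is stated here.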