GLM 4 32B is a cost-effective foundation language model that efficiently handles complex tasks, with significantly enhanced capabilities in tool use, online search, and code-related tasks.
Tested on 6 benchmarks with an 18.0% average. Top scores: BBH (HuggingFace) 35.8%, MMLU-Pro 34.9%, IFEval 14.3%.
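The 6-benchmark average can be reproduced with a short script. Only three scores are listed above, so the remaining three entries below are hypothetical placeholders, chosen only so the stated 18.0% average holds:

```python
# Benchmark scores for GLM 4 32B (percent). The first three come from the
# summary above; the last three are hypothetical placeholders, since the
# other benchmark names and scores are not listed here. For the stated
# 18.0% average over 6 benchmarks, the unlisted three must sum to 23.0.
scores = {
    "BBH (HuggingFace)": 35.8,
    "MMLU-Pro": 34.9,
    "IFEval": 14.3,
    "benchmark_4": 10.0,  # hypothetical
    "benchmark_5": 8.0,   # hypothetical
    "benchmark_6": 5.0,   # hypothetical
}

average = sum(scores.values()) / len(scores)
print(f"Average over {len(scores)} benchmarks: {average:.1f}%")  # → 18.0%
```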
Cheaper alternative: Mistral Nemo scores 37.4 (99% as good) at $0.02/1M input tokens · 80% cheaper
HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
HuggingFace evaluation of GPQA (Graduate-Level Google-Proof Q&A). PhD-level science questions that cannot be easily searched.
- Type: text
- Context: 128K tokens (~64 books)
- Released: Jul 2025
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.000