Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...
Tested on 6 benchmarks, averaging 22.1%. Top scores: IFEval (53.6%), BBH (HuggingFace) (30.7%), MMLU-Pro (22.8%).
GLM 4 32B scores 37.8 (99% as good) at $0.10/1M input · 29% cheaper
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
HuggingFace evaluation of GPQA (Graduate-Level Google-Proof Q&A). PhD-level science questions that cannot be easily searched.
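The MMLU-Pro description above notes 10 answer choices instead of 4. A minimal sketch of what that means for the random-guess floor, assuming uniform guessing over equally likely options. The function names `chance_accuracy` and `normalize_above_chance` are hypothetical helpers for illustration, and this page does not say whether its listed scores are raw or rescaled above chance.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on one multiple-choice item."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 1.0 / num_choices


def normalize_above_chance(raw_accuracy: float, num_choices: int) -> float:
    """Rescale raw accuracy so that chance maps to 0.0 and a perfect score to 1.0.

    Hypothetical helper: scores at or below chance are clamped to 0.0.
    """
    baseline = chance_accuracy(num_choices)
    return max(0.0, (raw_accuracy - baseline) / (1.0 - baseline))


# Random-guess floor for a 4-choice benchmark (MMLU) and a 10-choice one (MMLU-Pro).
mmlu_chance = chance_accuracy(4)
mmlu_pro_chance = chance_accuracy(10)
print(f"MMLU chance: {mmlu_chance:.0%}, MMLU-Pro chance: {mmlu_pro_chance:.0%}")
```

With more answer choices the guessing floor drops, so the same raw accuracy reflects more real signal on a 10-choice benchmark than on a 4-choice one.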
- Type: text
- Context: 8K tokens (~4 books)
- Released: May 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000