Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Tested on 6 benchmarks, with an average score of 38.5%. Top scores: IFEval (76.6%), BBH (HuggingFace) (53.8%), MMLU-Pro (41.4%).
gpt-oss-120b (free) scores 74.2 (101% as good) at $0.00 per 1M input tokens · 100% cheaper
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
HuggingFace evaluation of GPQA (Graduate-Level Google-Proof Q&A). PhD-level science questions that cannot be easily searched.
- Type: text
- Context: 131K tokens (~66 books)
- Released: Aug 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001