Hallucination
When a model generates confident-sounding output that is factually wrong, fabricated, or unsupported by reality.
Basic
Hallucinations range from subtle factual errors (a wrong date or number) to entirely invented citations or events. Every LLM hallucinates at some rate. Mitigations include RAG (ground responses in retrieved documents), RLHF (train the model to prefer honest answers), reasoning models (extended thinking catches errors), and verification layers (a secondary model or tool checks the output).
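As a minimal sketch of the grounding idea behind RAG (the function and prompt wording here are illustrative assumptions, not any specific library's API): the prompt carries the retrieved passages plus an instruction to answer only from them and to refuse otherwise.

```python
# Sketch of RAG-style grounding: ask the model to answer ONLY from the
# retrieved passages, citing them by index, and to refuse when the
# sources don't contain the answer. Names are illustrative.

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below. Cite sources as [n]. "
        "If the sources do not contain the answer, reply exactly: "
        "I don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "When was the transformer paper published?",
    ["'Attention Is All You Need' was published in 2017."],
)
```

The refusal instruction matters as much as the context itself: without an explicit "I don't know" escape hatch, the model tends to answer from parametric memory when retrieval misses.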
Deep
Root cause: LLMs predict next-token probabilities, not truth. When the model lacks information, it predicts plausible continuations by pattern-matching, producing output that looks right but isn't. Rates vary widely: GPT-5 hallucinates on ≤1% of factual tasks with good prompting; older models run 10-20% or more. Mitigations, ranked: RAG > reasoning + verification > RLHF > prompt engineering. Hallucination is hardest on obscure facts, numeric details, citations, legal and medical advice, and code that uses deprecated APIs.
Expert
Measured on TruthfulQA (817 adversarial questions designed to elicit common misconceptions), HaluBench, and custom evals. RAG reduces hallucination 3-10× when the retrieved context is relevant, and close to not at all when retrieval fails. Self-consistency (sample k outputs, take the majority answer) reduces some error classes. Reasoning models with extended thinking catch more errors via internal self-critique. Production systems stack the mitigations: RAG retriever → verifier → reasoning model → citation extractor. Well-engineered pipelines reach total hallucination rates below 0.5% on factual queries.
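The self-consistency step above is just a majority vote over sampled answers. A minimal sketch (the sampling itself is assumed to have happened already; real pipelines would also normalize answers more carefully than lowercasing):

```python
# Self-consistency sketch: given k sampled answers to the same question,
# normalize them and keep the most common one. Independent samples rarely
# agree on the same wrong answer, so voting filters some hallucinations.
from collections import Counter

def majority_answer(samples: list[str]) -> str:
    normalized = [s.strip().lower() for s in samples]
    answer, _count = Counter(normalized).most_common(1)[0]
    return answer

samples = ["1969", "1969", "1968", "1969", "1968"]
print(majority_answer(samples))  # → 1969
```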
Depending on why you're here
- Next-token prediction optimizes plausibility, not truth
- TruthfulQA and HaluBench measure it adversarially
- RAG reduces hallucination 3-10× when retrieval works
- Ship with RAG for any factual workload
- Verify with a secondary model or tool for high-stakes outputs
- Citations force the model to ground claims, which reduces confabulation
- Hallucination is the #1 blocker for enterprise AI in regulated industries
- Grounding infrastructure (RAG, verification) is a distinct moat, separate from model quality
- Every frontier lab is racing to close the last 1%
- When AI confidently makes things up
- Every AI does it; newer models do it less
- Why you should fact-check AI-generated content
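The citation and verification bullets above can be combined into a cheap first-pass check. This toy verifier (an illustrative sketch, not a production design) only confirms that every `[n]` citation in a draft answer points at a real retrieved passage; a real verifier would also check that the cited passage semantically supports the claim.

```python
import re

# Toy verification layer: flag citations that reference passages the
# retriever never returned. Catches one common failure mode where the
# model invents a source index to make a claim look grounded.

def invalid_citations(answer: str, num_passages: int) -> list[int]:
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not (1 <= n <= num_passages))

# Two passages were retrieved, but the answer cites a third.
print(invalid_citations("The paper appeared in 2017 [1], per [3].", 2))  # → [3]
```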
Often confused with
RAG is the mitigation; hallucination is the problem. RAG grounds answers in real documents so the model has source material to cite.
Hallucination is a solved problem for well-engineered pipelines: 90% of "AI made it up" stories in 2026 come from poorly architected systems, not frontier-model limits.