AI & LLMs · Updated April 27, 2026

AI hallucination

Definition

AI hallucination occurs when a large language model generates content that sounds plausible and confident but is factually wrong, fabricated, or unverifiable: invented citations, made-up statistics, or fictional events presented with the same fluency as accurate information. Hallucination is a structural feature of how LLMs work, not a bug that can be fully eliminated.

How it works

Large language models are prediction engines, not knowledge bases. They generate text by predicting the most statistically likely next token based on patterns learned during training. They do not "understand" truth; they predict plausibility. When the model encounters a gap in its training data or an ambiguous query, it fills the gap with plausible-sounding fabrication rather than admitting uncertainty.
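
To make that mechanism concrete, here is a minimal sketch using the small open-source GPT-2 model through the Hugging Face transformers library. The prompt is an invented example, and the point is only to show that the model ranks candidate next tokens by probability, not by truth; production models do the same thing at much larger scale.

```python
# Minimal sketch: an LLM scores every possible next token by likelihood.
# GPT-2 is used purely as a small, open illustration; the prompt is invented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The landmark Supreme Court ruling on drone privacy was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for every candidate next token
probs = torch.softmax(logits, dim=-1)

# The top continuations are fluent regardless of whether any such ruling
# exists; the model optimizes likelihood, not factual accuracy.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r:>12}  p={float(p):.3f}")
```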

Researchers distinguish two main types. Intrinsic hallucinations contradict source material the model was given — for example, summarizing a document and inventing a fact that is not in it. Extrinsic hallucinations invent information that cannot be verified against any source — fabricated case law, made-up product specs, fake statistics. Extrinsic hallucinations are harder to detect because there is no original to compare against.

Why it matters

Hallucination rates remain high enough to cause real harm. Stanford RegLab research found that LLMs hallucinate between 69% and 88% of the time on specific legal queries, with at least 75% hallucination rates on questions about a court's core ruling. ECRI, a global healthcare safety nonprofit, listed AI risks as the top health technology hazard for 2025. Courts have imposed sanctions exceeding $10,000 in multiple cases involving AI-fabricated citations submitted by practicing lawyers.

For brands, hallucination affects how AI systems describe your products, attribute your statistics, and represent your positions in generated answers. A confidently wrong AI summary reaches more buyers than an accurate one buried below it.

0.7%–25%+: hallucination rate range across leading models on grounded summarization (Vectara HHEM Leaderboard)

69–88%: hallucination rate on specific legal queries across major LLMs (Stanford RegLab/HAI study)

71%: maximum hallucination reduction from properly implemented RAG (industry analysis, 2026)

How to measure it

The Vectara Hughes Hallucination Evaluation Model (HHEM) Leaderboard is the industry's most widely referenced benchmark. It measures grounded hallucination — how often a model introduces false information when summarizing a document it was explicitly given. Vectara provides 1,000+ documents to each model, asks for summaries using only facts from the source, then checks each summary against the original.
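
In rough pseudocode, that evaluation loop looks like the sketch below. Here summarize and judge_consistency are hypothetical stand-ins for the model under test and the judge model; the real leaderboard uses Vectara's own HHEM classifier for the consistency check.

```python
# Rough sketch of a grounded-hallucination evaluation loop in the HHEM style.
# summarize() and judge_consistency() are hypothetical stand-ins, not Vectara's API.
from typing import Callable, List

def hallucination_rate(
    documents: List[str],
    summarize: Callable[[str], str],                  # model under test: prompt -> summary
    judge_consistency: Callable[[str, str], float],   # (source, summary) -> score in [0, 1]
    threshold: float = 0.5,
) -> float:
    """Fraction of summaries the judge flags as inconsistent with their source."""
    flagged = 0
    for doc in documents:
        prompt = "Summarize the following passage using only facts stated in it:\n\n" + doc
        summary = summarize(prompt)
        if judge_consistency(doc, summary) < threshold:
            flagged += 1   # summary introduced content the source does not support
    return flagged / len(documents)
```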

Other benchmarks measure extrinsic hallucination by asking models factual questions across diverse domains and checking responses against verified ground truth. Domain-specific evaluations matter: legal hallucination rates run 17% to 34% even on purpose-built legal AI tools, while general-knowledge rates on top models can fall below 1%.

How to reduce it

Five techniques have measurable, documented impact:

  1. Retrieval-augmented generation (RAG). Grounding model responses in retrieved documents at query time reduces hallucination by up to 71% in studies. RAG is the foundation of every serious enterprise AI deployment; a minimal sketch combining RAG with an uncertainty instruction follows this list.

  2. Structured prompts. Instructions like "Cite your sources" or "Say I don't know if uncertain" reduce hallucination rates by 20% to 40% in benchmarks.

  3. Reasoning models. Models that perform step-by-step reasoning before responding hallucinate less on complex tasks, though some reasoning models — notably OpenAI's o3 — show higher hallucination rates on person-specific questions.

  4. Multi-model verification. Running the same query across multiple models and flagging disagreements catches errors no single model surfaces. Different models hallucinate different facts, so consensus signals reliability.

  5. Human review for high-stakes content. No technique eliminates hallucination entirely. A 2025 mathematical proof showed that hallucinations are structurally inevitable under current LLM architectures.
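
As a rough illustration of the first two techniques, the sketch below grounds a query in retrieved passages and instructs the model to cite them or decline. retrieve and ask_llm are hypothetical stand-ins for a vector-store lookup and whatever chat-completion API is in use; no specific vendor's interface is assumed.

```python
# Rough sketch of retrieval-augmented generation plus an uncertainty instruction.
# retrieve() and ask_llm() are hypothetical stand-ins, not a specific vendor API.
from typing import Callable, List

def grounded_answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],   # (query, k) -> top-k passages
    ask_llm: Callable[[str], str],               # prompt -> model response
    k: int = 4,
) -> str:
    passages = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers for every claim. "
        "If the passages do not contain the answer, reply exactly: I don't know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```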

Frequently asked questions

Why do LLMs hallucinate?

Because they are statistical prediction engines, not knowledge bases. They generate the most plausible next token based on patterns from training data. When the model lacks information or faces ambiguity, it produces something fluent rather than admitting uncertainty. Current training and evaluation practices reward confident answers over accurate ones.

Which AI models hallucinate the most?

It depends entirely on the task. On grounded summarization benchmarks, leading models like Gemini 2.0 Flash and Claude Sonnet achieve sub-1% rates while older or smaller open-source models can exceed 25%. On legal and medical queries, even top models hallucinate at much higher rates. There is no single "best" model — performance varies significantly by domain.

Can hallucination be eliminated?

No. A 2025 mathematical proof established that hallucinations are structurally inevitable under current LLM architectures. Models generate statistically probable responses based on pattern matching, not verified retrieval. RAG, reasoning improvements, and human review reduce hallucination but cannot eliminate it.

How does hallucination affect AI search and GEO?

AI systems can misattribute statistics to your brand, paraphrase your content inaccurately, or omit citations to sources they actually used. Tracking how AI describes your brand — through tools like Indexly that monitor responses across ChatGPT, Claude, Perplexity, and Gemini — is now part of brand reputation management, not just analytics.
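
As a simplified, generic illustration of that kind of monitoring (not a description of how any particular tool works), the sketch below asks several assistants the same brand question and flags answers that disagree for human review. The providers and similarity arguments are hypothetical stand-ins; each vendor's real API wrapper and a scoring method, such as embedding similarity, would need to be supplied.

```python
# Generic sketch of cross-model brand monitoring: ask several assistants the same
# question and flag low-agreement answer pairs for human review.
# The provider wrappers and similarity scorer are hypothetical stand-ins.
from itertools import combinations
from typing import Callable, Dict

def flag_disagreements(
    question: str,
    providers: Dict[str, Callable[[str], str]],   # provider name -> ask(question) function
    similarity: Callable[[str, str], float],      # (answer_a, answer_b) -> score in [0, 1]
    min_agreement: float = 0.7,
) -> Dict[str, str]:
    answers = {name: ask(question) for name, ask in providers.items()}
    for a, b in combinations(answers, 2):
        if similarity(answers[a], answers[b]) < min_agreement:
            # Low agreement suggests at least one model is misdescribing the brand.
            print(f"Disagreement between {a} and {b} on: {question!r}")
    return answers
```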

Do reasoning models hallucinate less?

Sometimes. Reasoning models perform better on complex math and coding tasks where step-by-step thinking helps. But on person-specific factual questions, reasoning models like OpenAI's o3 have shown higher hallucination rates than non-reasoning predecessors — the extended reasoning gives them more room to generate confident-but-wrong content.