Citation probability
Definition
Citation probability is the likelihood that an AI system will cite a specific URL when generating a response to a target prompt. Unlike share of model, which measures brand visibility across a prompt set, citation probability is a per-URL metric — it tells you how strong an individual page is at earning citations.
How it works
AI systems with retrieval-augmented generation — Perplexity, Google AI Overviews, ChatGPT search, Claude with web search — go through a multi-step pipeline when answering a prompt: retrieve a candidate set of documents, rank them by relevance and authority, select a subset to ground the answer in, and finally cite the sources that contributed most directly to the synthesized response.
Citation probability is the empirical likelihood that a specific URL survives all four stages for a specific query. It depends on how well the page matches the query's semantic intent, how authoritative the source is judged to be, how recently the content was updated, and whether the answer is structurally easy to extract.
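The four-stage pipeline can be sketched as a toy model. Everything here is illustrative: the scoring weights, document fields, and URLs are invented for the example, not any real engine's ranking logic.

```python
# Toy sketch of the retrieve → rank → select → cite pipeline.
# Scoring weights and document fields are illustrative placeholders.

def retrieve(docs, query, k=10):
    """Stage 1: pull candidate documents whose text overlaps the query."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d["text"].lower().split())][:k]

def rank(candidates):
    """Stage 2: order candidates by a blend of relevance and authority."""
    return sorted(
        candidates,
        key=lambda d: 0.7 * d["relevance"] + 0.3 * d["authority"],
        reverse=True,
    )

def select(ranked, n=3):
    """Stage 3: keep a small subset to ground the answer in."""
    return ranked[:n]

def cite(selected):
    """Stage 4: cite the sources that grounded the answer."""
    return [d["url"] for d in selected]

docs = [
    {"url": "https://example.com/a", "text": "what is citation probability",
     "relevance": 0.9, "authority": 0.6},
    {"url": "https://example.com/b", "text": "citation probability guide",
     "relevance": 0.7, "authority": 0.9},
    {"url": "https://example.com/c", "text": "unrelated cooking tips",
     "relevance": 0.1, "authority": 0.8},
]
cited = cite(select(rank(retrieve(docs, "citation probability"))))
```

A page's citation probability is the fraction of runs in which its URL survives all four stages and lands in that final `cited` list.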
How to calculate it
The basic formula:
Citation probability = (number of times URL is cited) ÷ (number of prompt runs) × 100
For meaningful results, run each target prompt at least 5 to 10 times across each AI platform — outputs vary across runs because of model sampling temperature. Aggregate the rate over a 7 to 30 day window to smooth out short-term volatility.
A URL with 80% citation probability for a target prompt on Perplexity is reliably the answer source. A URL with 10% probability is occasionally cited and represents an optimization opportunity.
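The calculation can be expressed directly from the formula above. The run log and URLs here are hypothetical, standing in for whatever your tracking system records per prompt run.

```python
def citation_probability(cited_runs, total_runs):
    """Citation probability = (cited runs / total runs) x 100."""
    if total_runs == 0:
        raise ValueError("need at least one prompt run")
    return 100.0 * cited_runs / total_runs

# Hypothetical run log: one entry per prompt run, listing the URLs cited.
runs = [
    ["https://example.com/guide", "https://other.com/post"],
    ["https://other.com/post"],
    ["https://example.com/guide"],
    ["https://example.com/guide", "https://third.com/page"],
    ["https://other.com/post"],
]
target = "https://example.com/guide"
cited = sum(target in run for run in runs)
prob = citation_probability(cited, len(runs))  # cited in 3 of 5 runs -> 60.0
```

In practice you would collect these runs per platform and aggregate over a 7 to 30 day window, as described above, rather than compute from a single day's handful of runs.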
5.2 — average sources cited per Perplexity response (industry analysis, 2026)
1.2 — average sources cited per ChatGPT response (industry analysis, 2026)
40% — higher reappearance rate for content earning both a mention and a citation (AirOps research)
Why it matters
Share of model tells you how visible your brand is overall. Citation probability tells you which specific pages are doing the work — and which are not. It is the diagnostic metric for content optimization.
When you publish a new piece of content targeting a high-value prompt, citation probability is the metric that tells you whether the content is earning the citation share you intended. When citation probability decays over time — as fresher competitor content displaces yours — it is the early warning signal that triggers a content refresh.
How to improve it
The factors that move citation probability most reliably:
- Lead with a definitive answer. AI systems extract content from the first 200 words of a page far more often than from later sections. The opening paragraph should answer the target prompt directly and completely.
- Add factual density. Sourced statistics, dated quotes from named experts, and concrete examples raise citation probability across studies. Content with original data is cited at notably higher rates than content with general claims.
- Match the query phrasing. Use the exact words and natural phrasings users employ in the target prompt. Semantic matching tolerates synonyms but rewards exact matches.
- Improve source authority. Citations from authoritative sites, consistent brand mentions across the open web, and clear authorship with author bios all improve how the model judges the source.
- Update on a schedule. AI systems weight recency heavily, especially for time-sensitive topics. A page updated in 2026 will typically out-cite a 2024 page on the same topic, even when the older page has more backlinks.
Frequently asked questions
How is citation probability different from share of model?
Share of model is a brand-level metric across an entire prompt set. Citation probability is a URL-level metric for a specific prompt. Share of model tells you "how visible is our brand?"; citation probability tells you "how strong is this specific page for this specific question?" Both are useful — share of model is the executive metric, citation probability is the editor's metric.
How many times do I need to run a prompt to measure citation probability?
At minimum 5 to 10 runs per prompt per platform. AI responses are stochastic — the same prompt produces slightly different responses across runs because of sampling temperature. Single-run measurements are unreliable. Most production tracking systems run target prompts daily and aggregate over 7 to 30 day windows.
Does citation probability vary across AI platforms?
Significantly. Perplexity averages around 5 sources per response while ChatGPT typically cites 1 or 2. The same URL can have 80% citation probability on Perplexity and 5% on ChatGPT for the same prompt. Brands need per-platform measurement and per-platform optimization strategies.
Can citation probability decline over time?
Yes — and the decline is often the first signal that a piece of content needs to be refreshed. As newer competitor content is published, fresher pages displace older ones in the AI candidate pool. Tracking citation probability over time turns content decay from an invisible problem into a measurable one.
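One simple way to turn decay into a measurable signal is to compare the latest window's average citation rate against the preceding window. The window length, threshold, and daily rates below are illustrative choices, not a standard.

```python
from statistics import mean

def detect_decay(daily_rates, window=7, threshold=10.0):
    """Flag decay when the latest window's mean citation probability falls
    more than `threshold` percentage points below the previous window's mean.
    `daily_rates` is a chronological list of daily citation probabilities (%)."""
    if len(daily_rates) < 2 * window:
        return False  # not enough history to compare two full windows
    previous = mean(daily_rates[-2 * window:-window])
    current = mean(daily_rates[-window:])
    return previous - current > threshold

# 14 days of hypothetical daily citation probability (%) for one URL
rates = [80, 78, 82, 79, 81, 80, 77, 70, 65, 60, 58, 55, 52, 50]
needs_refresh = detect_decay(rates)  # week 1 avg ~79.6, week 2 avg ~58.6 -> True
```

A production system would run this check per URL per platform and route flagged pages into the content-refresh queue.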
Share of model
Share of model is the percentage of relevant AI-generated answers in which your brand appears, measured across a defined set of prompts and platforms. It is the AI-search equivalent of share of voice and the headline metric for tracking GEO performance.
AI visibility score
The AI visibility score is a single composite number — typically on a 0–100 scale — that summarizes a brand's standing across AI assistants (ChatGPT, Claude, Gemini, Perplexity, Grok, AI Overviews) by blending mention frequency, citation rate, ranking position, sentiment, and AI-referred traffic. It is the executive-friendly headline metric for Generative Engine Optimization (GEO) programs.
Generative engine optimization (GEO)
Generative engine optimization (GEO) is the practice of structuring content and brand presence so that AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews cite, quote, or recommend it when generating answers. Unlike traditional SEO, which competes for ranked positions in a list of links, GEO competes for inclusion inside the answer itself.
Content freshness
Content freshness is how recently a page was published or substantively updated, as signaled to AI assistants and search engines through `dateModified`, visible publish dates, and changed body content. Retrieval-grounded AI engines — Perplexity, Google AI Overviews, Bing Chat, Gemini — weight freshness heavily when choosing citation sources for time-sensitive queries.
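Exposing `dateModified` is typically done with a schema.org JSON-LD block in the page head. A minimal sketch of generating one, assuming the hypothetical headline and dates shown; the exact fields a given engine reads are not guaranteed by schema.org itself:

```python
import json
from datetime import date

def article_jsonld(headline, published, modified):
    """Build a minimal schema.org Article JSON-LD block exposing dateModified.
    Field selection here is a common-practice sketch, not an exhaustive schema."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }, indent=2)

snippet = article_jsonld("Citation probability", date(2024, 3, 1), date(2026, 1, 15))
# Embed in the page as: <script type="application/ld+json">...</script>
```

The visible publish date on the page should match what the markup declares; inconsistent freshness signals undercut both.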