AI & LLMs · Updated June 1, 2026

BM25

Definition

BM25 (Okapi BM25) is a classic keyword-based ranking algorithm that scores how well a document matches a query's terms. It weighs term frequency, rarity, and document length to rank results. Despite being decades old, BM25 remains a core candidate generator in modern AI retrieval pipelines, often paired with vector search.

How it works

BM25 scores a document against a query using three intuitions. First, term frequency: a document that mentions a query word more often is more relevant, but with diminishing returns, so the tenth occurrence adds less than the second. Second, inverse document frequency: rare words are more discriminating than common ones, so matching an uncommon term counts for more. Third, length normalization: a term's frequency is scaled relative to the average document length in the collection, so long documents don't win simply by containing more words.
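For reference, one widely used form of the BM25 scoring function turns each of these intuitions into a term. This is the Lucene-style variant, and implementations differ in details such as the exact IDF formula. In the formula:

- f(q_i, D) is how often query term q_i appears in document D.
- |D| is the document's length in tokens, and avgdl is the average length across the collection.
- N is the number of documents, and n(q_i) is how many of them contain q_i.
- k_1 (commonly 1.2 to 2.0) controls how quickly term frequency saturates.
- b (commonly 0.75) controls how strongly length is normalized.

The classic Okapi formulation omits the "+ 1" inside the logarithm, which lets IDF go negative for very common terms.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\!\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```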

These factors combine into a single relevance score for each document, and results are ranked highest-score-first. BM25 is purely lexical — it matches the literal tokens in the query, with no understanding of synonyms or meaning.
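To make this concrete, here is a minimal from-scratch sketch in Python. It assumes a naive lowercase-and-split tokenizer and the Lucene-style IDF. The class name `BM25` and its methods are illustrative, not a particular library's API. Production engines add real tokenization, stemming, and an inverted index instead of scoring every document.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace only.
    return text.lower().split()


class BM25:
    """Minimal in-memory Okapi BM25 scorer using the Lucene-style IDF."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of length normalization
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.doc_lens) / len(self.tfs)
        n_docs = len(self.tfs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tf in self.tfs for term in tf)
        self.idf = {
            term: math.log(1 + (n_docs - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query: str, index: int) -> float:
        tf = self.tfs[index]
        # 1 - b + b * |D| / avgdl: above 1 for longer-than-average documents.
        length_norm = 1 - self.b + self.b * self.doc_lens[index] / self.avgdl
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # purely lexical: unmatched tokens contribute nothing
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def rank(self, query: str) -> list[tuple[int, float]]:
        # Return (document index, score) pairs, highest score first.
        scores = [(i, self.score(query, i)) for i in range(len(self.tfs))]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Take `docs = ["error code E1234 when saving the file", "how to fix a saving error", "the cat sat on the mat"]`. With these, `BM25(docs).rank("error E1234")` puts the first document on top because it matches the rare token `e1234` as well as `error`. The third document scores zero because it shares no tokens with the query.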

Its strengths are speed, transparency, and strong performance on keyword-heavy and exact-match queries like product names, error codes, and proper nouns.

Why it matters for AI search

Modern AI retrieval rarely relies on a single method. BM25 is the workhorse lexical retriever that runs alongside semantic vector search in hybrid pipelines. BM25 nails exact terms and rare keywords that embeddings can blur; vector search catches paraphrases and meaning that keywords miss. Together they generate a stronger candidate set for a reranker to refine before passages reach the language model.

For content owners, BM25's persistence is a reminder that exact-match signals still matter. Using the precise terminology, entity names, and phrasing your audience searches for keeps content retrievable by the lexical half of the pipeline that feeds AI answers and citations.

Frequently asked questions

Is BM25 outdated now that we have vector search?

No. BM25 is still widely used because it excels at exact-term and rare keyword matching where embeddings can underperform. Most production AI retrieval systems run BM25 and vector search together in a hybrid setup rather than replacing one with the other.

What is the difference between BM25 and TF-IDF?

Both reward rare matching terms, but BM25 adds saturation on term frequency (diminishing returns) and explicit document-length normalization, which makes it more robust than plain TF-IDF on documents of varying length.
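The saturation difference is easiest to see by isolating the term-frequency component and holding IDF fixed. This sketch assumes a raw-count TF-IDF; some TF-IDF variants use log-scaled counts, which also dampen growth. The function names and the default document lengths are hypothetical.

```python
def tfidf_tf_weight(f: int) -> float:
    # Raw-count TF-IDF: the weight grows linearly with term frequency.
    return float(f)


def bm25_tf_weight(
    f: int,
    k1: float = 1.2,
    b: float = 0.75,
    doc_len: float = 100.0,
    avgdl: float = 100.0,
) -> float:
    # BM25: the weight saturates toward k1 + 1, and longer-than-average
    # documents (doc_len > avgdl) get a larger denominator, hence a lower weight.
    length_norm = 1 - b + b * doc_len / avgdl
    return f * (k1 + 1) / (f + k1 * length_norm)


for f in (1, 2, 5, 10, 100):
    print(f"f={f:>3}  tf-idf={tfidf_tf_weight(f):6.1f}  bm25={bm25_tf_weight(f):.3f}")
```

At average document length the BM25 weight climbs from 1.0 at f = 1 to about 2.17 at f = 100. It never exceeds k1 + 1 = 2.2, whereas the raw-count weight grows a hundredfold. Doubling `doc_len` relative to `avgdl` lowers the BM25 weight for the same f.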

Why pair BM25 with vector search?

They have complementary failure modes. BM25 matches literal tokens and handles exact names and codes well; vector search matches semantic meaning and paraphrases. Combining them in hybrid search improves recall and the quality of candidates passed to a reranker.

How does BM25 affect whether my content gets cited?

In retrieval pipelines that use lexical scoring, BM25 helps decide whether your page becomes a candidate at all. Using the exact terms, entity names, and phrasing your audience uses makes content easier for the lexical retriever to surface and pass downstream.

Hybrid search

Hybrid search combines keyword (lexical) retrieval and vector (semantic) retrieval so an AI system matches both exact terms and underlying meaning. By blending methods like BM25 with embedding similarity, it typically improves recall and precision over either approach alone, producing better candidate passages for grounding and citation in AI answers.
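Blending can be done in several ways. One common choice is reciprocal rank fusion (RRF), which combines ranked lists by position alone, so BM25 scores and cosine similarities never need to share a scale. Below is a minimal sketch. It assumes each retriever returns document IDs best-first. The constant `k = 60` is the value commonly used with RRF, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents missing from a list get nothing.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_hits = ["doc_error_codes", "doc_faq", "doc_changelog"]
vector_hits = ["doc_troubleshooting", "doc_error_codes", "doc_faq"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Here `doc_error_codes` comes out on top because both retrievers rank it highly. This happens even though the vector retriever alone put `doc_troubleshooting` first. Weighted blending of normalized scores is a common alternative to RRF.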

Vector search

Vector search is a retrieval method that finds information by comparing numerical meaning representations called embeddings, rather than matching exact keywords. Queries and documents are converted to vectors, and the system returns items whose vectors are closest in space — surfacing semantically relevant results even when the wording differs.
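The core operation can be sketched as a brute-force cosine-similarity scan. The three-dimensional vectors below are hand-made toy values standing in for real embeddings. Real embeddings come from an embedding model and typically have hundreds or thousands of dimensions. The function and document names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means the vectors point the same way; 0.0 means they are orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(
    query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 2
) -> list[tuple[str, float]]:
    # Brute force: score every stored vector and keep the top_k closest.
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors: the first axis loosely stands for refunds/returns, the last for pets.
doc_vecs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "how_to_return_an_item": [0.8, 0.3, 0.1],
    "cat_care_guide": [0.0, 0.1, 0.95],
}
query_vec = [0.85, 0.2, 0.05]  # e.g. "can I get my money back?", no shared keywords
print(vector_search(query_vec, doc_vecs))
```

Production vector databases usually replace this linear scan with an approximate nearest-neighbor index (HNSW is a popular choice) so search stays fast over millions of vectors.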

Reranking

Reranking is a second-stage retrieval step that reorders an initial set of candidate documents by deeper relevance to the query. After a fast first-stage retriever returns many candidates, a more powerful (often cross-encoder) model scores each query-document pair, surfacing the best passages to feed a language model for grounded, accurate answers.
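Here is a minimal sketch of the second stage, assuming the first-stage candidates are already in hand. A real reranker would call a cross-encoder model for `score_pair`. Here a hypothetical token-overlap function stands in so the example runs without any model, though it does not capture meaning the way a cross-encoder does.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair jointly and keep the best top_n."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def toy_overlap_scorer(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: fraction of query tokens found in doc.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


candidates = [
    "reset your password from the login page",
    "billing FAQ",
    "password reset email not arriving",
]
print(rerank("password reset email", candidates, toy_overlap_scorer, top_n=2))
```

Passing the scorer in as a function keeps the pipeline shape independent of the model. Swapping the toy scorer for a real cross-encoder changes only `score_pair`.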

Retrieval-augmented generation (RAG)

Retrieval-augmented generation (RAG) is an AI architecture that gives a large language model access to external documents at query time. It retrieves relevant passages from a vector database or search index and inserts them into the model's context before it generates a response. RAG underpins most modern AI search and is one of the most widely used techniques for reducing hallucination.
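The step of inserting retrieved passages into the model's context can be sketched as plain prompt assembly. This assumes retrieval has already produced `(source, text)` pairs. The prompt wording and the `build_rag_prompt` name are illustrative. The actual model call is omitted because it depends on the provider's API.

```python
def build_rag_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    # Number each retrieved passage so the model can cite it, e.g. [1].
    context = "\n\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below, citing them like [1].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


passages = [
    ("example.com/bm25", "BM25 ranks documents by term frequency, rarity, and length."),
    ("example.com/hybrid", "Hybrid search combines BM25 with vector search."),
]
print(build_rag_prompt("What does BM25 weigh?", passages))
```

Numbering the passages gives the model a simple way to attribute claims to specific sources in its answer.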

Embeddings

Embeddings are numerical vector representations of text, images, or other data that capture semantic meaning. By mapping content into a high-dimensional space where similar items sit close together, embeddings let AI systems compare meaning mathematically, powering similarity search, retrieval, clustering, and recommendation.

AI indexing

AI indexing is the process by which AI assistants — ChatGPT, Claude, Gemini, Perplexity, Grok, and Google AI Overviews — crawl, parse, embed, and store web content so it can be retrieved and cited at inference time. It is the AI-search counterpart to Google's traditional index, and the gateway any page must pass through to be eligible for citation.