AI & LLMs | Updated May 6, 2026

TDM rights reservation

Definition

TDM rights reservation is the use of legal and technical notices to reserve rights against text and data mining (TDM) by AI systems. Rooted in copyright frameworks such as the EU's text and data mining exception, it lets rights holders signal, in both machine-readable and human-readable form, that their content may not be mined for AI training without permission.

How it works

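Text and data mining is the automated analysis of large bodies of content to extract patterns, the kind of processing used to build AI training datasets. Some copyright regimes permit TDM by default but let rights holders opt out. The most prominent is the EU's: Article 4 of the Directive on Copyright in the Digital Single Market (Directive (EU) 2019/790) allows mining of lawfully accessible works unless the rights holder has expressly reserved that use in an appropriate manner, such as by machine-readable means for content published online.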

Reservation works on two levels. The legal layer is an explicit statement in terms of service or a rights notice that mining is not permitted. The technical layer makes that reservation machine-readable so crawlers can detect and honor it — through mechanisms such as robots directives, crawler-specific rules, HTTP headers, and emerging standards designed specifically to express TDM preferences in a structured way.
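
The technical layer can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the `tdm-reservation` and `tdm-policy` field names from the W3C community group's TDM Reservation Protocol (TDMRep), one of the emerging standards of this kind, and those names should be checked against the current specification. The function name `tdm_reservation_signals` and the policy URL are hypothetical.

```python
from html import escape
from typing import Dict, List, Optional, Tuple


def tdm_reservation_signals(
    reserved: bool, policy_url: Optional[str] = None
) -> Tuple[Dict[str, str], List[str]]:
    """Build HTTP headers and HTML meta tags that express a TDM reservation.

    Field names are assumed from TDMRep; "1" means rights are reserved,
    "0" means they are not.
    """
    value = "1" if reserved else "0"
    headers = {"tdm-reservation": value}
    meta_tags = [f'<meta name="tdm-reservation" content="{value}">']

    # A policy URL only makes sense when rights are actually reserved:
    # it points crawlers at the terms under which mining may be licensed.
    if reserved and policy_url:
        headers["tdm-policy"] = policy_url
        safe_url = escape(policy_url, quote=True)
        meta_tags.append(f'<meta name="tdm-policy" content="{safe_url}">')

    return headers, meta_tags


headers, meta_tags = tdm_reservation_signals(
    True, "https://example.com/tdm-policy.json"  # hypothetical URL
)
```

A site would attach `headers` to its HTTP responses and place `meta_tags` in each page's head, alongside the human-readable notice in its terms of service.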

For a reservation to be effective, it generally needs to be both clearly stated and discoverable by automated agents. Patchy or human-only notices are easy for crawlers to miss or ignore.
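
To make the discoverability point concrete, here is a rough sketch of how an automated agent might look for a reservation on a fetched page. It assumes the `tdm-reservation` header and meta tag names from the TDM Reservation Protocol (TDMRep); the helper `find_tdm_signals` is hypothetical, and a real crawler would check more places than these two.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name:
            self.meta[name] = attr.get("content") or ""


def find_tdm_signals(headers: Dict[str, str], html: str) -> List[str]:
    """Return the machine-readable places where a reservation was found."""
    found = []

    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    if lowered.get("tdm-reservation") == "1":
        found.append("http-header")

    collector = _MetaCollector()
    collector.feed(html)
    if collector.meta.get("tdm-reservation", "").strip() == "1":
        found.append("html-meta")

    return found


# A notice written only as prose is invisible to this kind of check.
human_only = find_tdm_signals({}, "<p>Do not mine this site.</p>")
```

Here `human_only` is an empty list: the prose notice may carry legal weight, but nothing in the response tells a crawler that rights are reserved.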

Why it matters

Rights reservation is the primary tool content owners have to assert control over whether their work feeds AI training. As models consume ever-larger corpora, the question of consent and compensation has become central to copyright debates and litigation.

For publishers and brands, reservation is the defensive counterpart to licensing: reserving rights establishes the baseline of "no mining without permission," which can then be relaxed through a paid licensing deal. It also intersects with AI search strategy. Blocking crawlers indiscriminately can protect content but may also reduce the chance of being retrieved and cited in AI answers, so owners weigh control against visibility.

Frequently asked questions

What is text and data mining (TDM)?

TDM is the automated computational analysis of large amounts of text or data to extract information and patterns. In the AI context it underlies the collection and processing of content used to train models.

How do I reserve TDM rights against AI?

Combine a clear legal statement that mining is not permitted with machine-readable signals crawlers can detect — robots directives, crawler-specific rules, HTTP headers, and structured opt-out standards. Reservation is most effective when it is both explicit and discoverable by automated agents.
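
One way to publish a structured opt-out for a whole site is a single JSON file. The sketch below assumes the `/.well-known/tdmrep.json` layout described by the TDM Reservation Protocol (TDMRep): a list of objects with `location`, `tdm-reservation`, and an optional `tdm-policy`. Treat the field names as an assumption to verify against the specification; the function name, paths, and URL are hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple

Rule = Tuple[str, bool, Optional[str]]  # (path pattern, reserved?, policy URL)


def build_tdmrep(rules: Iterable[Rule]) -> str:
    """Serialize per-path reservation rules as a tdmrep.json document."""
    entries = []
    for location, reserved, policy_url in rules:
        entry = {"location": location, "tdm-reservation": 1 if reserved else 0}
        # Only reserved paths point at a policy describing licensing terms.
        if reserved and policy_url:
            entry["tdm-policy"] = policy_url
        entries.append(entry)
    return json.dumps(entries, indent=2)


document = build_tdmrep(
    [
        ("/articles/", True, "https://example.com/tdm-policy.json"),
        ("/press-releases/", False, None),
    ]
)
```

The resulting text would be served at `/.well-known/tdmrep.json`, with the matching legal statement kept in the site's terms of service.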

Is TDM rights reservation legally binding everywhere?

No. Its force depends on jurisdiction. The EU framework explicitly recognizes opt-outs from its TDM exception, while other jurisdictions, such as the United States, rely on different copyright doctrines such as fair use. Enforcement and recognition therefore vary internationally.

How does rights reservation relate to publisher licensing?

Reservation sets a default of "no mining without permission"; licensing is the agreement that grants permission, usually for payment and attribution. Owners often reserve rights first, then negotiate licensing for the access they choose to allow.

Publisher licensing

Publisher licensing describes the agreements through which AI companies gain permission to access, train on, retrieve, display, or cite content owned by publishers and professional content providers. These deals set the terms — payment, attribution, usage scope, and data access — under which copyrighted material flows into model training and AI answer engines.

AI training data

AI training data is the corpus of text, code, images, and other content used to train large language models. Frontier models like GPT-4o, Claude Sonnet 4, Gemini 2.5, and Llama 4 are trained on trillions of tokens drawn from web crawls, books, code repositories, and licensed datasets. The composition of that corpus shapes what the model knows, which sources it cites, and how it represents brands.

AI regulation

AI regulation is the body of laws, executive orders, and enforcement frameworks governing how AI systems are built, trained, deployed, and audited. The 2026 landscape is dominated by the EU AI Act (in active enforcement), the US Executive Order on AI, the UK's pro-innovation framework, and a fast-growing set of state-level laws in California, Colorado, and New York.

OpenAI crawlers

OpenAI crawlers are the automated web agents OpenAI uses to access web content, each with a distinct purpose and user agent. GPTBot collects data that may be used for model training, OAI-SearchBot indexes pages for ChatGPT search, and ChatGPT-User fetches pages in response to a user's live request. Sites can allow or block each independently via robots.txt.
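
As a minimal sketch of per-crawler control, the example below defines a robots.txt that blocks GPTBot while allowing OAI-SearchBot and ChatGPT-User, then checks the outcome with Python's standard `urllib.robotparser`. The policy itself and the example URL are assumptions chosen for illustration; robots.txt is advisory, so the result shows what a compliant crawler would do rather than what is enforced.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of training collection, stay visible in search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: Iterable[str]) -> Dict[str, bool]:
    """Report, per user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    "https://example.com/articles/1",  # hypothetical page
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
)
print(access)
# {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True}
```

Checking a policy this way before deploying it helps catch rules that accidentally block the search or user-request agents along with the training crawler.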

llms.txt

llms.txt is a proposed web standard — a markdown-formatted file placed at the root of a website — that gives LLMs and AI tools a curated index of a site's most important content. Modeled on robots.txt and sitemap.xml but designed for LLM comprehension rather than search crawlers, llms.txt is in the early adoption phase as of 2026, with no major AI platform officially committed to consuming it.
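
For a sense of what such a file contains, the sketch below assembles one following the general shape of the llms.txt proposal: a title heading, a short blockquote summary, then sections of annotated links. The function name, site name, and URLs are hypothetical, and because the format is still a proposal, its details may change.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (title, url, optional note)


def build_llms_txt(site_name: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render a markdown llms.txt: H1 title, blockquote summary, link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    "Example Site",
    "Glossary of AI and LLM terms.",
    {
        "Glossary": [
            (
                "TDM rights reservation",
                "https://example.com/glossary/tdm-rights-reservation.md",
                "Definition and FAQ",
            ),
        ],
    },
)
```

The generated text would be saved as `llms.txt` at the site root.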

Data privacy in AI

Data privacy in AI covers the practices that protect personal and sensitive information across the AI lifecycle — what enters training data, what is sent through APIs, how enterprise deployments isolate data, and how systems meet regulations like GDPR. It addresses consent, retention, data residency, and whether user inputs are used to further train models.