Indexly
Indexly Insights · Jun 2026

Cloudflare is quietly locking 1-in-5 sites out of AI engines

On July 1, 2025 — what Cloudflare called “Content Independence Day” — every new Cloudflare domain started blocking ChatGPT, Claude, Perplexity, Gemini and Grok by default. Cloudflare sits in front of 22.4% of the web. Most owners never opted in. Most don't know they're invisible.

22.4% of the public web fronted by Cloudflare

Jul 1, 2025: default AI block flipped on

1M+ Cloudflare customers blocking AI

0.27% of search referrals are from AI today

What's actually blocked

Six AI crawlers, blocked by default — on every new Cloudflare domain

These are the verified bots that fetch your pages to power AI answers. If they can't reach you, you can't be cited in their responses — no matter how good your content is.

GPTBot · ChatGPT (OpenAI) · Blocked
ClaudeBot · Claude (Anthropic) · Blocked
PerplexityBot · Perplexity · Blocked
Google-Extended · Gemini & AI Overviews · Blocked
GeminiBot · Gemini · Blocked
Grok / xAI bot · Grok (xAI) · Blocked

The catch: before July 1, 2025, these bots could fetch any Cloudflare site unless the owner explicitly blocked them. After that date the default reversed, and most site owners don't realize the switch flipped.

Headline findings

Six things the data tells you

22.4% of the web is behind Cloudflare

Per W3Techs (Jun 2026). That's the addressable surface area of the default-block — and it scales every time a new domain onboards.

1M+ customers blocking AI

Cloudflare confirms over one million customers have enabled the AI blocking option since the July 2025 rollout — most of them inheriting it by default.

GPTBot's reach is collapsing

Share of Cloudflare-fronted sites GPTBot can crawl dropped from 35.46% in Jan 2026 to 28.97% by March. That's a 6.49-point drop, roughly 18% in relative terms, in a single quarter.

ClaudeBot crawls 23,951:1

Anthropic's ClaudeBot fetches ~24k pages for every one human visitor it sends back. That asymmetry is fueling the blocking backlash — and the collateral GEO damage.

Perplexity sends visitors

At 95:1, Perplexity is two orders of magnitude friendlier than ClaudeBot. Blocking it indiscriminately means losing one of the few AI engines that actually drives clicks.

80% of AI crawl is training, not search

Cloudflare classifies ~80% of AI bot traffic as training data fetches and ~18% as search indexing. Blocking the bots blocks both — including the ones that decide whether you're cited tomorrow.

The crawl-to-refer gap

AI bots fetch a lot. They send back very little.

Cloudflare Radar (Q1 2026) measured how many pages each bot crawls per single human visitor it sends back. The gap explains the backlash — and the default-block that's now hurting the brands who didn't cause it.

Crawl-to-refer ratio · pages crawled per 1 referral

Source: Cloudflare Radar, January–March 2026

Anthropic ClaudeBot · 23,951 : 1 · Highest crawl burden per referral
OpenAI GPTBot · 1,252 : 1 · Crawl-heavy, low referral return
PerplexityBot · 95 : 1 · Citation-first, sends visitors
Googlebot (classic) · 5 : 1 · Reference baseline


Anthropic ratios as high as 70,900:1 were observed in mid-2025. Mistral was the only major bot below parity, at 0.1:1, meaning it sent back more visitors than pages it crawled. DuckDuckGo's 1.5:1 is the closest thing to a fair trade in the dataset.
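To make the ratios above easier to compare, a minimal sketch can invert each one into referrals per 1,000 pages crawled. The figures are the Cloudflare Radar values quoted above; the dictionary and function names are illustrative only.

```python
# Crawl-to-refer ratios quoted above (pages crawled per 1 referral).
CRAWL_TO_REFER = {
    "ClaudeBot": 23_951,
    "GPTBot": 1_252,
    "PerplexityBot": 95,
    "Googlebot": 5,
}


def referrals_per_thousand(pages_per_referral: float) -> float:
    """Invert a crawl-to-refer ratio into referrals per 1,000 pages crawled."""
    return 1_000 / pages_per_referral


for bot, ratio in CRAWL_TO_REFER.items():
    print(f"{bot:<14} {referrals_per_thousand(ratio):8.2f} referrals per 1,000 pages")
```

On these numbers PerplexityBot returns roughly 10.5 referrals per 1,000 pages crawled, against about 0.04 for ClaudeBot.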

The robots.txt trend

The block list is hardening — not loosening

Cloudflare-network robots.txt analysis (Q1 2026) shows GPTBot losing crawl reach and ClaudeBot picking up disallow share. The trend is toward fully-blocked rather than partially-allowed.

GPTBot · Share of Cloudflare-fronted sites GPTBot can still crawl
January 2026: 35.46% · March 2026: 28.97% · −6.49 pts

ClaudeBot · Share of robots.txt DISALLOW rules naming this bot
January 2026: 9.6% · March 2026: 10.1% · +0.5 pts


TechnologyChecker also reports the share of “partially disallowed” permissions falling in favor of “fully disallowed” across GPTBot, CCBot and Google-Extended — sites are choosing all-or-nothing.
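Percentage-point change and relative change are easy to conflate when reading figures like these. The sketch below computes both for the January and March 2026 values quoted in this section; the helper name `change` is illustrative.

```python
def change(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = after - before
    relative = points / before * 100
    return points, relative


# Figures quoted in this section for January and March 2026.
SERIES = [
    ("GPTBot crawlable share", 35.46, 28.97),
    ("ClaudeBot disallow share", 9.6, 10.1),
]

for label, jan, mar in SERIES:
    points, relative = change(jan, mar)
    print(f"{label}: {points:+.2f} pts, {relative:+.1f}% relative")
```

GPTBot's crawlable share falls 6.49 points, which is about 18% of its January level; ClaudeBot's disallow share rises 0.5 points, about 5% relative.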

The hidden cost

AI referrals are tiny today — but they're the curve worth owning

Google still sends 87.52% of all search referrals (28-day window ending April 20, 2026). Every AI chatbot combined sends 0.27%. That sounds like nothing — until you factor in the compounding growth and the higher conversion intent of AI-referred visitors. Blocking those bots blocks the only channel currently scaling.

Share of search referrals (Apr 2026)

Google Search (traditional) · 87.52%
All AI chatbots combined · 0.27%

The AI share is 0.27%, growing every quarter.


ChatGPT, Gemini, Claude and Perplexity put together still sit under half a percent of total referrals — but they grew the fastest of any channel over the past 12 months. Site owners blocking by default are cutting off the next era's top-of-funnel.

What to do about it

How to let AI engines back in — without opening the floodgates

  1. Audit your current state

    Check three places: Cloudflare dashboard → Bots → AI Crawl Control, your robots.txt for explicit Disallow rules against GPTBot / ClaudeBot / PerplexityBot / Google-Extended, and your server logs for blocked AI user-agents.

  2. Disable the blanket AI-bot block in Cloudflare

    In Cloudflare AI Crawl Control, turn off the 'Block AI bots' managed rule. Whitelist the specific verified bots you want — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, GeminiBot.

  3. Declare access explicitly with robots.txt and llms.txt

    Allow the AI bots in robots.txt (`User-agent: GPTBot` → `Allow: /`) and add an llms.txt at root that lists the URLs you want AI to index. Explicit beats default — for both you and the engine.

  4. Keep Bot Fight Mode tuned, not off

    You still want protection from scrapers and credential-stuffers. Bot Fight Mode + verified-bot allowances is the right balance — not a blanket off-switch on either side.

  5. Monitor citations to confirm the fix worked

    Watch your AI citation share weekly. If you were blocked, you'll see citation appearances climb 2–6 weeks after re-allowing crawl. Indexly's AI Citation Tracker shows the lift directly.
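
The robots.txt part of the audit in step 1 can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration, not a full audit: it only reads robots.txt rules and knows nothing about Cloudflare's edge settings, and the parser's matching may differ in detail from how each crawler interprets the file. The sample file and `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the steps above.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each AI user-agent to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical robots.txt: blocks GPTBot, allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for bot, allowed in audit_robots(SAMPLE).items():
    print(f"{bot:<16} {'allowed' if allowed else 'DISALLOWED'}")
```

To audit a live site instead of a string, `RobotFileParser.set_url()` followed by `read()` fetches the file in place of `parse()`.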

The default-block isn't malice — Cloudflare is reacting to a real crawl-to-refer asymmetry. But for brands trying to be cited by AI, inheriting that default silently is the worst possible outcome. Audit it. Fix it. Then track the lift.

FAQ

Questions teams ask about Cloudflare and AI visibility

Is Cloudflare really blocking AI crawlers by default?

Yes. On July 1, 2025 — what Cloudflare called 'Content Independence Day' — every newly onboarded Cloudflare domain began blocking all known AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended and others) by default. Before that date, AI crawlers could fetch any Cloudflare-protected site unless the owner explicitly blocked them. The default flipped.

How many sites does this affect?

Cloudflare sits in front of 22.4% of all websites on the public web as of Jun 2026 (per W3Techs), roughly one in five sites globally. Cloudflare's own announcements confirm over one million customers have enabled the AI blocking option since the rollout, most of them unknowingly opted in by virtue of being new Cloudflare customers.

Why is this such a big deal for AI Search Visibility?

If GPTBot, ClaudeBot, PerplexityBot and Google-Extended can't reach your pages, you can't show up in ChatGPT, Claude, Perplexity, Gemini, Grok or Google AI Overviews. Buyers running their early research inside AI engines won't find you — even if you rank #1 organically on Google. AI visibility starts at crawler access; everything else is downstream of that.

What is the crawl-to-refer ratio and why does it matter?

Crawl-to-refer is the ratio of pages an AI bot fetches versus how many human visitors it sends back. Cloudflare Radar data (Q1 2026) shows ClaudeBot crawls 23,951 pages per 1 referral, OpenAI/GPTBot 1,252:1, Perplexity 95:1, and traditional Googlebot 5:1. The blocking debate is partly a backlash to that imbalance — sites pay the bandwidth cost without seeing visitor returns.

Which AI bots are most often blocked via robots.txt?

GPTBot is the most blocked AI crawler in robots.txt directives across Cloudflare's network, and the share of sites it can still crawl dropped from 35.46% to 28.97% in Q1 2026 as more sites added disallow rules. ClaudeBot blocking grew fastest in Q1 2026: its share of DISALLOW rules rose from 9.6% in January to 10.1% by March.

How do I check if my site is blocking AI crawlers?

Three places to check: your Cloudflare dashboard under Bots → AI Crawl Control, your robots.txt for explicit Disallow rules against GPTBot / ClaudeBot / PerplexityBot / Google-Extended / anthropic-ai / cohere-ai / Bytespider, and your server logs for blocked user-agents. Indexly's AI Readiness Audit checks all three automatically.
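
For the server-log check, a rough sketch might count 403 responses per AI user-agent. It assumes the common "combined" access-log format and treats 403 as the blocked status; both are assumptions about your setup, and the sample lines are hypothetical. Google-Extended is left out because it is a robots.txt token rather than a user-agent string that shows up in requests.

```python
import re
from collections import Counter

# User-agent substrings named above; extend as needed.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "anthropic-ai", "cohere-ai", "Bytespider"]

# Assumed combined log format: ... "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LINE_RE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$')


def blocked_ai_hits(lines):
    """Count 403 responses per AI user-agent in access-log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match or match["status"] != "403":
            continue
        agent = match["agent"].lower()
        for bot in AI_AGENTS:
            if bot.lower() in agent:
                counts[bot] += 1
    return counts


# Hypothetical log lines for illustration.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jun/2026:10:00:00 +0000] "GET / HTTP/1.1" 403 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.8 - - [01/Jun/2026:10:00:01 +0000] "GET /docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.9 - - [01/Jun/2026:10:00:02 +0000] "GET / HTTP/1.1" 403 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(blocked_ai_hits(SAMPLE_LOG))
```

One caveat: a request blocked at Cloudflare's edge may never reach your origin, so a complete absence of AI user-agents in origin logs can itself be a signal worth checking against the dashboard.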

How do I let AI engines back in without opening to malicious bots?

Disable the 'Block AI bots' managed rule in Cloudflare AI Crawl Control, allow the specific verified bot user-agents you want, and use robots.txt and an llms.txt file to declare your access policy explicitly. Keep Bot Fight Mode tuned to verified-bot allowances rather than blunt blocks. This restores AI access without giving up real abuse protection.
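
The robots.txt half of that policy can be generated rather than hand-edited. The sketch below renders one explicit `Allow: /` group per bot and then round-trips the result through Python's `urllib.robotparser` as a sanity check. The function name and the `example.com` URL are illustrative, and a robots.txt rule does not override a block applied at Cloudflare's edge.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allowed_bots):
    """Render one explicit 'Allow: /' group per bot user-agent."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allowed_bots]
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(["GPTBot", "ClaudeBot", "PerplexityBot"])
print(robots_txt)

# Round-trip check: the generated rules should permit the listed bots.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))
```

Under robots.txt semantics a bot with no matching group is allowed anyway, so these groups do not grant anything new; they make the intent explicit and easy to audit.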

What about pay-per-crawl?

Cloudflare's new pay-per-crawl lets publishers charge AI companies a per-request fee using HTTP 402 Payment Required. It's an alternative to outright blocking, but adoption is still early and most AI engines don't pay yet. For brands focused on visibility (not licensing revenue), you'll still want to allow the verified bots that drive AI citations.

Find out if Cloudflare is blocking AI from reading your site

Indexly's AI Readiness Audit checks your robots.txt, llms.txt, Cloudflare rules and live crawler accessibility for GPTBot, ClaudeBot, PerplexityBot, Google-Extended and Grok — in minutes.
