Should You Block AI Crawlers?

For most established businesses, the answer is no. AI crawlers like GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are how your business gets cited in the AI search responses your buyers are increasingly using during early-stage research. Blocking them protects content from being summarized but also makes the business invisible to a meaningful and growing slice of the discovery surface. The cases where blocking makes sense are real but narrow, and they apply to fewer businesses than the current panic suggests.

The decision is not “should I protect my content.” It is “what is the cost of being invisible to AI systems, and is my content sensitive enough to justify that cost.”

What blocking actually does

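A robots.txt directive that blocks an AI crawler asks that crawler not to fetch your content. Compliance is voluntary: well-behaved crawlers honor the request, but nothing enforces it. When the crawler complies, new copies of your pages stop feeding that system's training data, the associated AI system does not cite your content directly in responses, and your business is effectively absent from the answers that system generates on topics your business covers. Content collected before the rule was added is not removed by it.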

Blocking does not prevent an AI system from referencing your business through other means. If your business is named in third-party content the AI system has access to, that content can still inform its responses. What blocking prevents is your own content from being the source the AI system draws on.

The case for blocking

Three legitimate reasons to block AI crawlers exist. Each applies to a specific kind of business.

Reason one: the content is the product. A subscription content platform, a paid research firm, a publisher selling reports, or a specialty database has content as the core asset. If that content is freely summarized by AI systems, the AI system becomes a substitute for the paid product. Blocking is the right call. Content businesses should treat AI crawlers like any other unauthorized redistributor.

Reason two: the content is confidential or sensitive. Legal opinions specific to identified clients, financial advice tied to specific accounts, and similar confidential material should not be on a public website at all, but where it is, blocking AI crawler access reduces the risk of that material appearing in AI-generated responses.

Reason three: the content has accuracy or recency requirements that AI summarization will compromise. Regulatory filings, current pricing, time-sensitive legal positions, and similar material where yesterday’s version and today’s version diverge meaningfully can be hurt by AI summarization drawing on a previous crawl rather than the current page.

If your business does not fit one of these three patterns, blocking is probably the wrong decision.

The cost of blocking

The cost is invisibility to AI search, and for most established businesses with longer evaluation cycles, that cost is real.

Procurement teams, technical leads, and senior buyers increasingly use AI assistants during early-stage research, before they know which firms to evaluate. ChatGPT, Claude, and Perplexity all generate responses citing specific firms for specific kinds of work. If your content is blocked from those systems, the firms that are not blocked appear in the conversation while yours does not. By the time the prospect arrives at a list of vendors to evaluate, your business is already absent from it.

This is a discovery-stage cost. It does not show up in your analytics because the prospect never reached your site in the first place. The cost is silent. Competitors who appear in AI responses gain ground over time, and the businesses that block AI crawlers lose ground over time, without either side seeing the trade-off clearly.

The specific crawlers worth knowing

Once you understand the cost, the individual crawlers become relevant. These are the four that matter most for business-context AI search.

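  • GPTBot is the crawler OpenAI uses to collect training data. OpenAI's documentation lists separate agents, OAI-SearchBot and ChatGPT-User, for search and user-initiated retrieval, so a GPTBot rule on its own does not cover those.
  • ClaudeBot is the crawler Anthropic uses to collect training data for Claude. Anthropic likewise lists separate agents, Claude-SearchBot and Claude-User, for search and user-initiated retrieval.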
  • PerplexityBot is the crawler Perplexity uses to retrieve information for its real-time search responses.
  • Google-Extended is a robots.txt token rather than a separate crawler. It controls whether content Google has already crawled can be used for its Gemini models, and it is distinct from Googlebot. Blocking Google-Extended does not block Googlebot or remove your site from traditional Google search.

There are others, but the decision framework is the same for all of them.

The middle path most agencies miss

The conversation is usually framed as a binary: block or allow. The actual framework is more granular, and this is where most of the value lives.

Different content surfaces on your site can have different policies. Your public-facing service descriptions, your educational content, and your published methodologies probably benefit from AI crawler access. Your client login areas, your gated downloads, and any specific commercial materials probably do not. A well-configured robots.txt can allow AI crawlers into the content you want surfaced while keeping compliant crawlers out of the content you want protected. It is a public file and a request rather than access control, so anything genuinely private still needs to sit behind authentication.

The same logic extends to specific crawlers. You can allow some AI systems and block others if the use cases differ. Most businesses do not need that level of granularity, but the option exists and is worth knowing about.

The robots.txt directives are straightforward. To block GPTBot entirely:

User-agent: GPTBot
Disallow: /

To allow GPTBot but block it from a specific path:

User-agent: GPTBot
Disallow: /private/

To block multiple AI crawlers from a specific path:

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Disallow: /private/
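
Before publishing rules like these, it can help to check how a parser reads them. The sketch below uses Python's standard-library urllib.robotparser against a sample file that extends the multi-crawler example above. The `User-agent: *` group, the /admin/ and /services/ paths, and the SomeOtherBot name are assumptions added for illustration. It also shows a detail that is easy to miss: a crawler that matches a named group follows only that group, so rules under `User-agent: *` do not carry over to it.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt. The "*" group and the /admin/ path are illustrative
# assumptions; the AI-crawler group mirrors the multi-crawler example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check(parser, agents, paths):
    """Return {(agent, path): allowed} for every agent/path combination."""
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}


parser = build_parser(ROBOTS_TXT)
results = check(
    parser,
    agents=["GPTBot", "ClaudeBot", "SomeOtherBot"],  # last one is hypothetical
    paths=["/services/", "/private/report.html", "/admin/"],
)
for (agent, path), allowed in results.items():
    print(f"{agent:13} {path:22} {'allow' if allowed else 'block'}")
```

Under these assumptions, GPTBot and ClaudeBot are blocked from /private/ but allowed into /admin/, because they never read the `*` group. If the AI crawlers should also stay out of /admin/, that rule has to be repeated in their own group. One caveat: this parser applies the first matching rule in file order, while RFC 9309 specifies the most specific (longest) match, so results can differ from a real crawler's when Allow and Disallow rules overlap.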

The technical implementation is the easy part. The decision about what to allow and what to block is the part that requires actual thinking about the business.
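
One way to keep that decision explicit is to record it as data and generate the file from it. The sketch below is a minimal illustration in Python, not a recommended policy: the `POLICY` mapping, the `render_robots_txt` helper, and the /client-portal/ and /downloads/ paths are all hypothetical names chosen for the example.

```python
# Hypothetical policy: crawler token -> path prefixes it should stay out of.
# An empty list means the crawler is allowed everywhere.
POLICY = {
    "GPTBot": ["/client-portal/", "/downloads/"],
    "ClaudeBot": ["/client-portal/", "/downloads/"],
    "PerplexityBot": ["/client-portal/", "/downloads/"],
    "Google-Extended": ["/"],  # example of opting one system out entirely
}


def render_robots_txt(policy: dict[str, list[str]]) -> str:
    """Render one robots.txt group per distinct set of blocked paths."""
    # Crawlers that share the same rules are merged into one group.
    groups: dict[tuple[str, ...], list[str]] = {}
    for agent, paths in policy.items():
        groups.setdefault(tuple(paths), []).append(agent)

    blocks = []
    for paths, agents in groups.items():
        lines = [f"User-agent: {agent}" for agent in agents]
        # An empty Disallow value means nothing is disallowed.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(render_robots_txt(POLICY))
```

Keeping the policy in one place makes the business decision reviewable on its own, separate from the syntax of the file that implements it.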

What I tell clients

The default recommendation is to allow AI crawlers across public-facing content. The exceptions are the three cases above: content businesses, confidential material, and content with accuracy or recency requirements that AI summarization will compromise. Outside those cases, the cost of being invisible to AI search exceeds the cost of being summarized in AI responses.

The framing that helps most is the discovery-stage frame. AI crawlers are how prospects find businesses they did not previously know to look for. Blocking that channel is a deliberate decision to opt out of one of the largest emerging discovery surfaces. Some businesses have good reasons to do that. Most do not. If you are still weighing it, the three reasons above are the right test. If none of them apply, allow the crawlers and focus the energy on what you do with that access.

Where this goes next

If you have decided to allow AI crawlers, the next question is how you structure content so those crawlers actually surface your business in the right conversations. That is what I cover in AEO vs GEO: What’s the Difference and Why It Matters.

Once the content work is underway, the question becomes how to measure whether it is producing visibility, citations, and pipeline. I cover that in How to Measure AI Search Visibility.

For an overview of how I approach this work with clients, see the AI Search Visibility service page.