Cloudflare is updating its method for identifying and blocking AI crawlers, which may result in Googlebot being blocked on sites that prevent AI training. The company announced the update as part of its second Content Independence Day.
The new controls let websites manage automated traffic by three behaviors instead of a single “Block AI bots” toggle. They are now available to all customers, including those on the free plan. A separate set of default changes will take effect on September 15.
Three ways to sort AI crawlers
Cloudflare now sorts crawlers based on what they do on a website, rather than whether they are considered “AI.” The company divides AI use cases into three categories:
- Search: crawling that indexes a website to answer questions later. Cloudflare ties this behavior to referral traffic.
- Agent: real-time bots acting on behalf of a person, like ChatGPT-User, or browser agents like Gemini or Claude running in Chrome.
- Training: crawling that retrieves content to train or optimize a model.
Cloudflare says bot operators should run separate crawlers for each behavior so that sites can see why a bot is visiting them and decide whether to allow or block it.
What changes on September 15th
Two default changes will come into effect on September 15th. First, for new customers and for new sites added by existing customers, training and agent crawlers will be blocked by default on pages that display ads, while search crawlers remain allowed. Cloudflare’s press release also states that existing free customers who have not changed their settings by September 15th will be moved to these default settings.
The second change goes even further. Cloudflare will begin treating general-purpose crawlers based on their overall behavior, applying the strictest rule in effect. For example, a crawler that performs both search and training will be blocked if a site blocks training. Cloudflare uses Googlebot, Applebot, and Bingbot as examples because each crawls for search and AI training. If a site already has the older “Block AI bots” setting enabled, this new rule applies to it.
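The strictest-rule logic described above can be sketched in a few lines. This is only an illustration of the rule as the article describes it, not Cloudflare's implementation: the `Behavior` enum and the `is_blocked` function are hypothetical names, and the behavior sets assigned to each crawler are assumptions made for the example.

```python
from enum import Enum


class Behavior(Enum):
    """The three behaviors the article lists (names are illustrative)."""
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def is_blocked(crawler_behaviors, blocked_behaviors):
    """Apply the strictest rule: one blocked behavior blocks the whole crawler."""
    return any(b in blocked_behaviors for b in crawler_behaviors)


# Hypothetical site policy: block training, allow everything else.
site_blocks = {Behavior.TRAINING}

# Assumed crawler profiles for the example.
mixed_purpose = {Behavior.SEARCH, Behavior.TRAINING}
search_only = {Behavior.SEARCH}

print(is_blocked(mixed_purpose, site_blocks))  # True
print(is_blocked(search_only, site_blocks))    # False
```

Under this model, a crawler that separated its search and training traffic into two bots would keep search access on a site that blocks only training.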
If you would like to keep these crawlers, you can review or change these settings in your Cloudflare dashboard at any time before September 15th. Cloudflare says it will continue to notify customers before that date.
New signals for content consumption by bots
Cloudflare is also testing a content usage signal that extends content signals in robots.txt. It has three values, from most to least restrictive: immediate, which stores nothing; reference, which indexes and links back and is the new default; and complete, which summarizes and reproduces. According to Cloudflare, these values express a preference and do not block anything on their own.
The company has also revised the definition of “Verified” for bots. A verified bot is no longer automatically approved everywhere; instead, its access depends on its category. Additionally, bots that reproduce content entirely are excluded from review. Cloudflare has introduced a searchable directory, BotBase, for Enterprise Bot Management users. It displays the classification of each tracked bot and a copyable ID for use in security rules.
The report behind the changes
The update came with a Cloudflare report to mark the one-year anniversary of the first Content Independence Day. The report says AI training now accounts for the majority of crawler requests on its network, up from about 20% in spring 2025. It also notes that daily requests from AI agents have increased by more than 1,700% over the year. These statistics are based on Cloudflare’s network traffic and do not represent the entire web.
Why this is important
The September 15 rule links AI training blocks to search crawling on the Cloudflare network. If a website blocks training to protect its content from AI models, it may also inadvertently block Googlebot. A Cloudflare block operates at the network level, which makes it harder to bypass than a robots.txt line, because robots.txt is only an advisory instruction to crawlers. Losing Googlebot access means the site will no longer be crawled as effectively, which could ultimately affect its visibility in search results.
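To see why robots.txt is advisory, it helps to note that the check runs on the crawler's side. The sketch below uses Python's standard `urllib.robotparser` to show the decision a well-behaved crawler makes for itself; the robots.txt content, the `crawler_may_fetch` helper, and the example URL are assumptions for illustration. Nothing in this check stops a crawler that simply skips it, whereas a network-level block is enforced before the request reaches the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows one training crawler by name.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def crawler_may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return what a compliant crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is asked to stay out; other crawlers are unaffected.
print(crawler_may_fetch("GPTBot", "https://example.com/article"))     # False
print(crawler_may_fetch("Googlebot", "https://example.com/article"))  # True
```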
Over the past year, I’ve seen publishers move to default-deny setups that block both retrieval and training bots. The exposure is the same every time: blocking the training layer can also block the search layer that keeps a website discoverable.
Looking ahead
Websites using Cloudflare should review their AI blocking settings by September 15 and decide whether to keep search crawlers enabled. The combined crawler rule primarily affects those who previously enabled “Block AI bots” and have not adjusted their settings since then. Free customers who do not change their settings will be moved to the new default settings on this date.
Cloudflare wants mixed-purpose crawler operators to separate these bots based on their behavior in the coming year. Whether major operators differentiate their bots based on behavior will determine whether this becomes a real choice rather than a trade-off between blocking AI training and maintaining search visibility.
Featured Image: jackpress/Shutterstock