If you run a website, you have visitors you never invited. AI bot traffic has grown over 300% year over year, and by late 2025, roughly 1 in every 31 web visits came from an AI bot, according to TollBit network data. The problem is straightforward: most of these bots give you nothing in return.

We are not talking about Googlebot, which at least indexes your content and makes you visible in search. We are talking about crawlers from OpenAI, Meta, Anthropic, and dozens of other companies scanning your site to train their language models. A recent report on Search Engine Journal, based on data from Kinsta and Cloudflare, shows that 80% of AI crawling activity is dedicated to model training. Not your visibility. Their training.

And you are paying the hosting bill.

It is not about scraping. It is about your invoice.

The conversation around AI bots has focused heavily on intellectual property, on the fact that AI models take content without attribution. That is a legitimate concern, but it misses a more pressing and immediate issue: operational cost. Every request a crawler makes on your site consumes resources. And not all requests are equal.

A crawler hitting your cart pages, checkout flow, or internal search is not making a simple static page request. Those pages bypass caching, trigger PHP execution, database queries, session handling, and memory allocation. Your server does real work for a visitor that will never buy anything. On an e-commerce site with tens of thousands of products, a single crawler going through the full catalog can generate hundreds of thousands of requests per day.

Cloudflare's David Belson puts it bluntly: "There's the person who didn't know what the hell they were doing yesterday, but vibe coded a bot today and let it loose. They're not even bothering to check robots.txt." What makes the problem harder is that not all bots identify themselves honestly. Some use generic user-agents or rotate them constantly, making automated detection more difficult.

We recently wrote about how ChatGPT's crawler has outpaced Googlebot in volume on some sites. The trend is accelerating. And hosting and CDN costs hit the operational budget directly, even if nobody explicitly allocates them to "parasitic traffic."

The trap: block and disappear, or pay and stay quiet?

If the solution were simple, we would block everything and move on. But it is not. Some crawlers contribute to your visibility in AI Search. Others may generate citations of your content in ChatGPT, Perplexity, or Google AI Overviews responses. Block everything and you risk becoming invisible in the places where potential customers are actually looking for information.

This tension is real and has no universal solution. A blog that depends on organic traffic faces a different equation than a SaaS that sells through demos. An e-commerce store with thousands of SKUs is in a different situation than a services site with 20 pages.

The right question is not "should I block or not?" It is "which bots, on which sections of my site, under which conditions?" That is a business decision, not a purely technical one.

In practice, this means differentiated access:

  • Googlebot and Bingbot get full access, because they deliver indexing and direct search visibility.
  • AI crawlers that may contribute to citations (GPTBot, ClaudeBot) get access to editorial content, but not checkout or internal search pages.
  • Unknown or aggressive crawlers get blocked without hesitation.
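To make the three tiers above concrete, here is a minimal sketch of how such a policy could be expressed in code. Everything in it is an assumption for illustration: the bot groupings, the path prefixes (/cart, /checkout, /search), the keyword heuristic for spotting unknown crawlers, and the function name decide. It also trusts the user-agent string, which can be spoofed, so a real setup would pair it with IP or reverse-DNS verification at the server or CDN.

```python
from enum import Enum


class Access(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Hypothetical groupings; adjust them to the crawlers you see in your own logs.
SEARCH_BOTS = ("googlebot", "bingbot")
CITATION_BOTS = ("gptbot", "claudebot")
# Crawlers you have decided return no value, e.g. training-only agents.
BLOCKED_BOTS = ("meta-externalagent",)
# Rough heuristic for "unknown crawler": common substrings in bot user-agents.
BOT_MARKERS = ("bot", "crawler", "spider", "scraper")
# Hypothetical prefixes for the expensive, uncached parts of the site.
EXPENSIVE_PREFIXES = ("/cart", "/checkout", "/search")


def decide(user_agent: str, path: str) -> Access:
    """Return the access decision for one request, based on its user-agent and path."""
    ua = user_agent.lower()
    # Tier 1: search engine crawlers get full access.
    if any(name in ua for name in SEARCH_BOTS):
        return Access.ALLOW
    # Tier 2: citation-capable AI crawlers get editorial content only.
    if any(name in ua for name in CITATION_BOTS):
        return Access.BLOCK if path.startswith(EXPENSIVE_PREFIXES) else Access.ALLOW
    # Tier 3: explicitly unwanted or unrecognized crawlers are blocked.
    if any(name in ua for name in BLOCKED_BOTS) or any(m in ua for m in BOT_MARKERS):
        return Access.BLOCK
    # Anything else is presumed to be a human browser.
    return Access.ALLOW
```

The order of the checks matters: known crawlers are matched by name first, so the generic "bot" heuristic only catches agents that no earlier rule recognized.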

This selective approach requires active monitoring. If you do not know who is visiting your site, you cannot make informed decisions. GA4 has started identifying AI traffic as a separate segment, which helps. But server-side tools like Cloudflare and access logs remain essential for the complete picture.

Your metrics are partly lying to you

A side effect many overlook: if 1 in 31 visits is a bot, your traffic numbers are inflated. Not dramatically, but enough to distort decisions. Kinsta's report argues that the most meaningful signals are those tied to actual business outcomes: branded search demand, direct traffic, engagement quality, and revenue.

When you report to a client that site visits grew 15% this month, part of that increase may simply be more bots discovering the site. Raw traffic volume can no longer serve as a reliable proxy for genuine market interest. It is a sensitive point in our industry, where the temptation to report inflated metrics has always existed. The difference now is that it is not about intent but about passive data contamination.

We explored a similar dynamic in our article on the AI Search opt-out: the technical options exist, but the data to make smart decisions is missing. The bot crawling situation is the same. The tools are there, but most businesses are not actively using them.

What you can do right now

If your site gets decent traffic (over 10,000 visits per month), a few hours invested in bot management can make a real difference. This does not require a major project. These are steps your technical team can implement in a day.

Audit your automated traffic. Check your server logs or Cloudflare dashboard. Identify which bots are accessing your site, how often, and most importantly, which pages they are hitting. Look for patterns: bots returning to the same pages hundreds of times per day or accessing URLs with filter parameters are a clear signal of wasted resources.
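If you have raw access logs, a short script is often enough for a first audit. The sketch below assumes your server writes the common "combined" log format (the one Apache and Nginx use by default) and that a substring match on the user-agent is a good enough first pass. The bot list, the function name audit, and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format:
# ip ident user [time] "METHOD target PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical watch list; extend it with whatever shows up in your logs.
BOT_NAMES = ("GPTBot", "ClaudeBot", "Meta-ExternalAgent", "Googlebot", "Bingbot")


def audit(lines):
    """Count requests per bot, per (bot, path), and per bot with query parameters."""
    hits = Counter()      # total requests per bot
    pages = Counter()     # requests per (bot, path), query string stripped
    filtered = Counter()  # requests per bot whose URL carried a query string
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ua = match.group("ua").lower()
        bot = next((name for name in BOT_NAMES if name.lower() in ua), None)
        if bot is None:
            continue  # not one of the crawlers we are tracking
        url = urlsplit(match.group("target"))
        hits[bot] += 1
        pages[(bot, url.path)] += 1
        if url.query:
            filtered[bot] += 1
    return hits, pages, filtered


# Made-up log lines, using documentation IP ranges.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET /shop?color=red&size=m HTTP/1.1" 200 5120 "-" "GPTBot/1.0"',
    '203.0.113.7 - - [10/Oct/2025:13:55:37 +0000] "GET /shop?color=blue HTTP/1.1" 200 5090 "-" "GPTBot/1.0"',
    '198.51.100.4 - - [10/Oct/2025:13:56:01 +0000] "GET /blog/ai-bots HTTP/1.1" 200 8200 "-" "ClaudeBot/1.0"',
    '198.51.100.9 - - [10/Oct/2025:13:56:12 +0000] "GET /cart HTTP/1.1" 200 3100 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0.0.0"',
]

hits, pages, filtered = audit(SAMPLE_LINES)
```

On a real log, pages.most_common(20) shows which URLs each crawler keeps returning to, and a high filtered count relative to hits points to a bot walking through filter combinations.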

Protect expensive areas. Use robots.txt or Cloudflare rules to restrict bot access to cart pages, checkout flows, internal search, and filtered product pages with multiple parameters. These URLs consume the most server resources. Keep in mind that robots.txt is only a request: crawlers that ignore it have to be stopped at the server or CDN level. Protecting these areas does not affect the indexing of your relevant content.
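Before publishing a robots.txt change, it helps to check that the rules do what you intend. The sketch below drafts a rule set and tests it with Python's standard-library parser. The path prefixes and the grouping of crawlers are hypothetical, and the helper name load_rules is ours. The sketch sticks to plain path prefixes because, as far as we know, the standard-library parser matches rules by prefix and does not interpret wildcard patterns, so rules for filter parameters would need a different check.

```python
from urllib.robotparser import RobotFileParser

# Draft rules. The prefixes are hypothetical; replace them with your own
# cart, checkout, and internal search URLs.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /cart
Disallow: /checkout
Disallow: /search

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /search
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Each check answers: may this crawler fetch this path under the draft rules?
checks = [
    ("GPTBot", "/blog/ai-bots"),
    ("GPTBot", "/checkout/payment"),
    ("Meta-ExternalAgent", "/blog/ai-bots"),
    ("SomeOtherBot", "/cart"),
]
results = {(agent, path): rules.can_fetch(agent, path) for agent, path in checks}
```

This only confirms what a compliant crawler would be asked to do. Bots that never read robots.txt are unaffected, which is why the same paths are worth mirroring in server or CDN rules.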

Differentiate between crawlers. Googlebot delivers direct search visibility. GPTBot may contribute to ChatGPT citations. Meta-ExternalAgent trains models with no direct benefit to you. Each deserves a separate decision based on the value it returns. The answer does not have to be binary.

Recalibrate your reporting. If you present traffic data to stakeholders, make sure your metrics reflect human behavior. Conversions, engagement quality, and direct traffic tell a more honest story than raw session volume. Filter suspected bot traffic out of your GA4 reports, for example with a segment or comparison, and look at the difference against the unfiltered numbers. You might be surprised.
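A quick calculation shows how much a rising bot share can flatter a growth figure. The numbers below are invented purely for illustration, and the function names are ours. The point is the method: subtract the visits you attribute to bots in each period before computing growth.

```python
def growth(previous: float, current: float) -> float:
    """Relative change between two periods, e.g. 0.15 for +15%."""
    return (current - previous) / previous


def human_growth(prev_total: int, prev_bots: int, curr_total: int, curr_bots: int) -> float:
    """Growth computed on visits that remain after removing bot visits."""
    return growth(prev_total - prev_bots, curr_total - curr_bots)


# Hypothetical months: total visits and the portion attributed to bots.
prev_total, prev_bots = 100_000, 2_000
curr_total, curr_bots = 115_000, 7_000

raw = growth(prev_total, curr_total)                                  # growth as usually reported
human = human_growth(prev_total, prev_bots, curr_total, curr_bots)    # growth in human visits

print(f"raw growth:   {raw:.1%}")
print(f"human growth: {human:.1%}")
```

With these made-up inputs, the reported 15% increase shrinks to roughly 10% once bot visits are removed, because human visits went from 98,000 to 108,000 while bot visits more than tripled.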

AI bots are not going away. Their volume will keep growing as more companies train and operate AI models. The question is not whether this deserves your attention, but how soon you act. Every month you ignore the problem is a month you are paying for visitors who will never become customers.