The crawler that doesn't ask permission
If you run a website and check your server logs, you've probably noticed something shifting in the past few months. A crawler that isn't Googlebot has been knocking on your door more often, more quickly, and more persistently. And it's not some obscure bot scraping prices or copying content. It's OpenAI's official ChatGPT crawler.
A recent study published on Search Engine Journal, based on analysis of 24.4 million proxy requests, confirms what many of us suspected: ChatGPT-User generated 133,361 requests compared to just 37,426 from Googlebot. That's 3.6 times the volume. This isn't a statistical blip. It's a structural trend.
At difrnt., we've been monitoring AI crawl patterns on client sites since Q3 2025. What we see in those logs matches the study's findings. But the real question isn't about volume. It's about what you do with this information, because how you handle AI crawlers today will determine how visible you are in their results tomorrow.
Two OpenAI crawlers, two completely different jobs
The first thing you need to understand: OpenAI doesn't have one crawler. It has two, and they serve fundamentally different purposes. Confusing the two is one of the most common mistakes we see in technical audits.
ChatGPT-User is the crawler that fetches pages in real time, whenever a user asks ChatGPT a question and the system needs current information. Think of it as a visitor clicking a link, except it does this millions of times per day. This crawler generates real referral traffic that shows up in Google Analytics 4 as a visit source.
GPTBot is the crawler that collects data for model training. It's the one most sites instinctively blocked via robots.txt the moment it appeared. The problem? Many blocked ChatGPT-User at the same time, not realizing the two are entirely separate crawlers with separate user-agent tokens.
Here's the irony: you block GPTBot to prevent your content from being used for training, but you also block ChatGPT-User, which could have been sending real traffic to your site. It's like locking your store to prevent theft, but forgetting that customers use the same door.
We've seen Romanian e-commerce sites block every OpenAI bot with a single catch-all robots.txt line such as User-agent: *GPT*. Wildcards in the User-agent field aren't even part of the robots.txt standard, so a line like this behaves unpredictably from parser to parser; here it swept up ChatGPT-User along with GPTBot. The result? Zero presence in ChatGPT answers, while their competitors show up consistently.
Why Googlebot is falling behind (and why that's not necessarily a problem)
The study data shows Googlebot has a 96.3% success rate on requests, compared to 99.99% for ChatGPT-User. The difference comes from Googlebot maintaining a massive index of old URLs, redirects, and pages that haven't existed for years. ChatGPT's crawler, having no historical index, only accesses what's relevant right now.
The speed differences are also striking: 11 milliseconds per request for ChatGPT-User versus 84 milliseconds for Googlebot. Nearly 8 times faster. In the same time window, that lets ChatGPT's crawler work through roughly eight times as many pages per connection.
This doesn't mean Googlebot is becoming irrelevant. Google remains the primary source of organic traffic for most websites worldwide. But the era of AI agents brings new content consumers that don't browse like humans. They ask, receive, and leave. And if your content isn't there when they ask, they simply cite someone else.
According to Cloudflare, ChatGPT-User requests surged 2,825% year-over-year, and total AI bot crawling increased 15x throughout 2025. These numbers are too large to ignore, regardless of your business size.
What you can do right now
This isn't about choosing between Google and AI. It's about being ready for both. Here's what we recommend to our clients and implement across our portfolio projects:
Review your robots.txt line by line. Make sure you're not blocking ChatGPT-User if you want to appear in ChatGPT answers. Block GPTBot separately if you don't want your data used for training. They're different user-agent tokens and should be treated as such. A well-configured file allows ChatGPT-User and OAI-SearchBot (OpenAI's search-index crawler) while disallowing GPTBot.
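A minimal sketch of that setup, using standard robots.txt directives (adjust the paths to your own site):

```
# Allow real-time fetching so you can appear in ChatGPT answers
User-agent: ChatGPT-User
Allow: /

# Allow OpenAI's search-index crawler
User-agent: OAI-SearchBot
Allow: /

# Opt out of model training
User-agent: GPTBot
Disallow: /
```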
Monitor AI crawls from your logs. Check your server logs and identify the actual volume of AI bot requests. If you don't have access to raw logs, tools like Cloudflare Analytics or even Vercel Analytics can give you a clear picture. Set up alerts for unexpected spikes. Some of our clients discovered that AI bots were consuming 40% of their bandwidth without them knowing.
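If you'd rather start from raw logs, a short script gets you most of the way. A minimal sketch in Python, assuming a standard combined-format access log; the log path and the token list are placeholders to adapt to your setup:

```python
# Tally requests per AI crawler from an access log by matching
# user-agent tokens anywhere in each line.
from collections import Counter

AI_TOKENS = ["ChatGPT-User", "GPTBot", "OAI-SearchBot", "Googlebot"]
LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path, adjust to yours

counts = Counter()
total = 0
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        total += 1
        for token in AI_TOKENS:
            if token in line:
                counts[token] += 1
                break

for token, hits in counts.most_common():
    print(f"{token}: {hits} requests ({hits / total:.1%} of all traffic)")
```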
Optimize for answers, not just rankings. AI crawlers look for content that directly answers specific questions. Structure your pages with clear data, AI-ready architecture, and semantic markup. FAQs, comparison tables, and paragraphs that provide concrete answers have the highest chance of being cited in AI responses.
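For instance, an FAQ can be exposed as schema.org FAQPage markup in JSON-LD. A minimal sketch; the question and answer text below are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does blocking GPTBot also block ChatGPT-User?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "No. GPTBot and ChatGPT-User are separate crawlers with separate user-agent tokens, so each needs its own robots.txt rule."
    }
  }]
}
</script>
```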
Understand how your content ends up in AI. It's not just about crawling. It's about what you do with your content so it gets selected, cited, and properly attributed by AI engines. Structured data, clear authorship, updated dates, cited sources.
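On the markup side, those signals map onto schema.org Article properties. A hedged sketch, with every value a placeholder to replace:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example headline",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2026-01-10",
  "dateModified": "2026-02-02",
  "citation": ["https://www.example.com/cited-study"]
}
</script>
```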
Test your AI visibility. Try searching for your brand or key product in ChatGPT with browsing enabled. If your competitors appear and you don't, that's your signal. The difference between being cited and being invisible is often just a few lines in robots.txt.
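One quick way to check the crawlable half of that equation is Python's built-in robots.txt parser; the domain below is a placeholder for your own:

```python
# Check which crawlers your live robots.txt currently allows to fetch.
from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"  # placeholder, point at your domain
parser = RobotFileParser(f"{SITE}/robots.txt")
parser.read()

for agent in ["ChatGPT-User", "OAI-SearchBot", "GPTBot", "Googlebot"]:
    allowed = parser.can_fetch(agent, f"{SITE}/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```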
Rethink crawl budget from an AI perspective. If ChatGPT-User makes 3.6 times more requests than Googlebot, your server needs to be ready for that. Check your response times under load, make sure your CDN is working properly, and ensure your infrastructure doesn't become a bottleneck right when an AI engine wants to index your content.
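As a rough sanity check, you can simulate a small burst of concurrent fetches and look at your latency percentiles. A sketch using only the Python standard library; the URL, request count, and concurrency are placeholder values, and this is no substitute for a proper load test:

```python
# Fire a burst of concurrent requests and report median and p95 latency,
# as a crude stand-in for an AI-crawler spike.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "https://www.example.com/"  # placeholder, use your own page
REQUESTS = 50

def timed_fetch(_):
    start = time.perf_counter()
    with urlopen(URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = sorted(pool.map(timed_fetch, range(REQUESTS)))

median = latencies[len(latencies) // 2]
p95 = latencies[int(0.95 * (len(latencies) - 1))]
print(f"median: {median:.3f}s, p95: {p95:.3f}s")
```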
The bigger picture
The shift from a Googlebot-dominated web to one where AI crawlers are the primary players isn't a one-time event. It's a process that's visibly accelerating, and 2026 data confirms what we've been anticipating for a year: the web is increasingly consumed through AI intermediaries, not through traditional browsers.
Sites that treat AI bots as an inconvenience will lose visibility in the places where more and more people search for information. And in marketing, the visibility you don't have is the client you don't win.
At difrnt., our recommendation is simple: don't block what you don't understand and don't ignore what you don't measure. Check your robots.txt, monitor your logs, and adapt to the new reality of crawling. With data, not instinct.