
What 30 Days of AI Bot Traffic on Two Real Domains Actually Looks Like

Four patterns from 30 days of AI bot traffic on a consumer content site and a B2B SaaS site — patterns that flat AI-traffic dashboards hide.

Most "AI traffic" dashboards show a single number: bot %, AI share, citation count. That number is almost always wrong in interesting ways. It hides which AI is crawling, why, and whether it matters.

We pulled 30 days of bot traffic from two domains we monitor — one consumer-facing (content-heavy), one B2B SaaS (technical AI tooling). The shapes were so different that the same dashboard reading would mislead both owners.

Here is what we found.

1. Claude and ChatGPT have opposite audiences

Across the two sites, the dominant LLM crawler flipped:

Site archetype | ChatGPT crawls | Claude crawls | Dominant
Consumer content site | 1,453 | 785 | ChatGPT, 1.85×
B2B SaaS technical site | 704 | 1,056 | Claude, 1.50×

That is roughly a 3× swing in relative preference (1.85 × 1.50 ≈ 2.8), depending on what the site is about.
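As a quick check on the swing, using only the crawl counts from the table: divide each site's ChatGPT crawls by its Claude crawls, then compare the two ratios. The variable and function names are ours, for illustration.

```python
# Crawl counts from the table (ChatGPT crawls, Claude crawls).
consumer = {"chatgpt": 1453, "claude": 785}
b2b = {"chatgpt": 704, "claude": 1056}


def preference(site: dict) -> float:
    """ChatGPT crawls per Claude crawl; above 1 means ChatGPT dominates."""
    return site["chatgpt"] / site["claude"]


consumer_pref = preference(consumer)  # ~1.85: ChatGPT leads
b2b_pref = preference(b2b)            # ~0.67: Claude leads by ~1.50x

# How far the relative preference moves between the two sites.
swing = consumer_pref / b2b_pref
print(f"consumer {consumer_pref:.2f}, b2b {b2b_pref:.2f}, swing {swing:.2f}x")
# prints: consumer 1.85, b2b 0.67, swing 2.78x
```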

Audience-tool fit is real. Claude users research technical and AI topics with Claude. ChatGPT users research everything else with ChatGPT. If you have been optimizing for "AI search" without specifying which AI, you have been choosing your audience by accident. This is one reason the layered AI visibility framework treats "crawled" as one step in a chain, not a single metric — each platform sits on its own retrieval path.

2. ByteDance is masquerading as "AI traffic"

On the consumer site, 65% of all "AI bot" hits came from ByteDance and Bytespider, ByteDance's crawlers. These are Chinese content scrapers that do not power any user-facing LLM product most Western buyers care about.

Strip them out and the consumer site's AI bot traffic drops roughly 3× (only 35% of it remains). The "this domain has 6× more AI interest than that one" headline becomes "they are roughly comparable; one is just being scraped for training data."
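The "drops 3×" figure follows directly from the 65% share: removing a fraction s of a total leaves 1 - s of it, so the total shrinks by a factor of 1 / (1 - s). A minimal check, with a helper name of our own choosing:

```python
def drop_factor(excluded_share: float) -> float:
    """How many times smaller a total becomes after removing `excluded_share` of it."""
    if not 0 <= excluded_share < 1:
        raise ValueError("excluded_share must be in [0, 1)")
    return 1 / (1 - excluded_share)


# ByteDance crawlers were 65% of the consumer site's "AI bot" hits.
factor = drop_factor(0.65)
print(f"{factor:.2f}x")  # prints: 2.86x
```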

Most off-the-shelf AI analytics tools count ByteDance as AI traffic. That is technically correct (it is an AI scraper) and practically misleading (it will not drive a citation or a referral). Browser analytics like GA4 cannot help here either — most crawler requests never execute JavaScript, so the bot mix only shows up in server-side logs.
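A minimal sketch of making this split from server-side access logs, assuming the combined log format, where the user agent is the last double-quoted field. The user-agent tokens are published crawler names, but the token list and the choice to exclude ByteDance are our illustrative assumptions, not an official classification; real user-agent strings change over time.

```python
import re
from collections import Counter
from typing import Optional

# Illustrative mapping of user-agent substrings to operators. The tokens are
# published crawler names, but this list is an assumption and not exhaustive.
UA_OPERATORS = {
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Bytespider": "ByteDance",
    "CCBot": "CommonCrawl",
}
# Operators left out of the citation-relevant count (training scrapers).
EXCLUDED = {"ByteDance"}

# In combined log format the user agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def operator_for(line: str) -> Optional[str]:
    """Return the crawler operator for one log line, or None if not a known AI bot."""
    match = UA_RE.search(line)
    if not match:
        return None
    ua = match.group(1).lower()
    for token, operator in UA_OPERATORS.items():
        if token.lower() in ua:
            return operator
    return None


def ai_bot_counts(lines):
    """Count AI bot hits per operator, with and without the excluded operators."""
    raw = Counter(op for op in map(operator_for, lines) if op is not None)
    filtered = Counter({op: n for op, n in raw.items() if op not in EXCLUDED})
    return raw, filtered


# Made-up sample lines in combined log format.
sample = [
    '1.2.3.4 - - [01/May/2025:10:00:00 +0000] "GET /blog HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '5.6.7.8 - - [01/May/2025:10:00:01 +0000] "GET /img/a.jpg HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '9.9.9.9 - - [01/May/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
raw, filtered = ai_bot_counts(sample)
print(sum(raw.values()), sum(filtered.values()))  # prints: 2 1
```

Substring matching keeps the sketch short; a production version would also verify crawler IP ranges, since any client can claim any user agent.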

3. Bot politeness varies more than 4×

Share of bot hits that fetched robots.txt before crawling other pages:

  • B2B SaaS site: 68%
  • Consumer site: 15%

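This is a simple proxy for "did the bot bother to check the rules." Major LLM crawlers (OpenAI, Anthropic, Perplexity) check robots.txt routinely, and Google applies its Google-Extended token through the robots.txt handling of its existing crawlers rather than a separate bot. Aggressive scrapers do not check.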
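The exact method behind these percentages is not spelled out here. One plausible way to approximate it from logs is sketched below, assuming you can group requests per bot and that "polite" means the bot's earliest request in the window was /robots.txt. Note that it measures a share of bots rather than of hits; the function name and tuple layout are hypothetical.

```python
def robots_first_share(requests) -> float:
    """Share of bots whose earliest logged request was /robots.txt.

    `requests` is an iterable of (bot_id, timestamp, path) tuples. What counts
    as one bot (user agent, user agent + IP, ...) is up to you.
    """
    first_seen = {}  # bot_id -> (timestamp, path) of its earliest request
    for bot_id, ts, path in requests:
        if bot_id not in first_seen or ts < first_seen[bot_id][0]:
            first_seen[bot_id] = (ts, path)
    if not first_seen:
        return 0.0
    polite = sum(1 for _, path in first_seen.values() if path == "/robots.txt")
    return polite / len(first_seen)


# Toy log: one bot checks the rules first, the other goes straight for content.
toy = [
    ("GPTBot@1.2.3.4", 1, "/robots.txt"),
    ("GPTBot@1.2.3.4", 2, "/blog"),
    ("scraper@5.6.7.8", 1, "/blog"),
    ("scraper@5.6.7.8", 3, "/about"),
]
print(robots_first_share(toy))  # prints: 0.5
```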

Implication: robots.txt is a filter on polite bots, not a defense against scraping. If you are blocking GPTBot in robots.txt to stop training, you are stopping the bot that was going to behave anyway. The ones actually grabbing your content are not reading your rules file.
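Python's standard-library urllib.robotparser makes the mechanism concrete: robots.txt is a set of rules that each client evaluates for itself, and nothing on the server enforces them. The file contents, the example.com URL, and the "SomeScraper" agent name below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: block OpenAI's GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"
print(parser.can_fetch("GPTBot", url))       # False: a compliant GPTBot stops here
print(parser.can_fetch("SomeScraper", url))  # True, and only consulted if the scraper asks
```

A scraper that never fetches or parses the file is unaffected either way, which is exactly the asymmetry described above.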

4. Image scraping is concentrated where the images are

  • Consumer site (blog with image library): 4,666 image crawls, 37% of bot traffic
  • B2B SaaS site (text-heavy): 4 image crawls, 0.2%

If your site has a media library, it is being scraped to train image models — and that is a completely separate scraping pipeline from the text-citation one most "AI SEO" tools track.
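Separating the image pipeline from the text one can start as simply as bucketing requested paths by file extension. This is a minimal sketch: the extension list is an assumption, and if your logs record the response Content-Type header, that is a more reliable signal than the path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative list of image extensions; extend it for your media library.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".avif"}


def content_type(request_path: str) -> str:
    """Bucket a request path as 'image' or 'text' by its file extension."""
    path = urlsplit(request_path).path  # drop ?query and #fragment
    suffix = PurePosixPath(path).suffix.lower()
    return "image" if suffix in IMAGE_EXTENSIONS else "text"


def image_share(paths) -> float:
    """Fraction of requested paths that look like image fetches."""
    paths = list(paths)
    if not paths:
        return 0.0
    images = sum(1 for p in paths if content_type(p) == "image")
    return images / len(paths)


paths = ["/blog/post", "/img/a.JPG?w=300", "/uploads/b.webp", "/about"]
print(image_share(paths))  # prints: 0.5
```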

What this means for AI visibility tracking

The default "AI bot %" number is too aggregated to act on. To make it useful you need to split by:

  • Platform — Claude / ChatGPT / Perplexity vs. ByteDance / CommonCrawl
  • Intent — citation-driving LLM crawl vs. background training scrape
  • Content type — text/HTML vs. image library
  • Polite vs. aggressive — robots.txt-respecting vs. not
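One illustrative way to hold these four splits together is to label each bot hit on all four axes and count per segment. The labels are assumed inputs here (produced by steps like user-agent matching, robots.txt-first checks, and path bucketing), and the two sites are toy data, not the domains measured above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BotHit:
    """One bot request, already labeled on the four axes (labels are assumed inputs)."""
    platform: str      # e.g. "Claude", "ChatGPT", "ByteDance"
    intent: str        # "citation" or "training"
    content_type: str  # "text" or "image"
    polite: bool       # True if this bot fetched robots.txt first


def split(hits):
    """Count hits per (platform, intent, content_type, polite) segment."""
    return Counter((h.platform, h.intent, h.content_type, h.polite) for h in hits)


def citation_text_share(hits) -> float:
    """Share of bot hits that are polite, citation-oriented crawls of text pages."""
    hits = list(hits)
    if not hits:
        return 0.0
    useful = sum(
        1 for h in hits
        if h.intent == "citation" and h.content_type == "text" and h.polite
    )
    return useful / len(hits)


# Two toy sites with the same bot volume (10 hits each) but different mixes.
site_a = [BotHit("ChatGPT", "citation", "text", True)] * 2 + [BotHit("ByteDance", "training", "image", False)] * 8
site_b = [BotHit("Claude", "citation", "text", True)] * 8 + [BotHit("ByteDance", "training", "image", False)] * 2

print(len(site_a), len(site_b))                                  # prints: 10 10
print(citation_text_share(site_a), citation_text_share(site_b))  # prints: 0.2 0.8
```

Both toy sites would report the same aggregate bot count, yet only one of them is mostly polite, citation-oriented text crawling.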

Without those splits, two sites with identical "10% AI traffic" numbers can have completely different realities, and the same intervention will help one and waste effort on the other. The next failure mode after "crawled" is reuse: see "The new SEO problem: crawled, but not cited" for what happens when the bot does show up and still does not cite the page.

See your own mix

We built SeeLLM to do this split automatically. Run a free 60-second scan at seellm.com/score to see your domain's AI bot mix, top pages, and where you stand on each of the four axes above.

