Methodology

How SeeLLM classifies AI traffic

We assign every request to one of eleven categories using user-agent fingerprints, ASN matching, and request shape. This page documents the rules we apply and the known limitations of each.

Last updated: 2026-05-05

Categories

AI training crawler

A bot that fetches pages for ingestion into a model training set or an answer index. It identifies itself in the user-agent string.

Signatures

  • GPTBot (OpenAI)
  • ClaudeBot (Anthropic)
  • Bytespider (ByteDance)
  • OAI-SearchBot (OpenAI)
  • Amazonbot (Amazon)
  • Applebot-Extended (Apple)
  • CCBot (Common Crawl)
  • PerplexityBot (Perplexity)
  • Google-Extended (Google)
  • Meta-ExternalAgent (Meta)
  • Cohere-Training-Data-Crawler
  • Diffbot, Firecrawl, HuggingFace-Bot, Webzio-Extended, Omgilibot, PanguBot, ImageSiftBot, Timpibot, Brightbot, AI2Bot

AI assistant

A bot fetching pages on behalf of a live user prompt — the human is waiting for a response. Distinct from training crawlers because the fetch is reactive, not scheduled.

Signatures

  • ChatGPT-User (OpenAI)
  • Claude-User, Claude-SearchBot (Anthropic)
  • Perplexity-User (Perplexity)
  • Gemini-Deep-Research (Google)
  • OpenAI-User, OAI-SearchBot-User

AI referral

A request from a real human browser whose Referer header points at a known AI surface. Counts as a click-through from an AI assistant to the destination site.

Signatures

  • chat.openai.com, chatgpt.com
  • claude.ai
  • perplexity.ai
  • duckduckgo.com (AI mode)
  • search.brave.com (Brave AI)
  • gemini.google.com
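
As an illustration of the Referer rule above, here is a minimal sketch of host-based matching. The function name, the label strings, and the handling of a leading "www." are assumptions made for this example, not SeeLLM's implementation. duckduckgo.com and search.brave.com are left out because telling their AI modes apart from ordinary search needs more than the hostname, and this page does not say how that is done.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hosts copied from the signature list above; the labels are illustrative.
AI_REFERRER_HOSTS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}


def ai_referral_source(referer: Optional[str]) -> Optional[str]:
    """Return the AI surface a Referer header points at, or None."""
    if not referer:
        return None  # stripped or missing Referer: cannot be attributed
    try:
        host = (urlsplit(referer).hostname or "").lower()
    except ValueError:
        return None  # malformed header value
    if host.startswith("www."):
        host = host[4:]  # assumption: treat www.<host> the same as <host>
    # Exact host match only, so look-alike domains are not counted.
    return AI_REFERRER_HOSTS.get(host)
```

Matching on the exact hostname, rather than searching the whole header for a substring, keeps a look-alike such as claude.ai.example.com from being counted as a referral.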

AI coding agent

Autonomous coding assistants that fetch documentation, source files, or APIs while generating code.

Signatures

  • Cursor
  • GitHub-Copilot
  • Devin
  • Cody
  • Windsurf
  • Aider

Search engine

Traditional search index crawlers, distinct from AI training crawlers.

Signatures

  • Googlebot
  • Bingbot
  • YandexBot
  • DuckDuckBot
  • Baiduspider
  • SeznamBot
  • Sogou
  • Exabot

SEO tool

Third-party SEO and competitive-intelligence crawlers.

Signatures

  • Ahrefs
  • Semrush
  • MJ12
  • DotBot
  • DataForSEO
  • PetalBot
  • Barkrowler
  • Serpstat
  • Sistrix

Social preview

Bots fetching pages to render link previews in social, messaging, and chat apps.

Signatures

  • Twitterbot
  • LinkedInBot
  • Slackbot
  • Discordbot
  • WhatsApp
  • TelegramBot
  • facebookexternalhit
  • SkypeUriPreview
  • Redditbot

Monitoring

Synthetic monitoring and uptime checks.

Signatures

  • UptimeRobot
  • Pingdom
  • StatusCake
  • HyperPing
  • Datadog Synthetic
  • New Relic

HTTP client

Generic HTTP libraries with no AI, search, or browser identity. Often scripts or integrations.

Signatures

  • curl
  • wget
  • python-requests
  • aiohttp
  • axios
  • node-fetch
  • Go-http-client
  • OkHttp
  • Java/
  • libwww

Scanner

Security probes, vulnerability scanners, and requests with an empty (zero-length) user-agent string. Almost never legitimate site visitors.

Signatures

  • SecurityScanner
  • InternetMeasurement
  • masscan, nmap, zgrab, nuclei
  • Empty user-agent (UA length = 0)

Browser

A real human browser session. Falls through when no other category matches and the request is human-shaped.

Signatures

  • Chrome, Safari, Firefox, Edge, mobile browsers
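
The categories above can be read as an ordered rule table with Browser as the fall-through. The sketch below shows that shape under stated assumptions: the order of the checks, the case-insensitive substring matching, and the function name are choices made for this example, and only a subset of the listed tokens is included. It does not model the "human-shaped" test, ASN matching, or the Referer-based AI referral category.

```python
from typing import List, Tuple

# (category, tokens), checked top to bottom. The order is an assumption of
# this sketch; the tokens are a subset of the signature lists above.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("AI assistant", ("ChatGPT-User", "Claude-User", "Claude-SearchBot", "Perplexity-User")),
    ("AI training crawler", ("GPTBot", "ClaudeBot", "Bytespider", "CCBot", "PerplexityBot")),
    ("AI coding agent", ("Cursor", "GitHub-Copilot", "Aider")),
    ("Search engine", ("Googlebot", "Bingbot", "DuckDuckBot")),
    ("SEO tool", ("Ahrefs", "Semrush", "MJ12")),
    ("Social preview", ("Twitterbot", "Slackbot", "facebookexternalhit")),
    ("Monitoring", ("UptimeRobot", "Pingdom", "StatusCake")),
    ("Scanner", ("masscan", "nmap", "zgrab", "nuclei")),
    ("HTTP client", ("curl", "wget", "python-requests", "Go-http-client")),
]


def classify_user_agent(user_agent: str) -> str:
    """Return the first category whose token appears in the user-agent."""
    ua = (user_agent or "").strip().lower()
    if not ua:
        return "Scanner"  # empty user-agent (UA length = 0)
    for category, tokens in RULES:
        if any(token.lower() in ua for token in tokens):
            return category
    # Nothing matched: fall through to Browser. The real rule also requires
    # the request to be human-shaped, which this sketch does not check.
    return "Browser"
```

Plain substring matching is deliberately naive. A short token such as Aider or Cursor could match unrelated strings, so a production rule set would more likely anchor on product-token boundaries.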

Known limitations

Referer stripping undercounts AI referrals

Mobile ChatGPT, in-app browsers, Arc, Brave, and several AI products strip the Referer header on outbound clicks. AI referral counts are therefore conservative: they reflect the traffic we can observe, not the total. Treat referral rankings between AI sources as directional rather than exact.

User-agent matching is the primary signal

We rely on declared user-agents for bot classification. Some scrapers and small AI projects use python-requests or browser-shaped UAs without identifying themselves. Those land in HTTP client or Browser, not AI training crawler. ASN matching is used as a secondary signal where available.

AI assistant vs. AI training is a fuzzy line

Some bots (e.g., Perplexity) both crawl for training and fetch pages for live users, sometimes with the same UA. We classify by UA token where vendors differentiate (GPTBot vs. ChatGPT-User), and conservatively otherwise.

Self-identifying is not the same as truthful

Any client can claim to be GPTBot. We do not verify ownership for every request. For high-stakes use cases, we recommend cross-checking with reverse DNS or vendor-published IP ranges.
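
For readers who want to do that cross-check themselves, the following is a minimal sketch of testing a client IP against a list of CIDR ranges with Python's standard ipaddress module. The function name is hypothetical, and the ranges are placeholders taken from the reserved documentation blocks, not any vendor's real ranges. Substitute the list the vendor actually publishes.

```python
import ipaddress
from typing import Iterable

# Placeholder ranges from the reserved documentation blocks.
# Replace them with the ranges the vendor actually publishes.
EXAMPLE_VENDOR_RANGES = ("192.0.2.0/24", "2001:db8::/32")


def ip_in_published_ranges(client_ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # not a valid IPv4 or IPv6 address
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False
```

A request whose user-agent claims GPTBot but whose IP fails this check against the vendor's published ranges is a candidate for reclassification. What to do with it is a policy decision this sketch leaves open.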

Categories evolve

New AI products and crawlers appear monthly. The signature lists above are point-in-time and updated as new bots are observed at scale.

How we collect

  • Edge-side classification via Cloudflare Worker on customer domains
  • Cloudflare Logpush ingestion for customers preferring no install
  • Server-log upload for one-off audits
  • All classification happens server-side. There is no client-side script and no cross-site tracking.
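
For the server-log path, classification needs only the Referer and User-Agent fields of each log line. The sketch below extracts them under the assumption that the logs use the common "combined" access-log format. This page does not specify which formats are accepted, so the pattern and the function name are illustrative only.

```python
import re
from typing import Dict, Optional

# Combined log format: ip ident user [time] "request" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Servers write "-" for a missing header; normalise that to an empty string.
    for key in ("referer", "user_agent"):
        if fields[key] == "-":
            fields[key] = ""
    return fields
```

This pattern does not handle escaped double quotes inside a field, so lines containing them are returned as None instead of being guessed at.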

See it on your site

Run a free Score on any URL to check AI readiness, or install the edge worker to start collecting the same classification data on your own domain.