
Crawled, Cited, or Ignored? A Practical Framework for Measuring AI Visibility

A practical framework for measuring whether AI systems can access, crawl, parse, retrieve, cite, refer, or ignore the pages that matter.

AI visibility is not one metric. It is a chain of events: access, crawl, parse, index, retrieve, cite, refer, and act.

Most teams skip straight to tactics.

Add schema. Rewrite the intro. Publish FAQs. Create comparison pages. Add an llms.txt file. Update metadata. Build more content.

Some of those actions can help. But none of them answer the first measurement question:

What are AI systems actually doing on your pages?

If you cannot separate "the page is accessible" from "an AI system fetched it" from "the page was cited or used," you cannot tell whether a content change worked. You can only guess.

This framework is for measuring the state of a page before deciding what to fix.

AI visibility has layers

Treat AI visibility as a sequence, not a score.

  1. Accessible: Can the page be fetched?
  2. Crawled: Did a bot request the URL?
  3. Parsed: Is the useful content readable?
  4. Indexed: Can a retrieval system include it?
  5. Retrieved: Was it selected for a query?
  6. Cited: Did it appear as a source?
  7. Referred: Did a human click through?
  8. Acted on: Did an agent complete a task?

Read it in order. Each step can pass while the next one fails, which is why no single metric can explain the whole system.

Layer | What it means | What to check
--- | --- | ---
Accessible | The page can be fetched by bots, search crawlers, and agents | Status code, robots rules, CDN/WAF rules, noindex, canonical
Crawled | A detectable AI or search user agent requested the URL | Server logs, CDN logs, edge logs, verified bot IP ranges
Parsed | The useful content is available in a form the system can read | Raw HTML, rendered DOM, accessibility tree, visible text
Indexed | A search or answer system may include the page in its retrieval layer | Search Console, Bing Webmaster Tools, sitemap, internal links
Retrieved | The system selected the page for a query, subquery, or task | Grounding queries, cited page reports, repeated bot visits
Cited | The page appeared as a source or supporting link | AI answer checks, Bing AI Performance, manual citation tracking
Referred | A human clicked through from an AI experience | Analytics referrers, landing pages, source patterns
Acted on | An agent used the page to complete a workflow | Form starts, API calls, checkout events, support actions

The mistake is treating these layers as interchangeable.

A page can be accessible but never crawled.

A page can be crawled but not cited.

A page can be cited but send no traffic.

A page can receive AI referral traffic from a system that fetched the source days earlier, or from an index rather than a live page request.

Each layer needs its own evidence.

Google is one layer, not the whole map

Google's guidance for AI Overviews and AI Mode is clear: the same SEO foundations still matter. Google says pages need to meet the normal technical requirements for Search, be indexed, and be eligible to show with a snippet. It also says there are no special AI markup requirements for those Google Search AI features.

That is useful guidance.

It also has a boundary: it is guidance for Google Search.

The broader AI web includes systems with different retrieval paths:

  • Google AI Overviews and AI Mode, which are rooted in Google Search systems.
  • Bing and Copilot experiences, where Microsoft now exposes AI citations, grounding queries, and page-level citation activity in Bing Webmaster Tools.
  • ChatGPT search, where OpenAI distinguishes OAI-SearchBot for search from GPTBot for training and ChatGPT-User for user-triggered browsing.
  • Claude, where Anthropic distinguishes ClaudeBot, Claude-User, and Claude-SearchBot.
  • Perplexity, where PerplexityBot and Perplexity-User have different jobs.
  • Browser agents, which may inspect screenshots, raw HTML, the DOM, and the accessibility tree.

That is why "AI visibility" cannot be reduced to one Google report, one crawler, or one optimization checklist.

The useful unit is the page

Site-wide averages hide the work.

For a SaaS company, publisher, marketplace, or ecommerce site, the useful question is rarely "did AI systems visit the domain?"

The useful question is:

Which important pages did they visit?

Start with pages where AI reuse would matter:

  • homepage
  • pricing
  • product pages
  • comparison pages
  • category pages
  • documentation entry points
  • support pages
  • high-intent editorial pages
  • free tools and templates
  • pages that changed recently

Then assign each page a job.

A pricing page should help a buyer understand plans, limits, and commitment level.

A comparison page should help someone choose between alternatives.

A documentation page should help an agent or user complete implementation.

A category page should define the problem, criteria, and tradeoffs.

If the page job is vague, the measurement will be vague too.

The page-level questions to ask

For each important URL, ask the questions in order.

Diagnostic path:

  1. Access: Confirm status code, robots rules, canonical, snippet controls, and bot protection.
  2. Requests: Check logs for AI and search user agents, then verify high-value traffic where possible.
  3. Readability: Review raw HTML, rendered DOM, visible text, the accessibility tree, and important hidden content.
  4. Retrieval: Use platform tools to see indexing, cited pages, grounding queries, and search visibility.
  5. Reuse: Separate citations, summaries, referrals, and agent actions from raw crawler visits.

1. Can AI systems access it?

Check the basics first:

  • Does the preferred URL return a clean 200?
  • Is the canonical URL correct?
  • Is the page blocked by robots.txt?
  • Is it excluded by noindex or X-Robots-Tag, or restricted by snippet controls such as nosnippet?
  • Is the page blocked by WAF, bot protection, geofencing, or login walls?
  • Is the page linked from the site in a way crawlers can discover?
  • Is it present in the sitemap if it should be?
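Part of this checklist can be scripted. The sketch below assumes you have already fetched the page and its robots.txt with whatever HTTP client you use; it only inspects what came back. The agent list is an example, and the regex checks for meta robots and canonical tags are deliberately rough (they assume `name` comes before `content`). WAF rules, geofencing, and login walls still need a real request from the outside.

```python
import re
from urllib.robotparser import RobotFileParser

# Example user agents to test; swap in the crawlers you care about.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def robots_allows(robots_txt: str, url: str, agents=AI_AGENTS) -> dict:
    """Return {agent: allowed?} for one URL under a given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def page_access_issues(status: int, headers: dict, html: str) -> list:
    """Flag common access blockers in an already-fetched response."""
    issues = []
    if status != 200:
        issues.append(f"status {status}")
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    x_robots = lowered.get("x-robots-tag", "")
    for directive in ("noindex", "nosnippet"):
        if directive in x_robots:
            issues.append(f"X-Robots-Tag: {directive}")
    # Rough check of <meta name="robots" content="...">; a real audit should parse HTML.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)', html, re.I
    )
    if meta:
        for directive in ("noindex", "nosnippet"):
            if directive in meta.group(1).lower():
                issues.append(f"meta robots: {directive}")
    if not re.search(r'<link[^>]+rel=["\']canonical["\']', html, re.I):
        issues.append("no canonical link")
    return issues
```

An empty list from `page_access_issues` means none of these specific blockers was found, not that the page is accessible to every system.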

Access is not success. It is the starting condition.

2. Which AI systems request it?

Look at server-side logs, CDN logs, or edge logs.

Do not rely only on browser analytics. Many crawler and fetcher requests never execute JavaScript analytics. They arrive as HTTP requests, receive the page, and leave no normal browser session behind.

Track at least:

  • user agent
  • URL
  • timestamp
  • status code
  • referrer if present
  • IP or ASN where available
  • whether the bot identity was verified against published IP ranges
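A minimal sketch of pulling those fields out of raw logs, assuming nginx/Apache combined log format (CDN and edge logs usually export JSON with their own field names). The token list uses the crawler names mentioned earlier; `BotHit` and `parse_ai_hit` are hypothetical names for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Combined Log Format: ip ident user [time] "request" status bytes "referrer" "user agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Substrings of published crawler/fetcher names; extend as vendors add new ones.
AI_BOT_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
]


@dataclass
class BotHit:
    bot: str
    url: str
    timestamp: str
    status: int
    referrer: Optional[str]
    ip: str
    verified: bool = False  # set by a separate IP-range check, not here


def parse_ai_hit(line: str) -> Optional[BotHit]:
    """Return a BotHit for AI crawler requests, or None for anything else."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ua = m.group("ua")
    bot = next((token for token in AI_BOT_TOKENS if token in ua), None)
    if bot is None:
        return None
    ref = m.group("referrer")
    return BotHit(
        bot=bot,
        url=m.group("url"),
        timestamp=m.group("ts"),
        status=int(m.group("status")),
        referrer=None if ref in ("", "-") else ref,
        ip=m.group("ip"),
    )
```

Matching on user-agent substrings only identifies what a request claims to be; `verified` stays `False` until the IP is checked.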

User-agent strings are useful, but they can be spoofed. Verification matters when you are making decisions from the data.
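One way to verify is to check the request IP against the ranges a vendor publishes for that bot. The ranges below are RFC 5737 documentation blocks used as placeholders, not real crawler ranges; load the actual lists from each vendor's published files. Some search engines, Google among them, also document reverse-DNS verification, which this sketch does not cover.

```python
import ipaddress

# Placeholder ranges for illustration only (RFC 5737 documentation blocks).
# Replace with the IP ranges each vendor actually publishes for its bots.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/25"],
}


def ip_matches_bot(ip: str, bot: str, ranges=PUBLISHED_RANGES) -> bool:
    """True only if the IP falls inside a published range for that bot."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network, so mixed lists are safe.
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges.get(bot, []))
```

A request that claims to be a bot but fails this check is better treated as unverified than discarded; keep it in the data with the flag set accordingly.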

3. Can the system read the useful content?

Fetches only prove that a request happened. They do not prove the content was easy to use.

Review the page from multiple machine-readable views:

  • raw HTML
  • rendered DOM
  • visible text
  • accessibility tree
  • structured data where relevant
  • important text inside images, widgets, tabs, modals, or scripts
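The raw HTML view is the easiest to check by hand or in a script. This sketch approximates what a fetcher that does not run JavaScript would see: it collects text nodes and skips `script`, `style`, and `template` contents, then reports which key phrases are missing. It does not render the page or build an accessibility tree, and `missing_from_raw_html` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes from raw HTML, skipping non-rendered elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def missing_from_raw_html(raw_html, key_phrases):
    """Return the key phrases that do not appear in the raw HTML text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in text]
```

For a pricing page, the key phrases might be plan names and prices; if they only appear after a script runs, this check reports them as missing.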

For browser agents, web.dev recommends thinking beyond text extraction. Agents may use screenshots, raw HTML, and the accessibility tree. That means semantic buttons, labels, stable layouts, and clear interactive elements matter.

For search-grounded systems, text still matters. Google explicitly recommends making important content available in textual form for its AI features in Search.

4. Did the page get indexed or included in a retrieval surface?

Indexing is not the same as crawling.

A crawler can fetch a page without the page becoming useful in an answer system.

Use platform-specific tools where they exist:

  • Google Search Console for Google indexing and Search performance.
  • Bing Webmaster Tools for Bing crawl, index, and AI Performance data.
  • URL inspection tools to confirm what the search system saw.
  • Sitemaps and internal links to confirm discoverability.
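The sitemap check is simple to automate. This sketch reads a single `urlset` sitemap (not a sitemap index) using the standard sitemaps.org namespace and lists important URLs that are absent. It compares strings exactly, so trailing slashes and `http` vs `https` differences will show up as missing.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set:
    """Return every <loc> value listed in a single urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(sitemap_xml: str, important_urls: list) -> list:
    """Important URLs that the sitemap does not list (exact string match)."""
    listed = sitemap_urls(sitemap_xml)
    return [url for url in important_urls if url not in listed]
```

Presence in a sitemap only helps discovery; it does not confirm that any system indexed the page.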

For Bing and Copilot-style AI experiences, Bing's AI Performance dashboard is especially useful because it reports citations, cited pages, grounding queries, and visibility trends.

For other AI products, the evidence may be less complete. That is why log-level measurement and manual answer checks still matter.

5. Is the page cited, summarized, or referred to?

Crawling is evidence of interest: a system asked for the page. Citation and referral are evidence of reuse.

Look for:

  • the URL appearing as a cited source
  • the brand or page being summarized in an answer
  • AI referral traffic to the page
  • repeated visits after a page update
  • related pages receiving AI referrals while this page is skipped
  • query-to-page patterns in tools that expose them

Do not assume silence means failure. Some AI systems may use indexed information without creating a fresh fetch near the user session. Some answers influence buyers without sending a click. But if an important page is fetched repeatedly and never cited, referred to, or mentioned, that is a useful signal.
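AI referral traffic can be separated from other referrals by hostname. The host list below is an assumption based on common AI product domains, so compare it with what your analytics actually records. Some products also tag outbound links with a `utm_source` value (ChatGPT uses `utm_source=chatgpt.com`, for example), so the sketch checks the landing URL as a fallback.

```python
from urllib.parse import parse_qs, urlparse

# Assumed referrer hostnames; verify against your own analytics data.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
    "gemini.google.com": "Gemini",
}


def ai_referral_source(referrer, landing_url=""):
    """Return the AI product name for a visit, or None if it looks non-AI."""
    host = (urlparse(referrer or "").hostname or "").lower()
    if host in AI_REFERRER_HOSTS:
        return AI_REFERRER_HOSTS[host]
    # Fallback: a utm_source tag on the landing URL, when the referrer is stripped.
    query = parse_qs(urlparse(landing_url).query)
    utm = query.get("utm_source", [""])[0].lower()
    return AI_REFERRER_HOSTS.get(utm)
```

Grouping the result by landing page gives the "AI referral traffic to the page" signal from the list above.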

Common patterns

Once you measure pages instead of domains, recurring patterns appear.

Homepage-only attention
Signal: AI systems request the homepage and perhaps the blog index, but ignore pricing, docs, and product pages.
Check next: Internal links, sitemap coverage, navigation, and whether commercial pages are reachable through machine-readable paths.

Docs read, product pages ignored
Signal: Implementation docs are fetched regularly while product pages are not. Agents tend to favor docs because they are concrete, structured, and specific.
Check next: Add clearer paths from implementation pages to product, category, pricing, and comparison pages.

Crawled but not cited
Signal: The page is accessible and fetched, but shows no sign of reuse.
Check next: Look for buried answers, generic claims, weak definitions, missing tradeoffs, or important facts trapped in visual elements.

Cited but no traffic
Signal: The page appears in answers, but users do not click through.
Check next: Track citations and clicks separately. Influence can happen without a clean referral session.

AI referrals with no recent bot visit
Signal: A user arrives from an AI tool, but logs show no matching fresh crawler request.
Check next: Look for older fetches, search-index reuse, shared crawler caches, incomplete referrers, or user-triggered browsing.

Bot spike after publishing, then silence
Signal: A new page is crawled after launch, then activity stops.
Check next: Watch whether the page is fetched again after meaningful updates, not just after the initial discovery event.
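For the last pattern, the question per page is whether each bot came back after the update. A minimal sketch, assuming you already have parsed `(bot, timestamp)` pairs for one URL and the time of its last meaningful change; `first_refetch_by_bot` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def first_refetch_by_bot(
    fetches: List[Tuple[str, datetime]], updated_at: datetime
) -> Dict[str, Optional[datetime]]:
    """For each bot seen on the page, its first fetch after the update, or None."""
    result: Dict[str, Optional[datetime]] = {}
    for bot, ts in sorted(fetches, key=lambda fetch: fetch[1]):
        result.setdefault(bot, None)  # record every bot, even without a refetch
        if ts > updated_at and result[bot] is None:
            result[bot] = ts
    return result
```

A bot that fetched the page before the update but maps to `None` here has not yet seen the new version.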

A manual measurement workflow

You can start without specialized tooling.

  1. Pick 10 to 50 important URLs.
  2. Confirm each page is accessible, canonical, indexable, and internally linked.
  3. Fetch each page as raw HTML and confirm the main content is present.
  4. Review server or CDN logs for known AI user agents.
  5. Verify high-value bot traffic with published IP ranges where possible.
  6. Group requests by page, bot, and week.
  7. Compare AI referrals by landing page in analytics.
  8. Check Google Search Console and Bing Webmaster Tools for page-level visibility.
  9. Manually test a small set of buyer questions in AI search products.
  10. Record each page's state (not fetched, fetched but not cited, cited, or referred) and note whether the page changed recently.
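Steps 6 and 10 are the ones most worth scripting once the log rows are parsed. The sketch below assumes each hit is a `(url, bot, raw_timestamp)` tuple with Apache/nginx-style timestamps, groups requests by ISO week, and collapses one page's evidence into a simplified version of the states above. Both function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Apache/nginx combined-log timestamp, e.g. "10/Mar/2025:12:00:00 +0000".
# %b expects English month abbreviations (the default C locale).
LOG_TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def weekly_counts(hits):
    """Count requests per (url, bot, ISO week) from (url, bot, raw_timestamp) tuples."""
    counts = Counter()
    for url, bot, raw_ts in hits:
        year, week, _ = datetime.strptime(raw_ts, LOG_TS_FORMAT).isocalendar()
        counts[(url, bot, f"{year}-W{week:02d}")] += 1
    return counts


def page_state(fetches, citations, referrals):
    """Collapse one page's evidence into a single state, strongest evidence first."""
    # Referrals and citations can exist without a logged fetch (index reuse,
    # older fetches), so they are checked before the fetch count.
    if referrals > 0:
        return "referred"
    if citations > 0:
        return "cited"
    if fetches > 0:
        return "fetched but not cited"
    return "not fetched"
```

Keeping the weekly counts alongside the state makes it easy to see whether a change in state followed a content update.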

The output should be an action list, not a dashboard screenshot.

Examples:

  • Pricing is accessible but has no detected AI bot visits.
  • Docs are fetched weekly by multiple systems, but product pages are ignored.
  • The comparison page is fetched by search bots, but has no citation or referral evidence.
  • The category page gained AI referrals after the latest rewrite.
  • The setup guide is cited, but the related pricing page is skipped.

What to fix after measuring

Only fix the page state you can see.

If a page is not accessible, fix technical access.

If a page is accessible but not fetched, fix discovery: sitemap, internal links, canonicalization, navigation, and crawl permissions.

If a page is fetched but hard to parse, fix machine readability: visible text, semantic HTML, headings, labels, and stable layouts.

If a page is fetched but not reused, fix extractability: clearer definitions, answer-first sections, evidence, comparison criteria, and specific tradeoffs.

If a page is cited but not clicked, review whether the cited answer satisfies the user without a visit, and whether the page gives the reader a clear reason to click through.

If an agent needs to act on the page, review the interface: buttons, forms, labels, error states, account requirements, and whether the next step is obvious.
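One way to read this list is as an ordered sequence: fix the earliest layer that fails before working on later ones. The sketch below encodes that reading; the layer keys are illustrative, the fixes paraphrase the paragraphs above, and the agent-action layer is left out because it applies only to pages an agent must operate.

```python
# Layers in diagnostic order, each paired with the fix suggested above.
# The keys are illustrative names, not a standard vocabulary.
FIXES = [
    ("accessible", "fix technical access"),
    ("fetched", "fix discovery: sitemap, internal links, canonicalization, navigation, crawl permissions"),
    ("parseable", "fix machine readability: visible text, semantic HTML, headings, labels, stable layouts"),
    ("reused", "fix extractability: clearer definitions, answer-first sections, evidence, tradeoffs"),
    ("clicked", "review whether the cited answer leaves the reader a reason to visit"),
]


def next_fix(checks):
    """Return (layer, fix) for the earliest failing layer, or None if all pass."""
    for layer, fix in FIXES:
        # A layer with no recorded evidence counts as failing, so gaps surface first.
        if not checks.get(layer, False):
            return layer, fix
    return None
```

For example, a page that is accessible and fetched but not parseable gets a readability fix, even if it also lacks citations.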

The point is not to optimize every page for every AI system.

The point is to know which important pages are being accessed, ignored, reused, or blocked.


Where SeeLLM fits

You can do a basic version of this with logs, spreadsheets, Search Console, Bing Webmaster Tools, and manual checks.

That is often enough to prove the gap exists.

SeeLLM is built to make the page-level workflow easier: choose the pages that matter, see which AI systems fetch them, connect bot visits to AI referrals, and find pages that are accessible but not being reused.

Start with the free AI Visibility Score to check whether an important page is technically readable. For the reuse gap, read "What Is Crawled But Not Cited?" For the page-level operating workflow, read "How to Monitor Important Pages for AI Reuse." For an empirical look at why splitting by platform matters, including a roughly 3× swing in Claude vs. ChatGPT preference between two real domains, see "what 30 days of AI bot traffic on two real domains actually looks like."

