Crawled, Cited, or Ignored? A Practical Framework for Measuring AI Visibility

A practical framework for measuring whether AI systems can access, crawl, parse, retrieve, cite, refer, or ignore the pages that matter.

AI visibility is not one metric. It is a chain of events: access, crawl, parse, index, retrieve, cite, refer, and act.

Most teams skip straight to tactics.

Add schema. Rewrite the intro. Publish FAQs. Create comparison pages. Add an llms.txt file. Update metadata. Build more content.

Some of those actions can help. But none of them answer the first measurement question:

What are AI systems actually doing on your pages?

If you cannot separate "the page is accessible" from "an AI system fetched it" from "the page was cited or used," you cannot tell whether a content change worked. You can only guess.

This framework is for measuring the state of a page before deciding what to fix.

AI visibility has layers

Treat AI visibility as a sequence, not a score.

Accessible

Can the page be fetched?

Crawled

Did a bot request the URL?

Parsed

Is the useful content readable?

Indexed

Can a retrieval system include it?

Retrieved

Was it selected for a query?

Cited

Did it appear as a source?

Referred

Did a human click through?

Acted on

Did an agent complete a task?

Read the layers in order. Each step can pass while the next one fails, which is why one metric cannot explain the whole system.

Layer	What it means	What to check
Accessible	The page can be fetched by bots, search crawlers, and agents	Status code, robots rules, CDN/WAF rules, noindex, canonical
Crawled	A detectable AI or search user agent requested the URL	Server logs, CDN logs, edge logs, verified bot IP ranges
Parsed	The useful content is available in a form the system can read	Raw HTML, rendered DOM, accessibility tree, visible text
Indexed	A search or answer system may include the page in its retrieval layer	Search Console, Bing Webmaster Tools, sitemap, internal links
Retrieved	The system selected the page for a query, subquery, or task	Grounding queries, cited page reports, repeated bot visits
Cited	The page appeared as a source or supporting link	AI answer checks, Bing AI Performance, manual citation tracking
Referred	A human clicked through from an AI experience	Analytics referrers, landing pages, source patterns
Acted on	An agent used the page to complete a workflow	Form starts, API calls, checkout events, support actions

The mistake is treating these layers as interchangeable.

A page can be accessible but never crawled.

A page can be crawled but not cited.

A page can be cited but send no traffic.

A page can receive AI referral traffic from a system that fetched the source days earlier, or from one that answered out of an index without making a live page request.

Each layer needs its own evidence.
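One way to keep the layers separate is to record evidence per layer instead of a single score. The sketch below is a minimal illustration, not a prescribed data model: the `Layer` enum and `layers_without_evidence` helper are hypothetical names, and the example page is invented.

```python
from enum import IntEnum
from typing import Dict, List


class Layer(IntEnum):
    """The eight layers, in the order the framework lists them."""
    ACCESSIBLE = 1
    CRAWLED = 2
    PARSED = 3
    INDEXED = 4
    RETRIEVED = 5
    CITED = 6
    REFERRED = 7
    ACTED_ON = 8


def layers_without_evidence(evidence: Dict[Layer, bool]) -> List[Layer]:
    """Return every layer that has no positive evidence yet, in layer order.

    Layers missing from `evidence` count as "not observed", not as "failed".
    """
    return [layer for layer in Layer if not evidence.get(layer, False)]


# Hypothetical comparison page: readable and indexed, with AI referrals,
# but no observed retrieval or citation evidence.
comparison_page = {
    Layer.ACCESSIBLE: True,
    Layer.CRAWLED: True,
    Layer.PARSED: True,
    Layer.INDEXED: True,
    Layer.REFERRED: True,
}
gaps = layers_without_evidence(comparison_page)
print([layer.name for layer in gaps])  # ['RETRIEVED', 'CITED', 'ACTED_ON']
```

The helper returns all gaps rather than stopping at the first one, because a later layer can have evidence while an earlier one does not, as with referrals that arrive without a matching live fetch.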

Google is one layer, not the whole map

Google's guidance for AI Overviews and AI Mode is clear: the same SEO foundations still matter. Google says pages need to meet the normal technical requirements for Search, be indexed, and be eligible to show with a snippet. It also says there are no special AI markup requirements for those Google Search AI features.

That is useful guidance.

It also has a boundary: it is guidance for Google Search.

The broader AI web includes systems with different retrieval paths:

Google AI Overviews and AI Mode, which are rooted in Google Search systems.
Bing and Copilot experiences, where Microsoft now exposes AI citations, grounding queries, and page-level citation activity in Bing Webmaster Tools.
ChatGPT search, where OpenAI distinguishes OAI-SearchBot for search from GPTBot for training and ChatGPT-User for user-triggered browsing.
Claude, where Anthropic distinguishes ClaudeBot, Claude-User, and Claude-SearchBot.
Perplexity, where PerplexityBot and Perplexity-User have different jobs.
Browser agents, which may inspect screenshots, raw HTML, the DOM, and the accessibility tree.

That is why "AI visibility" cannot be reduced to one Google report, one crawler, or one optimization checklist.
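Splitting log traffic by system starts with a user-agent lookup. The sketch below uses the tokens named above. The OpenAI roles follow the description in this post; the roles assigned to the Anthropic and Perplexity agents are assumptions based on their names and should be checked against each vendor's documentation. The sample user-agent strings are illustrative, not real ones.

```python
from typing import Optional, Tuple

# token -> (vendor, role). Roles marked "assumed" are not stated in this post.
AI_AGENTS = {
    "OAI-SearchBot": ("OpenAI", "search"),
    "GPTBot": ("OpenAI", "training"),
    "ChatGPT-User": ("OpenAI", "user-triggered"),
    "ClaudeBot": ("Anthropic", "training"),         # assumed role
    "Claude-User": ("Anthropic", "user-triggered"),  # assumed role
    "Claude-SearchBot": ("Anthropic", "search"),     # assumed role
    "PerplexityBot": ("Perplexity", "search"),       # assumed role
    "Perplexity-User": ("Perplexity", "user-triggered"),  # assumed role
}


def classify_agent(user_agent: str) -> Optional[Tuple[str, str, str]]:
    """Return (token, vendor, role) for a known AI agent, else None.

    Matching is a case-insensitive substring test on the claimed user agent,
    so it identifies what a request says it is, not what it verifiably is.
    """
    lowered = user_agent.lower()
    for token, (vendor, role) in AI_AGENTS.items():
        if token.lower() in lowered:
            return token, vendor, role
    return None


print(classify_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(classify_agent("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```

Keeping the role next to the token matters because a training crawler, a search crawler, and a user-triggered fetcher answer different measurement questions.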

The useful unit is the page

Site-wide averages hide the work.

For a SaaS, publisher, marketplace, or ecommerce site, the useful question is rarely "did AI systems visit the domain?"

The useful question is:

Which important pages did they visit?

Start with pages where AI reuse would matter:

homepage
pricing
product pages
comparison pages
category pages
documentation entry points
support pages
high-intent editorial pages
free tools and templates
pages that changed recently

Then assign each page a job.

A pricing page should help a buyer understand plans, limits, and commitment level.

A comparison page should help someone choose between alternatives.

A documentation page should help an agent or user complete implementation.

A category page should define the problem, criteria, and tradeoffs.

If the page job is vague, the measurement will be vague too.

The page-level questions to ask

For each important URL, ask the questions in order.

Diagnostic path

Access

Confirm status code, robots rules, canonical, snippet controls, and bot protection.

Requests

Check logs for AI and search user agents, then verify high-value traffic where possible.

Readability

Review raw HTML, rendered DOM, visible text, accessibility tree, and important hidden content.

Retrieval

Use platform tools to see indexing, cited pages, grounding queries, and search visibility.

Reuse

Separate citations, summaries, referrals, and agent actions from raw crawler visits.

1. Can AI systems access it?

Check the basics first:

Does the preferred URL return a clean 200?
Is the canonical URL correct?
Is the page blocked by robots.txt?
Is it blocked by noindex, X-Robots-Tag, or snippet controls?
Is the page blocked by WAF, bot protection, geofencing, or login walls?
Is the page linked from the site in a way crawlers can discover?
Is it present in the sitemap if it should be?

Access is not success. It is the starting condition.
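Several of these checks can be scripted once you have the response in hand. The sketch below is one possible shape, using only the Python standard library. `access_blockers` and `HeadSignals` are hypothetical names, the URL and robots.txt content are invented, and the function takes an already-fetched status, headers, and HTML rather than making network calls. It does not detect WAF rules, geofencing, or login walls, which need a real request from the relevant network.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.robotparser import RobotFileParser


class HeadSignals(HTMLParser):
    """Collect the robots meta directives and canonical URL from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.robots = ""
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.robots = (attr.get("content") or "").lower()
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


def access_blockers(url: str, status: int, headers: Dict[str, str],
                    html: str, robots_txt: str, user_agent: str) -> List[str]:
    """List the access problems visible for one URL and one user agent."""
    problems = []
    if status != 200:
        problems.append(f"status code {status}")

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        problems.append(f"robots.txt disallows {user_agent}")

    signals = HeadSignals()
    signals.feed(html)
    header = {k.lower(): v for k, v in headers.items()}.get("x-robots-tag", "")
    directives = f"{header.lower()},{signals.robots}"
    for directive in ("noindex", "nosnippet"):
        if directive in directives:
            problems.append(f"{directive} is set")

    if signals.canonical and signals.canonical != url:
        problems.append(f"canonical points to {signals.canonical}")
    return problems


robots_txt = "User-agent: GPTBot\nDisallow: /pricing\n\nUser-agent: *\nAllow: /\n"
page_html = ('<html><head><meta name="robots" content="noindex">'
             '<link rel="canonical" href="https://example.com/pricing">'
             '</head><body></body></html>')
print(access_blockers("https://example.com/pricing", 200, {}, page_html,
                      robots_txt, "GPTBot"))
```

Running the same check once per user agent is the point: a page can be open to one system and blocked for another.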

2. Which AI systems request it?

Look at server-side logs, CDN logs, or edge logs.

Do not rely only on browser analytics. Many crawler and fetcher requests never execute JavaScript analytics. They arrive as HTTP requests, receive the page, and leave no normal browser session behind.

Track at least:

user agent
URL
timestamp
status code
referrer if present
IP or ASN where available
whether the bot identity was verified against published IP ranges

User-agent strings are useful, but they can be spoofed. Verification matters when you are making decisions from the data.
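The fields above map directly onto a parsed log record. The sketch below assumes logs in the common "combined" format and shows one way to attach a verification flag. The CIDR range is a documentation placeholder (TEST-NET-1), not a real published range; substitute the ranges each vendor actually publishes. `parse_line` and `PUBLISHED_RANGES` are hypothetical names.

```python
import ipaddress
import re
from datetime import datetime
from typing import Any, Dict, Optional

# Combined log format: ip ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Placeholder only: replace with the IP ranges each vendor publishes.
PUBLISHED_RANGES = {"GPTBot": ["192.0.2.0/24"]}


def ip_in_ranges(ip: str, cidrs) -> bool:
    """True if `ip` parses as an address and falls inside any listed range."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:  # e.g. the log holds a hostname instead of an IP
        return False
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


def parse_line(line: str) -> Optional[Dict[str, Any]]:
    """Turn one log line into the fields worth tracking, or None if unparsable."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    agent = match["agent"]
    bot = next((name for name in PUBLISHED_RANGES
                if name.lower() in agent.lower()), None)
    return {
        "ip": match["ip"],
        "timestamp": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": match["path"],
        "status": int(match["status"]),
        "referrer": None if match["referrer"] in ("", "-") else match["referrer"],
        "user_agent": agent,
        "bot": bot,
        "verified": bot is not None and ip_in_ranges(match["ip"], PUBLISHED_RANGES[bot]),
    }


genuine = ('192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET /pricing HTTP/1.1" '
           '200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"')
spoofed = genuine.replace("192.0.2.10", "203.0.113.5")
print(parse_line(genuine)["verified"], parse_line(spoofed)["verified"])
```

Both sample lines claim the same bot; only the one from the listed range is marked verified, which is the distinction to preserve before making decisions from the data.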

3. Can the system read the useful content?

Fetches only prove that a request happened. They do not prove the content was easy to use.

Review the page from multiple machine-readable views:

raw HTML
rendered DOM
visible text
accessibility tree
structured data where relevant
important text inside images, widgets, tabs, modals, or scripts

For browser agents, web.dev recommends thinking beyond text extraction. Agents may use screenshots, raw HTML, and the accessibility tree. That means semantic buttons, labels, stable layouts, and clear interactive elements matter.

For search-grounded systems, text still matters. Google explicitly recommends making important content available in textual form for its AI features in Search.
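A quick first test is whether the facts a page exists to communicate appear as text in the raw HTML. The sketch below is a rough check under stated assumptions: it reads raw HTML only, so it says nothing about the rendered DOM or accessibility tree, which need a real browser. `missing_facts` and the sample pricing markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class VisibleText(HTMLParser):
    """Collect text nodes, skipping elements whose content is not page copy."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_facts(raw_html: str, facts: List[str]) -> List[str]:
    """Return the facts that do not appear in the raw HTML's text content."""
    extractor = VisibleText()
    extractor.feed(raw_html)
    text = " ".join(" ".join(extractor.chunks).split()).lower()
    return [fact for fact in facts if " ".join(fact.split()).lower() not in text]


# Invented example: one price is in the markup, the other only in a script.
raw_html = (
    "<h1>Pricing</h1><p>Starter plan: $29 per month</p>"
    '<script>window.plans = {"pro": "$99 per month"}</script>'
)
print(missing_facts(raw_html, ["$29 per month", "$99 per month"]))
# ['$99 per month']
```

A fact that only shows up after JavaScript runs is not necessarily invisible to every system, but it is a reason to inspect the rendered views next.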

4. Did the page get indexed or included in a retrieval surface?

Indexing is not the same as crawling.

A crawler can fetch a page without the page becoming useful in an answer system.

Use platform-specific tools where they exist:

Google Search Console for Google indexing and Search performance.
Bing Webmaster Tools for Bing crawl, index, and AI Performance data.
URL inspection tools to confirm what the search system saw.
Sitemaps and internal links to confirm discoverability.

For Bing and Copilot-style AI experiences, Bing's AI Performance dashboard is especially useful because it reports citations, cited pages, grounding queries, and visibility trends.

For other AI products, the evidence may be less complete. That is why log-level measurement and manual answer checks still matter.

5. Is the page cited, summarized, or referred to?

Crawling is demand-side evidence. Citation and referral are reuse evidence.

Look for:

the URL appearing as a cited source
the brand or page being summarized in an answer
AI referral traffic to the page
repeated visits after a page update
related pages receiving AI referrals while this page is skipped
query-to-page patterns in tools that expose them

Do not assume silence means failure. Some AI systems may use indexed information without creating a fresh fetch near the user session. Some answers influence buyers without sending a click. But if an important page is fetched repeatedly and never cited, referred to, or mentioned, that is a useful signal.
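AI referral traffic can be separated from other sessions by referrer hostname. The sketch below is a starting point, not a complete list: the hostnames are assumptions about where these products currently send traffic from, and they should be checked against your own analytics before you rely on them. `ai_source` and the sample sessions are hypothetical.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# Assumed referrer hosts; verify against the referrers you actually observe.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}


def ai_source(referrer: str) -> Optional[str]:
    """Map a referrer URL to an AI product name, or None if it is not one."""
    host = (urlparse(referrer).hostname or "").lower()
    for known, name in AI_REFERRER_HOSTS.items():
        if host == known or host.endswith("." + known):
            return name
    return None


# Invented sessions: (landing page, referrer).
sessions = [
    ("/pricing", "https://chatgpt.com/"),
    ("/pricing", "https://www.perplexity.ai/search?q=example"),
    ("/docs/setup", "https://www.google.com/"),
    ("/docs/setup", ""),
]
by_page = Counter(
    (page, ai_source(referrer))
    for page, referrer in sessions
    if ai_source(referrer) is not None
)
print(by_page)
```

Sessions with an empty referrer are left unclassified here, which undercounts AI referrals whenever a product strips the referrer.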

Common patterns

Once you measure pages instead of domains, recurring patterns appear.

Homepage-only attention

Signal

AI systems request the homepage and maybe the blog index, but ignore pricing, docs, and product pages.

Check next

Internal links, sitemap coverage, navigation, and whether commercial pages are obvious from machine-readable paths.

Docs read, product pages ignored

Signal

AI systems fetch implementation docs, likely because they are concrete, structured, and specific, while product pages go unrequested.

Check next

Add better paths from implementation pages to product, category, pricing, and comparison pages.

Crawled but not cited

Signal

The page is accessible and fetched, but does not appear to be reused.

Check next

Look for buried answers, generic claims, weak definitions, missing tradeoffs, or important facts trapped in visual elements.

Cited but no traffic

Signal

The page appears in an answer, but users do not click.

Check next

Track citation and click outcomes separately. Influence can happen without a clean referral session.

AI referrals with no recent bot visit

Signal

A user arrives from an AI tool, but logs do not show a matching fresh crawler request.

Check next

Look for older fetches, search-index reuse, shared crawler caches, incomplete referrers, or user-triggered browsing.
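This pattern can be surfaced by joining referral arrivals to bot fetches of the same path. The sketch below assumes both are available as `(path, datetime)` pairs; the seven-day window is an arbitrary choice for illustration, and `referrals_without_recent_fetch` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[str, datetime]  # (URL path, time of the event)


def referrals_without_recent_fetch(
    referrals: List[Event],
    fetches: List[Event],
    window: timedelta = timedelta(days=7),
) -> List[Event]:
    """Return referrals with no bot fetch of the same path in the window before."""
    unmatched = []
    for path, arrived in referrals:
        has_recent_fetch = any(
            fetched_path == path and timedelta(0) <= arrived - fetched_at <= window
            for fetched_path, fetched_at in fetches
        )
        if not has_recent_fetch:
            unmatched.append((path, arrived))
    return unmatched


# Invented data: /pricing was fetched two days before its referral,
# /compare was last fetched a month before its referral.
fetches = [
    ("/pricing", datetime(2024, 10, 8, 9, 0)),
    ("/compare", datetime(2024, 9, 10, 9, 0)),
]
referrals = [
    ("/pricing", datetime(2024, 10, 10, 14, 30)),
    ("/compare", datetime(2024, 10, 10, 15, 0)),
]
print(referrals_without_recent_fetch(referrals, fetches))
```

An unmatched referral is a prompt to look at the explanations listed above, not proof that no fetch happened.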

Bot spike after publishing, then silence

Signal

A new page gets crawled after launch, then activity stops.

Check next

Watch whether the page is fetched again after meaningful updates, not just after the initial discovery event.

A manual measurement workflow

You can start without specialized tooling.

Pick 10 to 50 important URLs.
Confirm each page is accessible, canonical, indexable, and internally linked.
Fetch each page as raw HTML and confirm the main content is present.
Review server or CDN logs for known AI user agents.
Verify high-value bot traffic with published IP ranges where possible.
Group requests by page, bot, and week.
Compare AI referrals by landing page in analytics.
Check Google Search Console and Bing Webmaster Tools for page-level visibility.
Manually test a small set of buyer questions in AI search products.
Record each page's state: not fetched, fetched, fetched but not cited, cited, or referred, and note whether the page changed since the last check.
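The grouping and state-recording steps fit in a few lines once the log records are parsed. The sketch below is one possible reduction, assuming requests arrive as `(page, bot, date)` tuples; the state labels are a simplified version of the ones listed above, and the function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def weekly_counts(requests: Iterable[Tuple[str, str, date]]) -> Counter:
    """Count requests per (page, bot, ISO week), e.g. ('/pricing', 'GPTBot', '2024-W41')."""
    counts: Counter = Counter()
    for page, bot, day in requests:
        year, week, _weekday = day.isocalendar()
        counts[(page, bot, f"{year}-W{week:02d}")] += 1
    return counts


def page_state(fetches: int, cited: bool, referred: bool) -> str:
    """Reduce the evidence for one page to a single recorded state."""
    if referred:
        return "referred"
    if cited:
        return "cited"
    if fetches > 0:
        return "fetched but not cited"
    return "not fetched"


# Invented request log.
requests = [
    ("/pricing", "GPTBot", date(2024, 10, 7)),
    ("/pricing", "GPTBot", date(2024, 10, 9)),
    ("/docs/setup", "ClaudeBot", date(2024, 10, 14)),
]
counts = weekly_counts(requests)
print(counts[("/pricing", "GPTBot", "2024-W41")])  # 2
print(page_state(fetches=2, cited=False, referred=False))  # fetched but not cited
```

Collapsing to one state per page loses detail, so keep the underlying counts alongside it.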

The output should be an action list, not a dashboard screenshot.

Examples:

Pricing is accessible but has no detected AI bot visits.
Docs are fetched weekly by multiple systems, but product pages are ignored.
The comparison page is fetched by search bots, but has no citation or referral evidence.
The category page gained AI referrals after the latest rewrite.
The setup guide is cited, but the related pricing page is skipped.

What to fix after measuring

Only fix the page state you can see.

If a page is not accessible, fix technical access.

If a page is accessible but not fetched, fix discovery: sitemap, internal links, canonicalization, navigation, and crawl permissions.

If a page is fetched but hard to parse, fix machine readability: visible text, semantic HTML, headings, labels, and stable layouts.

If a page is fetched but not reused, fix extractability: clearer definitions, answer-first sections, evidence, comparison criteria, and specific tradeoffs.

If a page is cited but not clicked, review whether the cited answer satisfies the user without a visit, and whether the page has a clear reason to continue.

If an agent needs to act on the page, review the interface: buttons, forms, labels, error states, account requirements, and whether the next step is obvious.
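The rules above read as an ordered decision: fix the earliest failing state first. The sketch below encodes that order for the first five rules. `PageEvidence` and `next_fix` are hypothetical names, the booleans stand in for whatever evidence you collected, and the agent-interface case is left out because it depends on the task rather than on page state.

```python
from typing import NamedTuple


class PageEvidence(NamedTuple):
    accessible: bool  # clean 200, not blocked
    fetched: bool     # at least one detected bot request
    parseable: bool   # useful content readable in machine views
    reused: bool      # cited, summarized, or mentioned
    clicked: bool     # at least one AI referral session


def next_fix(page: PageEvidence) -> str:
    """Return the fix area for the earliest failing state."""
    if not page.accessible:
        return "technical access"
    if not page.fetched:
        return "discovery"
    if not page.parseable:
        return "machine readability"
    if not page.reused:
        return "extractability"
    if not page.clicked:
        return "reason to click through"
    return "no fix indicated"


pricing = PageEvidence(accessible=True, fetched=False, parseable=True,
                       reused=False, clicked=False)
print(next_fix(pricing))  # discovery
```

Returning a single area keeps the output an action list: one page, one next step.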

The point is not to optimize every page for every AI system.

The point is to know which important pages are being accessed, ignored, reused, or blocked.

Sources worth using

These are useful starting points for building your own measurement model:

Where SeeLLM fits

You can do a basic version of this with logs, spreadsheets, Search Console, Bing Webmaster Tools, and manual checks.

That is often enough to prove the gap exists.

SeeLLM is built to make the page-level workflow easier: choose the pages that matter, see which AI systems fetch them, connect bot visits to AI referrals, and find pages that are accessible but not being reused.

Start with the free AI Visibility Score to check whether an important page is technically readable. For the reuse gap, read "What Is Crawled But Not Cited?" For the page-level operating workflow, read "How to Monitor Important Pages for AI Reuse." For an empirical look at why splitting by platform matters, including a ~3× swing in Claude vs ChatGPT preference between two real domains, see what 30 days of AI bot traffic on two real domains actually looks like.

