Crawled, Cited, or Ignored? A Practical Framework for Measuring AI Visibility
A practical framework for measuring whether AI systems can access, crawl, parse, retrieve, cite, refer, or ignore the pages that matter.
AI visibility is not one metric. It is a chain of events: access, crawl, parse, retrieve, cite, refer, and act.
Most teams skip straight to tactics.
Add schema. Rewrite the intro. Publish FAQs. Create comparison pages. Add an llms.txt file. Update metadata. Build more content.
Some of those actions can help. But none of them answer the first measurement question:
What are AI systems actually doing on your pages?
If you cannot separate "the page is accessible" from "an AI system fetched it" from "the page was cited or used," you cannot tell whether a content change worked. You can only guess.
This framework is for measuring the state of a page before deciding what to fix.
AI visibility has layers
Treat AI visibility as a sequence, not a score.
Read the layers in order. Each step can pass while the next one fails, which is why one metric cannot explain the whole system.
| Layer | What it means | What to check |
|---|---|---|
| Accessible | The page can be fetched by bots, search crawlers, and agents | Status code, robots rules, CDN/WAF rules, noindex, canonical |
| Crawled | A detectable AI or search user agent requested the URL | Server logs, CDN logs, edge logs, verified bot IP ranges |
| Parsed | The useful content is available in a form the system can read | Raw HTML, rendered DOM, accessibility tree, visible text |
| Indexed | A search or answer system may include the page in its retrieval layer | Search Console, Bing Webmaster Tools, sitemap, internal links |
| Retrieved | The system selected the page for a query, subquery, or task | Grounding queries, cited page reports, repeated bot visits |
| Cited | The page appeared as a source or supporting link | AI answer checks, Bing AI Performance, manual citation tracking |
| Referred | A human clicked through from an AI experience | Analytics referrers, landing pages, source patterns |
| Acted on | An agent used the page to complete a workflow | Form starts, API calls, checkout events, support actions |
The mistake is treating these layers as interchangeable.
A page can be accessible but never crawled.
A page can be crawled but not cited.
A page can be cited but send no traffic.
A page can receive AI referral traffic from a system that fetched the source days earlier, or from an index rather than a live page request.
Each layer needs its own evidence.
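One way to keep the layers separate in a spreadsheet or script is to record evidence per layer instead of a single score. The sketch below is a minimal illustration, not a standard model: the `Layer` enum and the `missing_layers` helper are hypothetical names that mirror the table above.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The layers from the table above, in chain order."""
    ACCESSIBLE = 1
    CRAWLED = 2
    PARSED = 3
    INDEXED = 4
    RETRIEVED = 5
    CITED = 6
    REFERRED = 7
    ACTED_ON = 8


def missing_layers(evidence: set) -> list:
    """Return every layer that has no recorded evidence, in chain order."""
    return [layer for layer in Layer if layer not in evidence]


# A page with AI referrals but no recorded parse, index, or citation evidence.
page_evidence = {Layer.ACCESSIBLE, Layer.CRAWLED, Layer.REFERRED}
for layer in missing_layers(page_evidence):
    print(f"no evidence yet: {layer.name}")
```

Evidence is stored as a set rather than a "furthest layer reached" value, because a later layer can have evidence while an earlier one does not.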
Google is one layer, not the whole map
Google's guidance for AI Overviews and AI Mode is clear: the same SEO foundations still matter. Google says pages need to meet the normal technical requirements for Search, be indexed, and be eligible to show with a snippet. It also says there are no special AI markup requirements for those Google Search AI features.
That is useful guidance.
It also has a boundary: it is guidance for Google Search.
The broader AI web includes systems with different retrieval paths:
- Google AI Overviews and AI Mode, which are rooted in Google Search systems.
- Bing and Copilot experiences, where Microsoft now exposes AI citations, grounding queries, and page-level citation activity in Bing Webmaster Tools.
- ChatGPT search, where OpenAI distinguishes `OAI-SearchBot` for search from `GPTBot` for training and `ChatGPT-User` for user-triggered browsing.
- Claude, where Anthropic distinguishes `ClaudeBot`, `Claude-User`, and `Claude-SearchBot`.
- Perplexity, where `PerplexityBot` and `Perplexity-User` have different jobs.
- Browser agents, which may inspect screenshots, raw HTML, the DOM, and the accessibility tree.
That is why "AI visibility" cannot be reduced to one Google report, one crawler, or one optimization checklist.
The useful unit is the page
Site-wide averages hide the work.
For a SaaS, publisher, marketplace, or ecommerce site, the useful question is rarely "did AI systems visit the domain?"
The useful question is:
Which important pages did they visit?
Start with pages where AI reuse would matter:
- homepage
- pricing
- product pages
- comparison pages
- category pages
- documentation entry points
- support pages
- high-intent editorial pages
- free tools and templates
- pages that changed recently
Then assign each page a job.
A pricing page should help a buyer understand plans, limits, and commitment level.
A comparison page should help someone choose between alternatives.
A documentation page should help an agent or user complete implementation.
A category page should define the problem, criteria, and tradeoffs.
If the page job is vague, the measurement will be vague too.
The page-level questions to ask
For each important URL, ask the questions in order.
1. Can AI systems access it?
Check the basics first:
- Does the preferred URL return a clean `200`?
- Is the canonical URL correct?
- Is the page blocked by `robots.txt`?
- Is it blocked by `noindex`, `X-Robots-Tag`, or snippet controls?
- Is the page blocked by WAF, bot protection, geofencing, or login walls?
- Is the page linked from the site in a way crawlers can discover?
- Is it present in the sitemap if it should be?
Access is not success. It is the starting condition.
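Some of these checks can be scripted. The sketch below covers only the status code, `robots.txt` rules, and `noindex` signals. It assumes you have already fetched the page and its `robots.txt` yourself, and it does not detect WAF blocks, geofencing, or canonical problems. The function names are hypothetical, and the `noindex` check is a rough string match rather than a full directive parser.

```python
import re
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt rules for one user agent without a network call."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_noindex(headers: dict, html: str) -> bool:
    """Look for noindex in the X-Robots-Tag header or a robots meta tag."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    for tag in re.findall(r"<meta\b[^>]*>", html, flags=re.IGNORECASE):
        is_robots_meta = re.search(r"name\s*=\s*[\"']?robots", tag, re.IGNORECASE)
        if is_robots_meta and "noindex" in tag.lower():
            return True
    return False


def access_problems(status, headers, html, robots_txt, user_agent, url) -> list:
    """Collect the access problems this sketch can detect for one URL and bot."""
    problems = []
    if status != 200:
        problems.append(f"status {status}")
    if not robots_allows(robots_txt, user_agent, url):
        problems.append(f"blocked by robots.txt for {user_agent}")
    if has_noindex(headers, html):
        problems.append("noindex present")
    return problems
```

Run it once per user agent you care about. A page can be open to one crawler and blocked for another under the same `robots.txt`.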
2. Which AI systems request it?
Look at server-side logs, CDN logs, or edge logs.
Do not rely only on browser analytics. Many crawler and fetcher requests never execute JavaScript analytics. They arrive as HTTP requests, receive the page, and leave no normal browser session behind.
Track at least:
- user agent
- URL
- timestamp
- status code
- referrer if present
- IP or ASN where available
- whether the bot identity was verified against published IP ranges
User-agent strings are useful, but they can be spoofed. Verification matters when you are making decisions from the data.
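The fields above can be pulled from access logs with a short script. This sketch assumes the common "combined" log format, matches the user-agent tokens named earlier in this article, and checks the client IP against a range list. The ranges in `PUBLISHED_RANGES` are placeholders from the documentation address space, not real vendor ranges; replace them with the ranges each vendor publishes. All names here are hypothetical.

```python
import ipaddress
import re
from typing import NamedTuple, Optional

# User-agent tokens named in this article. Extend the tuple as needed.
AI_AGENT_TOKENS = (
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
)

# PLACEHOLDER ranges (documentation address space), not real vendor data.
PUBLISHED_RANGES = {"GPTBot": ["192.0.2.0/24"]}

# Assumes the combined log format: ip, identity, user, [time], "request",
# status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class BotHit(NamedTuple):
    bot: str
    path: str
    status: int
    timestamp: str
    verified: bool


def detect_bot(user_agent: str) -> Optional[str]:
    """Return the first known AI token found in the user-agent string."""
    lowered = user_agent.lower()
    for token in AI_AGENT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def ip_in_published_ranges(bot: str, ip: str) -> bool:
    """True only if the client IP falls inside a range listed for that bot."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # the log field held a hostname or malformed value
    return any(
        address in ipaddress.ip_network(cidr)
        for cidr in PUBLISHED_RANGES.get(bot, [])
    )


def parse_line(line: str) -> Optional[BotHit]:
    """Turn one log line into a BotHit, or None if it is not a known AI bot."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    bot = detect_bot(match["agent"])
    if bot is None:
        return None
    return BotHit(
        bot=bot,
        path=match["path"],
        status=int(match["status"]),
        timestamp=match["time"],
        verified=ip_in_published_ranges(bot, match["ip"]),
    )
```

Keeping `verified` as a separate field lets you report spoofed or unverifiable hits instead of silently dropping them.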
3. Can the system read the useful content?
Fetches only prove that a request happened. They do not prove the content was easy to use.
Review the page from multiple machine-readable views:
- raw HTML
- rendered DOM
- visible text
- accessibility tree
- structured data where relevant
- important text inside images, widgets, tabs, modals, or scripts
For browser agents, web.dev recommends thinking beyond text extraction. Agents may use screenshots, raw HTML, and the accessibility tree. That means semantic buttons, labels, stable layouts, and clear interactive elements matter.
For search-grounded systems, text still matters. Google explicitly recommends making important content available in textual form for its AI features in Search.
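A quick way to test the raw HTML view is to extract the text a non-rendering fetcher would see and check it for the phrases the page exists to communicate. The sketch below uses only the standard library. It covers raw HTML, not the rendered DOM or the accessibility tree, and `missing_phrases` is a hypothetical helper.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes from raw HTML, skipping script, style, and template."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html: str) -> str:
    """Return the text present in raw HTML without executing any JavaScript."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def missing_phrases(html: str, phrases: list) -> list:
    """Return the phrases that do not appear in the raw HTML text."""
    text = raw_html_text(html).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]
```

If a plan name or price only shows up after JavaScript runs, it will be reported as missing here even though a browser user can see it.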
4. Did the page get indexed or included in a retrieval surface?
Indexing is not the same as crawling.
A crawler can fetch a page without the page becoming useful in an answer system.
Use platform-specific tools where they exist:
- Google Search Console for Google indexing and Search performance.
- Bing Webmaster Tools for Bing crawl, index, and AI Performance data.
- URL inspection tools to confirm what the search system saw.
- Sitemaps and internal links to confirm discoverability.
For Bing and Copilot-style AI experiences, Bing's AI Performance dashboard is especially useful because it reports citations, cited pages, grounding queries, and visibility trends.
For other AI products, the evidence may be less complete. That is why log-level measurement and manual answer checks still matter.
5. Is the page cited, summarized, or referred to?
Crawling is demand-side evidence. Citation and referral are reuse evidence.
Look for:
- the URL appearing as a cited source
- the brand or page being summarized in an answer
- AI referral traffic to the page
- repeated visits after a page update
- related pages receiving AI referrals while this page is skipped
- query-to-page patterns in tools that expose them
Do not assume silence means failure. Some AI systems may use indexed information without creating a fresh fetch near the user session. Some answers influence buyers without sending a click. But if an important page is fetched repeatedly and never cited, referred to, or mentioned, that is a useful signal.
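AI referral traffic can be counted per landing page from exported analytics rows. In this sketch the hostname list is an illustrative assumption, not a complete or authoritative mapping; replace it with the referrers you actually observe. The input shape, pairs of landing path and referrer URL, is also an assumption about your export.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlsplit

# ASSUMPTION: example referrer hostnames. Replace with what your analytics shows.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "copilot.microsoft.com": "Copilot",
    "claude.ai": "Claude",
}


def ai_source(referrer: str) -> Optional[str]:
    """Map a referrer URL to an AI product label, or None if it is not listed."""
    host = (urlsplit(referrer).hostname or "").lower()
    for known_host, label in AI_REFERRER_HOSTS.items():
        if host == known_host or host.endswith("." + known_host):
            return label
    return None


def referrals_by_page(sessions) -> Counter:
    """Count AI referrals per (landing path, source) pair."""
    counts = Counter()
    for landing_path, referrer in sessions:
        source = ai_source(referrer)
        if source is not None:
            counts[(landing_path, source)] += 1
    return counts
```

Because many AI experiences send no referrer at all, treat these counts as a lower bound.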
Common patterns
Once you measure pages instead of domains, recurring patterns appear.
A manual measurement workflow
You can start without specialized tooling.
- Pick 10 to 50 important URLs.
- Confirm each page is accessible, canonical, indexable, and internally linked.
- Fetch each page as raw HTML and confirm the main content is present.
- Review server or CDN logs for known AI user agents.
- Verify high-value bot traffic with published IP ranges where possible.
- Group requests by page, bot, and week.
- Compare AI referrals by landing page in analytics.
- Check Google Search Console and Bing Webmaster Tools for page-level visibility.
- Manually test a small set of buyer questions in AI search products.
- Record each page state: not fetched, fetched, crawled but not cited, cited, referred, or changed.
The output should be an action list, not a dashboard screenshot.
Examples:
- Pricing is accessible but has no detected AI bot visits.
- Docs are fetched weekly by multiple systems, but product pages are ignored.
- The comparison page is fetched by search bots, but has no citation or referral evidence.
- The category page gained AI referrals after the latest rewrite.
- The setup guide is cited, but the related pricing page is skipped.
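The "group requests by page, bot, and week" step can be sketched in a few lines. The code below assumes each hit is already reduced to a `(path, bot, timestamp)` tuple with a `datetime` timestamp, and it buckets by ISO week. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def weekly_counts(hits) -> Counter:
    """Count bot requests per (path, bot, ISO week) bucket."""
    counts = Counter()
    for path, bot, timestamp in hits:
        iso = timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
        week = f"{iso[0]}-W{iso[1]:02d}"
        counts[(path, bot, week)] += 1
    return counts


def pages_without_hits(important_pages, counts) -> list:
    """Return important pages that never appear in the bot counts."""
    seen = {path for path, _bot, _week in counts}
    return [page for page in important_pages if page not in seen]


hits = [
    ("/docs", "ClaudeBot", datetime(2026, 5, 18, 9, 0)),
    ("/docs", "ClaudeBot", datetime(2026, 5, 20, 14, 30)),
    ("/docs", "ClaudeBot", datetime(2026, 5, 25, 8, 0)),
]
counts = weekly_counts(hits)
for page in pages_without_hits(["/docs", "/pricing"], counts):
    print(f"{page} is on the important list but has no detected AI bot visits")
```

The final loop prints one action-list line per unvisited page, which is closer to the intended output than a raw count table.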
What to fix after measuring
Only fix the page state you can see.
If a page is not accessible, fix technical access.
If a page is accessible but not fetched, fix discovery: sitemap, internal links, canonicalization, navigation, and crawl permissions.
If a page is fetched but hard to parse, fix machine readability: visible text, semantic HTML, headings, labels, and stable layouts.
If a page is fetched but not reused, fix extractability: clearer definitions, answer-first sections, evidence, comparison criteria, and specific tradeoffs.
If a page is cited but not clicked, review whether the cited answer satisfies the user without a visit, and whether the page has a clear reason to continue.
If an agent needs to act on the page, review the interface: buttons, forms, labels, error states, account requirements, and whether the next step is obvious.
The point is not to optimize every page for every AI system.
The point is to know which important pages are being accessed, ignored, reused, or blocked.
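The rules in this section can be written as a small decision function so every page gets exactly one next action. This is a sketch under simplifying assumptions: each observation is reduced to a boolean, "reused" means cited or referred, and the agent-action case is left out. `PageEvidence` and `next_fix` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageEvidence:
    """What was actually observed for one URL (simplified to booleans)."""
    accessible: bool
    fetched: bool
    parseable: bool
    cited: bool
    referred: bool


def next_fix(evidence: PageEvidence) -> str:
    """Return the first fix category whose condition matches, in chain order."""
    if not evidence.accessible:
        return "technical access"
    if not evidence.fetched:
        return "discovery"
    if not evidence.parseable:
        return "machine readability"
    if not evidence.cited and not evidence.referred:
        return "extractability"
    if evidence.cited and not evidence.referred:
        return "reason to click through"
    return "no fix indicated"


pricing = PageEvidence(accessible=True, fetched=False, parseable=True,
                       cited=False, referred=False)
print(next_fix(pricing))
```

The checks run in chain order on purpose: there is little value in rewriting a page for extractability while it is still undiscovered.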
Sources worth using
These are useful starting points for building your own measurement model:
- Google's guide to optimizing for generative AI features on Search
- Google's AI features and your website guide
- Bing Webmaster Tools AI Performance dashboard
- OpenAI crawler documentation
- Anthropic crawler documentation
- Perplexity crawler documentation
- web.dev guide to building agent-friendly websites
- Cloudflare's guide to detecting AI crawlers
Where SeeLLM fits
You can do a basic version of this with logs, spreadsheets, Search Console, Bing Webmaster Tools, and manual checks.
That is often enough to prove the gap exists.
SeeLLM is built to make the page-level workflow easier: choose the pages that matter, see which AI systems fetch them, connect bot visits to AI referrals, and find pages that are accessible but not being reused.
Start with the free AI Visibility Score to check whether an important page is technically readable. For the reuse gap, read "What Is Crawled But Not Cited?" For the page-level operating workflow, read "How to Monitor Important Pages for AI Reuse." For an empirical look at why splitting by platform matters, including a ~3× swing in Claude vs ChatGPT preference between two real domains, see "What 30 Days of AI Bot Traffic on Two Real Domains Actually Looks Like."