What AI Actually Sees When It Crawls Your Site: A Live Walkthrough
Your site looks great in a browser. But AI crawlers see only raw HTML -- no JavaScript, no rendered components, no dynamic content. This is a live walkthrough of exactly what GPTBot, ClaudeBot, and PerplexityBot fetch when they visit, with verified data on every claim and a 6-method test you can run today.
Your website looks great in a browser. You have crisp typography, smooth animations, beautifully laid-out product cards, and a comparison table that updates in real time. Now imagine the same page, but stripped of every line of JavaScript that ever runs after page load. No animations. No client-side rendering. No dynamic content. Just the raw HTML your server returns. That stripped-down version is what AI crawlers see -- and for many sites, it's a blank page.
Zero: JavaScript executions detected across 500 million+ GPTBot fetches -- AI crawlers see only raw HTML (Passionfruit)
This guide is a live walkthrough of what AI platforms actually see when they crawl your site. It covers the bots themselves (with verified user-agent strings), the data on what each crawler fetches, the rendering reality (no JavaScript execution), and the practical methods you can use to test what AI sees on your own pages today. Every statistic comes from a published study we verified directly. For the broader optimization framework that builds on this, see Ranqo's complete GEO guide.
Meet the Crawlers
AI platforms don't use a single crawler. Each major provider runs multiple bots with different purposes. OpenAI alone operates three: GPTBot (training data), OAI-SearchBot (search index), and ChatGPT-User (real-time user-triggered fetches). Anthropic mirrors this with ClaudeBot, Claude-User, and Claude-SearchBot, as documented by ALM Corp. If you want to control AI access, you need to know all of them.
AI Crawler User Agent Reference
The 10 most important AI crawlers visiting websites in 2026
| Bot | Company | Purpose | Renders JS | Honors robots.txt |
|---|---|---|---|---|
| GPTBot | OpenAI | Training data collection | No | Yes |
| ChatGPT-User | OpenAI | User-triggered browsing in ChatGPT | No | Yes |
| OAI-SearchBot | OpenAI | ChatGPT Search index | No | Yes |
| ClaudeBot | Anthropic | Training data collection | No | Yes |
| Claude-User | Anthropic | User-triggered browsing in Claude | No | Yes |
| Claude-SearchBot | Anthropic | Claude search infrastructure | No | Yes |
| PerplexityBot | Perplexity | Perplexity search index | No | Inconsistent |
| Perplexity-User | Perplexity | User-triggered fetches in Perplexity | No | Inconsistent |
| Google-Extended | Google | Gemini training data | Yes | Yes |
| CCBot | Common Crawl | Web archive (used by many LLMs in training) | No | Yes |
Three observations from the table. First, only Google-Extended (Gemini's training crawler) renders JavaScript -- because it inherits Google's indexing infrastructure. Every other major AI crawler reads raw HTML only. Second, PerplexityBot has inconsistent robots.txt compliance -- Cloudflare documented this in detail (more on that below). Third, the ChatGPT-User bot represents real-time browsing initiated by ChatGPT users asking questions about your site, which is why its volume is so much higher than batch training crawlers.
How Much AI Crawlers Actually Visit
AI crawler volume in 2026 is significant and growing. websearchapi.ai's March 2026 monthly report shows the relative share of all crawler traffic across analyzed sites.
Monthly Crawler Traffic Share (March 2026)
Share of all bot traffic across analyzed sites (websearchapi.ai monthly report)
Googlebot still leads at 31.6%, but the shift is dramatic when you combine AI-related crawlers -- GPTBot (12.0%), ClaudeBot (11.7%), Meta-ExternalAgent (16.7%), and PerplexityBot (3.15%) together represent 43.5% of crawler traffic. AI bots aren't a niche anymore; they are nearly half of everything visiting your site.
The volume per crawler matters too. Search Engine Journal's analysis of 24 million proxy requests across 69 customer websites found that ChatGPT-User made 3.6x more requests than Googlebot, with a 99.99% success rate. The opportunity (or risk) of AI visibility is now larger than the Google indexing surface most sites still optimize for.
What ChatGPT Actually Fetches
Aggregate volume tells you how much. Composition tells you what. Vercel's analysis of nextjs.org and customer sites broke down what each AI crawler actually requests when it visits a page. ChatGPT's pattern is almost entirely HTML-focused.
What ChatGPT Fetches
Composition of ChatGPT crawler requests by file type (Vercel research)
57.70% of ChatGPT's requests are HTML pages. 11.50% are JavaScript files -- but here's the catch: Vercel confirmed that ChatGPT does not execute these JS files even when it fetches them. They get downloaded but not run. The remaining ~31% covers images, CSS, JSON, and miscellaneous assets.
What does this mean in practice? ChatGPT prioritizes reading the literal text content of your HTML pages. It does not see your React-rendered components. It does not interpret your interactive widgets. If your pricing data is loaded via fetch() after page load, ChatGPT misses it entirely.
What Claude Actually Fetches
Claude's pattern is dramatically different from ChatGPT's -- which has implications for how you optimize for each platform.
What Claude Fetches
Composition of ClaudeBot requests by file type (Vercel research)
ClaudeBot focuses heavily on images -- 35.17% of its total fetches per Vercel's data. Its JavaScript fetch rate is also higher at 23.84%, though still without execution. The remaining ~41% includes HTML, CSS, and other assets.
The image priority is striking. Anthropic appears to be actively building visual understanding into its retrieval -- which means alt text, image filenames, surrounding captions, and image schema markup all matter more for Claude visibility than for ChatGPT visibility. For more on platform-specific optimization tactics, see Ranqo's platform-specific playbook.
ChatGPT reads. Claude looks. Optimizing for one without considering the other leaves citation surface on the table.
The JavaScript Blind Spot
This is the single most important technical fact in AI visibility: Passionfruit's analysis of 500 million+ GPTBot fetches found zero evidence of JavaScript execution. Vercel's independent research confirmed the same finding: "none of the major AI crawlers currently render JavaScript."
The implication: every line of content that requires JavaScript to appear on your page is invisible to AI. A typical React or Vue single-page app, served without server-side rendering, is a blank shell to GPTBot, ClaudeBot, and PerplexityBot. The page renders perfectly for human visitors. For AI, it's an empty <div id="root"></div>.
The exception is Google-Extended, which inherits Googlebot's rendering infrastructure and can process JavaScript. This means a React SPA may rank well on Google and appear in Gemini results while being completely invisible to ChatGPT and Claude -- the platforms that, per the April 2026 First Page Sage market share report, account for roughly 65% of AI chatbot usage.
A React SPA without SSR ranks on Google and shows up in Gemini. But it is a blank page to ChatGPT, Claude, and Perplexity. Three of the five major platforms see nothing.
Live Walkthrough: Same Page, Three Views
Imagine a page titled "Best Project Management Tools for 2026" with a 10-product comparison list, a feature-by-feature comparison table, an FAQ section, pricing details, and JSON-LD structured data. Here's what humans see vs what AI sees on the same page, depending on rendering strategy.
What humans see vs what AI sees on a JavaScript SPA vs a server-rendered page
| Page Element | Humans See | AI Sees (SPA, no SSR) | AI Sees (SSR) |
|---|---|---|---|
| H1 page title | Visible -- 'Best Project Management Tools for 2026' | Empty -- HTML shell has no <h1> until JS runs | Visible in raw HTML |
| Product list (10 items) | 10 product cards with prices, ratings, features | Empty <div id="root"></div> | Full list of 10 products in semantic HTML |
| Comparison table | Feature-by-feature table with check marks and prices | No table at all -- never rendered | Complete <table> with all <tr> and <td> rows |
| FAQ section | 10 expandable Q&A items | Loading skeleton or nothing | Full questions and answers in HTML |
| Pricing details | $29/mo with feature breakdown | Placeholder text or zero pricing data | Pricing visible in <div> elements with semantic markup |
| Schema markup (JSON-LD) | Invisible (lives in <head>, not rendered) | Visible only if hard-coded in the HTML shell; invisible if injected by JS | Fully visible -- 3.2x AI Overview boost when present |
The middle column tells the story. A SPA without SSR returns a near-empty HTML shell to AI crawlers. The H1, the product list, the comparison table, the FAQ, the pricing -- none of it is in the initial response. It appears only after JavaScript runs in a browser.
The right column shows the same content with server-side rendering or static generation. Now AI crawlers see exactly what humans see: full HTML with all content present immediately. The performance characteristics for users are often identical. The only difference is whether AI can read it.
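The same check can be scripted. Below is a minimal Python sketch with two hypothetical HTML snippets -- an SPA shell and a server-rendered page -- and a helper that reports which phrases a no-JS crawler would actually find. In practice you would substitute the raw HTML your own server returns (the View Source or curl output).

```python
# Hypothetical raw HTML for the same page, served two ways.
# A no-JS crawler sees exactly these strings -- nothing more.

spa_shell = """
<!doctype html>
<html><head><title>Best Project Management Tools for 2026</title></head>
<body><div id="root"></div><script src="/bundle.js"></script></body>
</html>
"""

ssr_page = """
<!doctype html>
<html><head><title>Best Project Management Tools for 2026</title></head>
<body>
  <h1>Best Project Management Tools for 2026</h1>
  <table><tr><td>Tool A</td><td>$29/mo</td></tr></table>
</body>
</html>
"""

def visible_to_ai(raw_html: str, phrases: list[str]) -> dict[str, bool]:
    """Check which phrases appear in the raw HTML a no-JS crawler receives."""
    return {p: p in raw_html for p in phrases}

must_see = ["<h1>", "$29/mo"]
print(visible_to_ai(spa_shell, must_see))  # the SPA shell contains neither
print(visible_to_ai(ssr_page, must_see))   # SSR puts both in the HTML
```

A plain substring check is crude, but it mirrors the core reality: if the string is not in the server's response body, no amount of client-side rendering puts it in front of GPTBot or ClaudeBot.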
For the comprehensive list of mistakes that hide content from AI -- including JavaScript-only rendering, paywalls, blocked crawlers, and missing schema -- see Ranqo's anti-GEO playbook.
The Stealth Crawling Problem
One AI platform breaks the rules. Cloudflare published research in August 2025 documenting that Perplexity uses stealth crawlers to evade robots.txt directives -- modifying its user-agent to impersonate Google Chrome on macOS, rotating its source ASNs, and crawling sites that have explicitly blocked PerplexityBot.
What this means in practice: blocking PerplexityBot in your robots.txt may not actually prevent Perplexity from accessing your site. The declared crawler honors the directive; an undeclared user-agent doesn't. From an analytics perspective, you may see "direct" or "unknown" bot traffic that is, in fact, Perplexity.
For sites that want Perplexity to cite them, this is arguably good news -- you don't need to do anything special to be visible. For sites that explicitly want to opt out, you may need to enforce blocks at the edge (Cloudflare, WAF rules) rather than via robots.txt alone.
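For teams that do want a server-side block, a user-agent rule is the usual starting point. This nginx fragment is a sketch, not a complete config -- and, as the Cloudflare research implies, it only catches bots that announce themselves; a stealth crawler impersonating Chrome sails past it, which is why WAF or bot-management rules are the stronger layer.

```nginx
# Sketch: deny declared AI crawlers at the edge (nginx).
# Only catches bots that identify themselves in the User-Agent header.
map $http_user_agent $ai_crawler {
    default          0;
    ~*GPTBot         1;
    ~*ClaudeBot      1;
    ~*PerplexityBot  1;
}

server {
    listen 80;
    if ($ai_crawler) {
        return 403;  # declared AI crawlers get Forbidden
    }
}
```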
How to See What AI Sees on Your Site
You don't need a specialized tool to test what AI crawlers see on your pages. The six methods below range from two-minute checks to full server-log analysis.
How to See What AI Sees: 6 Methods
Practical techniques to verify your site is visible to AI crawlers
| Method | What It Shows | Interpretation | Difficulty |
|---|---|---|---|
| View Source (Ctrl + U; Cmd + Option + U on Mac) | Raw HTML returned by your server | If your content isn't here, AI cannot see it | Easy |
| Disable JavaScript in browser | Approximation of what GPTBot/ClaudeBot sees | If page is blank, you have a critical SPA problem | Easy |
| curl with custom user-agent | Exactly what GPTBot or ClaudeBot receives | Run: curl -A 'GPTBot' yoursite.com | Medium |
| Server access logs | Real visits from AI crawlers (200/404 status) | Check for GPTBot, ClaudeBot, PerplexityBot user agents | Medium |
| Schema validator (Google Rich Results) | Whether structured data is parseable | Confirm JSON-LD schema renders correctly | Easy |
| AI prompt: "Summarize [your URL]" | Practical end-result of crawler visibility | If response is generic or wrong, AI is missing context | Easy |
The fastest test: open your page in a browser, press Ctrl+U (Cmd+Option+U on Mac), and look at the raw HTML. If the actual content of the page (headings, paragraphs, product info, prices) isn't in that source, AI crawlers can't see it either. The View Source pane is essentially what GPTBot, ClaudeBot, and PerplexityBot receive.
The most rigorous test: run curl -A "GPTBot" https://yoursite.com/page from a terminal. This sends a request with the GPTBot user-agent string and shows you exactly what the crawler receives. Compare the output to what the page actually displays in a browser. If they differ, you have a rendering gap.
The most useful long-term test: check your server access logs for AI crawler user-agents. Look for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, and Google-Extended in the user-agent column. The HTTP status codes (200/404/etc.) tell you whether crawlers are succeeding. For the technical audit framework that maps to all of this, see Ranqo's AI readiness audit guide.
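A short script can turn that log check into a repeatable report. The sketch below parses combined-format access log lines and counts hits per AI crawler and HTTP status. The sample lines are invented for illustration; in practice you would read your real access log instead.

```python
# Sketch: count AI crawler hits (by bot and status code) in an access log.
# The sample_log lines below are made-up examples of combined log format.
import re
from collections import Counter

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "PerplexityBot", "Google-Extended",
]

sample_log = [
    '1.2.3.4 - - [01/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '5.6.7.8 - - [01/Mar/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"',
    '9.9.9.9 - - [01/Mar/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0"',
]

def count_ai_hits(lines):
    """Return Counter of (crawler, status) for lines naming a known AI bot."""
    hits = Counter()
    status_re = re.compile(r'" (\d{3}) ')  # status code follows the request quote
    for line in lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                m = status_re.search(line)
                hits[(bot, m.group(1) if m else "?")] += 1
    return hits

print(count_ai_hits(sample_log))
```

Non-200 statuses for a crawler you want visiting (404s on key pages, 403s from an overzealous firewall) are exactly the problems this surfaces.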
What to Do About It
Five concrete actions, ordered by impact:
1. Enable server-side rendering (SSR) or static generation
If you use React, Vue, or Angular, ensure the framework is configured to render content on the server (Next.js, Nuxt, Remix, SvelteKit) or generate static pages at build time. Verify with View Source -- your content should be in the raw HTML.
2. Audit your robots.txt for accidental blocks
Check explicitly for User-agent: GPTBot, ClaudeBot, ChatGPT-User, OAI-SearchBot, PerplexityBot, and Google-Extended. Make sure you're not blocking the bots you want crawling you. The default for most CMS platforms is fine, but custom robots.txt files often contain legacy blocks.
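The audit itself can be automated with Python's standard library. This sketch parses a hypothetical robots.txt containing a legacy GPTBot block and reports which AI crawlers can reach a given URL; in practice you would fetch your own /robots.txt instead of using the inline example.

```python
# Sketch: check which AI crawlers a robots.txt allows, using the stdlib.
# The robots.txt content is a hypothetical example with a legacy block.
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
""".splitlines()

BOTS = ["GPTBot", "ClaudeBot", "ChatGPT-User", "OAI-SearchBot",
        "PerplexityBot", "Google-Extended"]

rp = RobotFileParser()
rp.parse(robots_txt)

for bot in BOTS:
    allowed = rp.can_fetch(bot, "https://example.com/pricing")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

In this example every bot except GPTBot falls through to the `User-agent: *` group and is allowed on /pricing -- the kind of asymmetry a manual skim of a long robots.txt easily misses.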
3. Add structured data (JSON-LD)
Per Frase's data, FAQ schema makes pages 3.2x more likely to appear in AI Overviews -- yet only 12.4% of websites use any structured data. JSON-LD goes in your raw HTML, so AI crawlers see it even without JavaScript execution.
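For reference, here is an illustrative FAQPage JSON-LD block following the schema.org vocabulary. The question and answer are placeholders -- use your page's real FAQ content -- but the key point holds: because it ships inside the raw HTML, a no-JS crawler can read it.

```html
<!-- Illustrative FAQPage structured data (schema.org vocabulary).
     Lives in the static HTML, so AI crawlers see it without running JS. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does GPTBot execute JavaScript?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "No. GPTBot reads only the raw HTML your server returns."
    }
  }]
}
</script>
```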
4. Make critical content visible without JavaScript
Pricing, product specs, comparison tables, and FAQ content should all be present in the initial HTML response, not loaded asynchronously via fetch() after page load. This is the single biggest visibility lever for SaaS and e-commerce sites.
5. Optimize content for what each crawler prioritizes
ChatGPT prioritizes HTML text. Claude prioritizes images. Each platform has distinct preferences. For full platform-specific optimization, see Ranqo's 5 factors that drive AI citations and the 7-step content optimization playbook.
The gap between what humans see and what AI sees is the gap between your traffic and your AI citations. Close that gap first; everything else is downstream.
Audit your site for what AI actually sees
Run a 6-dimension AI readiness audit on any page: crawlability, content quality, page speed, AI extractability, citation potential, and authority. The audit identifies JavaScript rendering issues, schema gaps, and crawler access problems automatically. For the deeper context, see why Google rankings don't transfer to AI and the complete llms.txt guide.
Written by
Nisha Kumari
Nisha Kumari is Co-Founder at Ranqo, where she leads growth strategy and client acquisition. With a background in digital marketing and financial management, she specializes in SEO, Generative Engine Optimization, and helping brands build visibility across AI platforms.