What AI Actually Sees When It Crawls Your Site: A Live Walkthrough
Your site looks great in a browser. But AI crawlers see only raw HTML -- no JavaScript, no rendered components, no dynamic content. This is a live walkthrough of exactly what GPTBot, ClaudeBot, and PerplexityBot fetch when they visit, with verified data on every claim and a 6-method test you can run today.
Your website looks great in a browser. You have crisp typography, smooth animations, beautifully laid-out product cards, and a comparison table that updates in real time. Now imagine the same page, but stripped of every line of JavaScript that ever runs after page load. No animations. No client-side rendering. No dynamic content. Just the raw HTML your server returns. That stripped-down version is what AI crawlers see -- and for many sites, it's a blank page.
Zero: JavaScript executions detected across 500 million+ GPTBot fetches -- AI crawlers see only raw HTML (Passionfruit)
This guide is a live walkthrough of what AI platforms actually see when they crawl your site. It covers the bots themselves (with verified user-agent strings), the data on what each crawler fetches, the rendering reality (no JavaScript execution), and the practical methods you can use to test what AI sees on your own pages today. Every statistic comes from a published study we verified directly. For the broader optimization framework that builds on this, see Ranqo's complete GEO guide.
Meet the Crawlers
AI platforms don't use a single crawler. Each major provider runs multiple bots with different purposes. OpenAI alone operates three: GPTBot (training data), OAI-SearchBot (search index), and ChatGPT-User (real-time user-triggered fetches). Anthropic mirrors this with ClaudeBot, Claude-User, and Claude-SearchBot, as documented by ALM Corp. If you want to control AI access, you need to know all of them.
AI Crawler User Agent Reference
The 10 most important AI crawlers visiting websites in 2026
| Bot | Company | Purpose | Renders JS | Honors robots.txt |
|---|---|---|---|---|
| GPTBot | OpenAI | Training data collection | No | Yes |
| ChatGPT-User | OpenAI | User-triggered browsing in ChatGPT | No | Yes |
| OAI-SearchBot | OpenAI | ChatGPT Search index | No | Yes |
| ClaudeBot | Anthropic | Training data collection | No | Yes |
| Claude-User | Anthropic | User-triggered browsing in Claude | No | Yes |
| Claude-SearchBot | Anthropic | Claude search infrastructure | No | Yes |
| PerplexityBot | Perplexity | Perplexity search index | No | Inconsistent |
| Perplexity-User | Perplexity | User-triggered fetches in Perplexity | No | Inconsistent |
| Google-Extended | Google | Gemini training data | Yes | Yes |
| CCBot | Common Crawl | Web archive (used by many LLMs in training) | No | Yes |
Three observations from the table. First, only Google-Extended (Gemini's training crawler) renders JavaScript -- because it inherits Google's indexing infrastructure. Every other major AI crawler reads raw HTML only. Second, PerplexityBot has inconsistent robots.txt compliance -- Cloudflare documented this in detail (more on that below). Third, the ChatGPT-User bot represents real-time browsing initiated by ChatGPT users asking questions about your site, which is why its volume is so much higher than batch training crawlers.
How Much AI Crawlers Actually Visit
AI crawler volume in 2026 is significant and growing. websearchapi.ai's March 2026 monthly report shows the relative share of all crawler traffic across analyzed sites.
Monthly Crawler Traffic Share (March 2026)
Share of all bot traffic across analyzed sites (websearchapi.ai monthly report)
Googlebot still leads at 31.6%, but the shift is dramatic when you combine AI-related crawlers -- GPTBot (12.0%), ClaudeBot (11.7%), Meta-ExternalAgent (16.7%), and PerplexityBot (3.15%) together represent 43.5% of crawler traffic. AI bots aren't a niche anymore; they are nearly half of everything visiting your site.
The volume per crawler matters too. Search Engine Journal's analysis of 24 million proxy requests across 69 customer websites found that ChatGPT-User made 3.6x more requests than Googlebot, with a 99.99% success rate. The opportunity (or risk) of AI visibility is now larger than the Google indexing surface most sites still optimize for.
What ChatGPT Actually Fetches
Aggregate volume tells you how much. Composition tells you what. Vercel's analysis of nextjs.org and customer sites broke down what each AI crawler actually requests when it visits a page. ChatGPT's pattern is almost entirely HTML-focused.
What ChatGPT Fetches
Composition of ChatGPT crawler requests by file type (Vercel research)
57.70% of ChatGPT's requests are HTML pages. 11.50% are JavaScript files -- but here's the catch: Vercel confirmed that ChatGPT does not execute these JS files even when it fetches them. They get downloaded but not run. The remaining ~31% covers images, CSS, JSON, and miscellaneous assets.
What does this mean in practice? ChatGPT prioritizes reading the literal text content of your HTML pages. It does not see your React-rendered components. It does not interpret your interactive widgets. If your pricing data is loaded via fetch() after page load, ChatGPT misses it entirely.
What Claude Actually Fetches
Claude's pattern is dramatically different from ChatGPT's -- which has implications for how you optimize for each platform.
What Claude Fetches
Composition of ClaudeBot requests by file type (Vercel research)
ClaudeBot focuses heavily on images -- 35.17% of its total fetches per Vercel's data. Its JavaScript fetch rate is also higher at 23.84%, though still without execution. The remaining ~41% includes HTML, CSS, and other assets.
The image priority is striking. Anthropic appears to be actively building visual understanding into its retrieval -- which means alt text, image filenames, surrounding captions, and image schema markup all matter more for Claude visibility than for ChatGPT visibility. For more on platform-specific optimization tactics, see Ranqo's platform-specific playbook.
ChatGPT reads. Claude looks. Optimizing for one without considering the other leaves citation surface on the table.
The JavaScript Blind Spot
This is the single most important technical fact in AI visibility: Passionfruit's analysis of 500 million+ GPTBot fetches found zero evidence of JavaScript execution. Vercel's independent research confirmed the same finding: "none of the major AI crawlers currently render JavaScript."
The implication: every line of content that requires JavaScript to appear on your page is invisible to AI. A typical React or Vue single-page app, served without server-side rendering, is a blank shell to GPTBot, ClaudeBot, and PerplexityBot. The page renders perfectly for human visitors. For AI, it's an empty <div id="root"></div>.
The exception is Google-Extended, which inherits Googlebot's rendering infrastructure and can process JavaScript. This means a React SPA may rank well on Google and appear in Gemini results while being completely invisible to ChatGPT and Claude -- the platforms that, per the April 2026 First Page Sage market share report, account for roughly 65% of AI chatbot usage.
A React SPA without SSR ranks on Google and shows up in Gemini. But it is a blank page to ChatGPT, Claude, and Perplexity. Three of the five major platforms see nothing.
Live Walkthrough: Same Page, Three Views
Imagine a page titled "Best Project Management Tools for 2026" with a 10-product comparison list, a feature-by-feature comparison table, an FAQ section, pricing details, and JSON-LD structured data. Here's what humans see vs what AI sees on the same page, depending on rendering strategy.
What humans see vs what AI sees on a JavaScript SPA vs a server-rendered page
| Page Element | Humans See | AI Sees (SPA, no SSR) | AI Sees (SSR) |
|---|---|---|---|
| H1 page title | Visible -- 'Best Project Management Tools for 2026' | Empty -- HTML shell has no <h1> until JS runs | Visible in raw HTML |
| Product list (10 items) | 10 product cards with prices, ratings, features | Empty <div id="root"></div> | Full list of 10 products in semantic HTML |
| Comparison table | Feature-by-feature table with check marks and prices | No table at all -- never rendered | Complete <table> with all <tr> and <td> rows |
| FAQ section | 10 expandable Q&A items | Loading skeleton or nothing | Full questions and answers in HTML |
| Pricing details | $29/mo with feature breakdown | Placeholder text or zero pricing data | Pricing visible in <div> elements with semantic markup |
| Schema markup (JSON-LD) | Invisible (lives in <head>, not rendered) | Visible only if hard-coded in the HTML shell; invisible if injected by JS | Fully visible -- 3.2x AI Overview boost when present |
The middle column tells the story. A SPA without SSR returns a near-empty HTML shell to AI crawlers. The H1, the product list, the comparison table, the FAQ, the pricing -- none of it is in the initial response. It appears only after JavaScript runs in a browser.
The right column shows the same content with server-side rendering or static generation. Now AI crawlers see exactly what humans see: full HTML with all content present immediately. The performance characteristics for users are often identical. The only difference is whether AI can read it.
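The same check can be scripted. Below is a minimal Python sketch with two hypothetical HTML snippets -- an SPA shell and a server-rendered page -- and a helper that reports which phrases a no-JS crawler would actually find. In practice you would substitute the raw HTML your own server returns (the View Source or curl output).

```python
# Hypothetical raw HTML for the same page, served two ways.
# A no-JS crawler sees exactly these strings -- nothing more.

spa_shell = """
<!doctype html>
<html><head><title>Best Project Management Tools for 2026</title></head>
<body><div id="root"></div><script src="/bundle.js"></script></body>
</html>
"""

ssr_page = """
<!doctype html>
<html><head><title>Best Project Management Tools for 2026</title></head>
<body>
  <h1>Best Project Management Tools for 2026</h1>
  <table><tr><td>Tool A</td><td>$29/mo</td></tr></table>
</body>
</html>
"""

def visible_to_ai(raw_html: str, phrases: list[str]) -> dict[str, bool]:
    """Check which phrases appear in the raw HTML a no-JS crawler receives."""
    return {p: p in raw_html for p in phrases}

must_see = ["<h1>", "$29/mo"]
print(visible_to_ai(spa_shell, must_see))  # the SPA shell contains neither
print(visible_to_ai(ssr_page, must_see))   # SSR puts both in the HTML
```

A plain substring check is crude, but it mirrors the core reality: if the string is not in the server's response body, no amount of client-side rendering puts it in front of GPTBot or ClaudeBot.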
For the comprehensive list of mistakes that hide content from AI -- including JavaScript-only rendering, paywalls, blocked crawlers, and missing schema -- see Ranqo's anti-GEO playbook.
The Stealth Crawling Problem
One AI platform breaks the rules. Cloudflare published research in August 2025 documenting that Perplexity uses stealth crawlers to evade robots.txt directives -- modifying its user-agent to impersonate Google Chrome on macOS, rotating its source ASNs, and crawling sites that have explicitly blocked PerplexityBot.
What this means in practice: blocking PerplexityBot in your robots.txt may not actually prevent Perplexity from accessing your site. The declared crawler honors the directive; an undeclared user-agent doesn't. From an analytics perspective, you may see "direct" or "unknown" bot traffic that is, in fact, Perplexity.
For sites that want Perplexity to cite them, this is arguably good news -- you don't need to do anything special to be visible. For sites that explicitly want to opt out, you may need to enforce blocks at the edge (Cloudflare, WAF rules) rather than via robots.txt alone.
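For teams that do want a server-side block, a user-agent rule is the usual starting point. This nginx fragment is a sketch, not a complete config -- and, as the Cloudflare research implies, it only catches bots that announce themselves; a stealth crawler impersonating Chrome sails past it, which is why WAF or bot-management rules are the stronger layer.

```nginx
# Sketch: deny declared AI crawlers at the edge (nginx).
# Only catches bots that identify themselves in the User-Agent header.
map $http_user_agent $ai_crawler {
    default          0;
    ~*GPTBot         1;
    ~*ClaudeBot      1;
    ~*PerplexityBot  1;
}

server {
    listen 80;
    if ($ai_crawler) {
        return 403;  # declared AI crawlers get Forbidden
    }
}
```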
How to See What AI Sees on Your Site
You don't need a specialized tool to test what AI crawlers see on your pages. The six methods below range from two-minute checks to full server-log analysis.
How to See What AI Sees: 6 Methods
Practical techniques to verify your site is visible to AI crawlers
| Method | What It Shows | Interpretation | Difficulty |
|---|---|---|---|
| View Source (Ctrl + U; Cmd + Option + U on Mac) | Raw HTML returned by your server | If your content isn't here, AI cannot see it | Easy |
| Disable JavaScript in browser | Approximation of what GPTBot/ClaudeBot sees | If page is blank, you have a critical SPA problem | Easy |
| curl with custom user-agent | Exactly what GPTBot or ClaudeBot receives | Run: curl -A 'GPTBot' yoursite.com | Medium |
| Server access logs | Real visits from AI crawlers (200/404 status) | Check for GPTBot, ClaudeBot, PerplexityBot user agents | Medium |
| Schema validator (Google Rich Results) | Whether structured data is parseable | Confirm JSON-LD schema renders correctly | Easy |
| AI prompt: "Summarize [your URL]" | Practical end-result of crawler visibility | If response is generic or wrong, AI is missing context | Easy |
The fastest test: open your page in a browser, press Ctrl+U (Cmd+Option+U on Mac), and look at the raw HTML. If the actual content of the page (headings, paragraphs, product info, prices) isn't in that source, AI crawlers can't see it either. The View Source pane is essentially what GPTBot, ClaudeBot, and PerplexityBot receive.
The most rigorous test: run curl -A "GPTBot" https://yoursite.com/page from a terminal. This sends a request with the GPTBot user-agent string and shows you exactly what the crawler receives. Compare the output to what the page actually displays in a browser. If they differ, you have a rendering gap.
The most useful long-term test: check your server access logs for AI crawler user-agents. Look for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, and Google-Extended in the user-agent column. The HTTP status codes (200/404/etc.) tell you whether crawlers are succeeding. For the technical audit framework that maps to all of this, see Ranqo's AI readiness audit guide.
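A short script can turn that log check into a repeatable report. The sketch below parses combined-format access log lines and counts hits per AI crawler and HTTP status. The sample lines are invented for illustration; in practice you would read your real access log instead.

```python
# Sketch: count AI crawler hits (by bot and status code) in an access log.
# The sample_log lines below are made-up examples of combined log format.
import re
from collections import Counter

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "PerplexityBot", "Google-Extended",
]

sample_log = [
    '1.2.3.4 - - [01/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '5.6.7.8 - - [01/Mar/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"',
    '9.9.9.9 - - [01/Mar/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0"',
]

def count_ai_hits(lines):
    """Return Counter of (crawler, status) for lines naming a known AI bot."""
    hits = Counter()
    status_re = re.compile(r'" (\d{3}) ')  # status code follows the request quote
    for line in lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                m = status_re.search(line)
                hits[(bot, m.group(1) if m else "?")] += 1
    return hits

print(count_ai_hits(sample_log))
```

Non-200 statuses for a crawler you want visiting (404s on key pages, 403s from an overzealous firewall) are exactly the problems this surfaces.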
What to Do About It
Five concrete actions, ordered by impact:
1. Enable server-side rendering (SSR) or static generation
If you use React, Vue, or Angular, ensure the framework is configured to render content on the server (Next.js, Nuxt, Remix, SvelteKit) or generate static pages at build time. Verify with View Source -- your content should be in the raw HTML.
2. Audit your robots.txt for accidental blocks
Check explicitly for User-agent: GPTBot, ClaudeBot, ChatGPT-User, OAI-SearchBot, PerplexityBot, and Google-Extended. Make sure you're not blocking the bots you want crawling you. The default for most CMS platforms is fine, but custom robots.txt files often contain legacy blocks.
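The audit itself can be automated with Python's standard library. This sketch parses a hypothetical robots.txt containing a legacy GPTBot block and reports which AI crawlers can reach a given URL; in practice you would fetch your own /robots.txt instead of using the inline example.

```python
# Sketch: check which AI crawlers a robots.txt allows, using the stdlib.
# The robots.txt content is a hypothetical example with a legacy block.
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
""".splitlines()

BOTS = ["GPTBot", "ClaudeBot", "ChatGPT-User", "OAI-SearchBot",
        "PerplexityBot", "Google-Extended"]

rp = RobotFileParser()
rp.parse(robots_txt)

for bot in BOTS:
    allowed = rp.can_fetch(bot, "https://example.com/pricing")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

In this example every bot except GPTBot falls through to the `User-agent: *` group and is allowed on /pricing -- the kind of asymmetry a manual skim of a long robots.txt easily misses.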
3. Add structured data (JSON-LD)
Per Frase's data, FAQ schema makes pages 3.2x more likely to appear in AI Overviews -- yet only 12.4% of websites use any structured data. JSON-LD goes in your raw HTML, so AI crawlers see it even without JavaScript execution.
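For reference, here is an illustrative FAQPage JSON-LD block following the schema.org vocabulary. The question and answer are placeholders -- use your page's real FAQ content -- but the key point holds: because it ships inside the raw HTML, a no-JS crawler can read it.

```html
<!-- Illustrative FAQPage structured data (schema.org vocabulary).
     Lives in the static HTML, so AI crawlers see it without running JS. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does GPTBot execute JavaScript?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "No. GPTBot reads only the raw HTML your server returns."
    }
  }]
}
</script>
```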
4. Make critical content visible without JavaScript
Pricing, product specs, comparison tables, and FAQ content should all be present in the initial HTML response, not loaded asynchronously via fetch() after page load. This is the single biggest visibility lever for SaaS and e-commerce sites.
5. Optimize content for what each crawler prioritizes
ChatGPT prioritizes HTML text. Claude prioritizes images. Each platform has distinct preferences. For full platform-specific optimization, see Ranqo's 5 factors that drive AI citations and the 7-step content optimization playbook.
The gap between what humans see and what AI sees is the gap between your traffic and your AI citations. Close that gap first; everything else is downstream.
Audit your site for what AI actually sees
Run a 6-dimension AI readiness audit on any page: crawlability, content quality, page speed, AI extractability, citation potential, and authority. The audit identifies JavaScript rendering issues, schema gaps, and crawler access problems automatically. For the deeper context, see why Google rankings don't transfer to AI and the complete llms.txt guide.
Written by
Nisha Kumari
Nisha Kumari is Co-Founder at Ranqo, where she leads growth strategy and client acquisition. With a background in digital marketing and financial management, she specializes in SEO, Generative Engine Optimization, and helping brands build visibility across AI platforms.