Ranqo

What AI Actually Sees When It Crawls Your Site: A Live Walkthrough

Your site looks great in a browser. But most AI crawlers see only raw HTML: no JavaScript, no rendered components, no dynamic content. This is a live walkthrough of exactly what GPTBot, ClaudeBot, and PerplexityBot fetch when they visit, with verified data on every claim and a 6-method test you can run today.

Nisha Kumari | April 26, 2026 | 16 min read

Your website looks great in a browser. You have crisp typography, smooth animations, beautifully laid-out product cards, and a comparison table that updates in real time. Now imagine the same page, but stripped of every line of JavaScript that ever runs after page load. No animations. No client-side rendering. No dynamic content. Just the raw HTML your server returns. That stripped-down version is what AI crawlers see -- and for many sites, it's a blank page.

Zero instances of JavaScript execution were detected across 500 million+ GPTBot fetches: AI crawlers see only raw HTML (Passionfruit).

This guide is a live walkthrough of what AI platforms actually see when they crawl your site. It covers the bots themselves (with verified user-agent strings), the data on what each crawler fetches, the rendering reality (no JavaScript execution), and the practical methods you can use to test what AI sees on your own pages today. Every statistic comes from a published study we verified directly. For the broader optimization framework that builds on this, see Ranqo's complete GEO guide.

Meet the Crawlers

AI platforms don't use a single crawler. Each major provider runs multiple bots with different purposes. OpenAI alone operates three: GPTBot (training data), OAI-SearchBot (search index), and ChatGPT-User (real-time user-triggered fetches). Anthropic mirrors this with ClaudeBot, Claude-User, and Claude-SearchBot, as documented by ALM Corp. If you want to control AI access, you need to know all of them.

AI Crawler User Agent Reference

The 10 most important AI crawlers visiting websites in 2026

Bot | Company | Purpose | Renders JS | Honors robots.txt
GPTBot | OpenAI | Training data collection | No | Yes
ChatGPT-User | OpenAI | User-triggered browsing in ChatGPT | No | Yes
OAI-SearchBot | OpenAI | ChatGPT Search index | No | Yes
ClaudeBot | Anthropic | Training data collection | No | Yes
Claude-User | Anthropic | User-triggered browsing in Claude | No | Yes
Claude-SearchBot | Anthropic | Claude search infrastructure | No | Yes
PerplexityBot | Perplexity | Perplexity search index | No | Inconsistent
Perplexity-User | Perplexity | User-triggered fetches in Perplexity | No | Inconsistent
Google-Extended | Google | Gemini training data | Yes (via Googlebot) | Yes
CCBot | Common Crawl | Web archive (used by many LLMs in training) | No | Yes

Three observations from the table. First, Google-Extended is the only entry where JavaScript rendering comes into play, and only indirectly: it is a robots.txt control token rather than a separate fetcher, and the content it governs is crawled by Googlebot, which renders JavaScript. Every other major AI crawler reads raw HTML only. Second, PerplexityBot has inconsistent robots.txt compliance; Cloudflare documented this in detail (more on that below). Third, ChatGPT-User represents real-time fetches triggered when ChatGPT users ask questions that lead ChatGPT to your pages, which is why its volume is so much higher than that of batch training crawlers.
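The table translates directly into a lookup you can reuse in log scripts or middleware. Below is a minimal Python sketch that maps a user-agent string to the operator and purpose listed above, assuming plain case-insensitive substring matching on each bot's product token; `AI_CRAWLERS` and `classify_user_agent` are illustrative names, not part of any published library. Google-Extended is omitted because it never appears as a user-agent on requests.

```python
from typing import Optional, Tuple

# Product tokens from the crawler table, mapped to (operator, purpose).
# Substring matching is an assumption that works here because no token
# is a substring of another.
AI_CRAWLERS = {
    "GPTBot": ("OpenAI", "Training data collection"),
    "ChatGPT-User": ("OpenAI", "User-triggered browsing in ChatGPT"),
    "OAI-SearchBot": ("OpenAI", "ChatGPT Search index"),
    "ClaudeBot": ("Anthropic", "Training data collection"),
    "Claude-User": ("Anthropic", "User-triggered browsing in Claude"),
    "Claude-SearchBot": ("Anthropic", "Claude search infrastructure"),
    "PerplexityBot": ("Perplexity", "Perplexity search index"),
    "Perplexity-User": ("Perplexity", "User-triggered fetches in Perplexity"),
    "CCBot": ("Common Crawl", "Web archive"),
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str, str]]:
    """Return (token, operator, purpose) for a known AI crawler, else None."""
    ua = user_agent.lower()
    for token, (operator, purpose) in AI_CRAWLERS.items():
        if token.lower() in ua:
            return token, operator, purpose
    return None
```

A match only tells you what the client claims to be. User-agent strings are self-declared, so treat the result as a label, not proof of identity.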

How Much AI Crawlers Actually Visit

AI crawler volume in 2026 is significant and growing. websearchapi.ai's March 2026 monthly report shows the relative share of all crawler traffic across analyzed sites.

Chart: Monthly Crawler Traffic Share (March 2026). Share of all bot traffic across analyzed sites (websearchapi.ai monthly report).

Googlebot still leads at 31.6%, but the shift is dramatic when you combine AI-related crawlers: Meta-ExternalAgent (16.7%), GPTBot (12.0%), ClaudeBot (11.7%), and PerplexityBot (3.15%) together represent about 43.5% of crawler traffic. AI bots aren't a niche anymore; they account for nearly half of all crawler traffic reaching the analyzed sites.
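The combined figure is simple addition of the four shares quoted above. This short Python check makes the arithmetic explicit and compares the total with Googlebot's share; the numbers are copied from the report as cited here, not independently sourced.

```python
# Crawler shares (% of all bot traffic) as quoted from the
# websearchapi.ai March 2026 report.
ai_shares = {
    "Meta-ExternalAgent": 16.7,
    "GPTBot": 12.0,
    "ClaudeBot": 11.7,
    "PerplexityBot": 3.15,
}
googlebot_share = 31.6

ai_total = sum(ai_shares.values())
print(f"Combined AI crawler share: {ai_total:.2f}%")               # 43.55%
print(f"AI crawlers vs Googlebot: {ai_total / googlebot_share:.2f}x")  # 1.38x
```

The exact sum is 43.55%, which the text rounds to about 43.5%; either way, the four AI crawlers together outweigh Googlebot by roughly 1.4x.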

The volume per crawler matters too. Search Engine Journal's analysis of 24 million proxy requests across 69 customer websites found that ChatGPT-User made 3.6x more requests than Googlebot, with a 99.99% success rate. On those sites, at least, the AI fetch surface is now larger than the Google indexing surface most sites still optimize for.

What ChatGPT Actually Fetches

Aggregate volume tells you how much. Composition tells you what. Vercel's analysis of nextjs.org and customer sites broke down what each AI crawler actually requests when it visits a page. ChatGPT's pattern is almost entirely HTML-focused.

Chart: What ChatGPT Fetches. Composition of ChatGPT crawler requests by file type (Vercel research).

57.70% of ChatGPT's requests are HTML pages. 11.50% are JavaScript files -- but here's the catch: Vercel confirmed that ChatGPT does not execute these JS files even when it fetches them. They get downloaded but not run. The remaining ~31% covers images, CSS, JSON, and miscellaneous assets.

What does this mean in practice? ChatGPT prioritizes reading the literal text content of your HTML pages. It does not see your React-rendered components. It does not interpret your interactive widgets. If your pricing data is loaded via fetch() after page load, ChatGPT misses it entirely.
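One way to approximate what a non-rendering crawler can read is to parse the raw HTML and keep only its text nodes, skipping `<script>` and `<style>` contents, which may be downloaded but are never run. The Python sketch below uses the standard library's `html.parser`; the class name, helper, and sample pages are hypothetical, and no AI provider documents its exact text extraction, so treat the output as an approximation.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Collect the text a non-rendering crawler could read from raw HTML.

    Text inside <script> and <style> is skipped: it is code, not content,
    and whatever a script would add to the page at runtime never exists here.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_visible_text(html: str) -> str:
    """Return the whitespace-joined text nodes of an HTML document."""
    extractor = RawTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


# Client-side rendered: the price only exists after fetch() runs in a browser.
csr_page = """<html><body><h1>Pricing</h1><div id="price"></div>
<script>fetch('/api/price').then(r => r.json())
  .then(p => { document.getElementById('price').textContent = p.label; });</script>
</body></html>"""

# Server-rendered: the same price is already in the HTML response.
ssr_page = """<html><body><h1>Pricing</h1>
<div id="price">$29/mo</div></body></html>"""

print(raw_visible_text(csr_page))  # Pricing
print(raw_visible_text(ssr_page))  # Pricing $29/mo
```

The heading survives in both versions, but the price only exists in the server-rendered one; in the client-rendered page it lives inside a script that a non-rendering crawler never executes.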

What Claude Actually Fetches

Claude's pattern is dramatically different from ChatGPT's -- which has implications for how you optimize for each platform.

Chart: What Claude Fetches. Composition of ClaudeBot requests by file type (Vercel research).

ClaudeBot focuses heavily on images -- 35.17% of its total fetches per Vercel's data. Its JavaScript fetch rate is also higher at 23.84%, though still without execution. The remaining ~41% includes HTML, CSS, and other assets.

The image priority is striking. Anthropic appears to be actively building visual understanding into its retrieval -- which means alt text, image filenames, surrounding captions, and image schema markup all matter more for Claude visibility than for ChatGPT visibility. For more on platform-specific optimization tactics, see Ranqo's platform-specific playbook.
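Because alt text lives in the raw HTML, it can be checked without rendering anything. A minimal Python sketch, assuming you only want to flag `<img>` tags whose `alt` attribute is missing or empty; it does not inspect captions, filenames, or image schema, and `ImageAltAuditor` and `images_missing_alt` are hypothetical helper names.

```python
from html.parser import HTMLParser


class ImageAltAuditor(HTMLParser):
    """List <img> tags in raw HTML whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A bare `alt` attribute parses as None, so treat it like an empty one.
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.missing_alt.append(attributes.get("src", "(no src)"))


def images_missing_alt(html: str) -> list:
    """Return the src of every image without descriptive alt text."""
    auditor = ImageAltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing_alt


page = """<img src="/img/dashboard.png" alt="Kanban board view in the project tool">
<img src="/img/pricing-chart.png">
<img src="/img/divider.svg" alt="">"""

print(images_missing_alt(page))  # ['/img/pricing-chart.png', '/img/divider.svg']
```

An empty `alt` is correct for purely decorative images, so review the flagged list rather than filling every entry automatically.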

ChatGPT reads. Claude looks. Optimizing for one without considering the other leaves citation surface on the table.

The JavaScript Blind Spot

This is the single most important technical fact in AI visibility: an analysis of 500 million+ GPTBot fetches found zero evidence of JavaScript execution. Vercel's independent research confirmed the same finding: "none of the major AI crawlers currently render JavaScript."

The implication: any content that requires JavaScript to appear on your page is invisible to these crawlers. A typical React or Vue single-page app, served without server-side rendering, is a blank shell to GPTBot, ClaudeBot, and PerplexityBot. The page renders perfectly for human visitors. For AI, it's an empty <div id="root"></div>.

The exception is Gemini. Google-Extended is a robots.txt token that governs how content already crawled by Googlebot may be used, and Googlebot renders JavaScript. This means a React SPA may rank well on Google and appear in Gemini results while being completely invisible to ChatGPT and Claude, the platforms that, per the April 2026 First Page Sage market share report, account for roughly 65% of AI chatbot usage.

A React SPA without SSR can rank on Google and show up in Gemini, yet it is a blank page to ChatGPT, Claude, and Perplexity. Three of the five major platforms see nothing.
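A quick heuristic for spotting the empty-shell problem is to find the element your framework mounts into and check whether anything sits inside it before JavaScript runs. Here is a minimal Python sketch; the mount-point ids are common defaults and an assumption, so adjust `MOUNT_IDS` to match your own HTML template.

```python
from html.parser import HTMLParser

# Common client-side mount-point ids; an assumption, so check your own template.
MOUNT_IDS = {"root", "app", "__next", "__nuxt"}


class MountPointChecker(HTMLParser):
    """Record whether each mount-point element has content in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.has_content = {}    # mount id -> True if something sits inside it
        self._open_mount = None  # mount id opened but not yet seen to have content

    def handle_starttag(self, tag, attrs):
        if self._open_mount is not None:
            # A child element is the first thing inside the mount point.
            self.has_content[self._open_mount] = True
            self._open_mount = None
        element_id = dict(attrs).get("id")
        if element_id in MOUNT_IDS:
            self.has_content[element_id] = False
            self._open_mount = element_id

    def handle_endtag(self, tag):
        # A closing tag before any content means the mount point was empty.
        self._open_mount = None

    def handle_data(self, data):
        if self._open_mount is not None and data.strip():
            self.has_content[self._open_mount] = True
            self._open_mount = None


def empty_mount_points(html: str) -> list:
    """Return the ids of mount points that are empty before JavaScript runs."""
    checker = MountPointChecker()
    checker.feed(html)
    checker.close()
    return sorted(i for i, filled in checker.has_content.items() if not filled)


spa_shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
ssr_page = ('<html><body><div id="root"><h1>Best Project Management Tools for 2026</h1>'
            '</div></body></html>')

print(empty_mount_points(spa_shell))  # ['root']
print(empty_mount_points(ssr_page))   # []
```

An empty result does not prove the page is fully crawlable; it only rules out the most obvious blank-shell case.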

Live Walkthrough: Same Page, Three Views

Imagine a page titled "Best Project Management Tools for 2026" with a 10-product comparison list, a feature-by-feature comparison table, an FAQ section, pricing details, and JSON-LD structured data. Here's what humans see vs what AI sees on the same page, depending on rendering strategy.

What humans see vs what AI sees on a JavaScript SPA vs a server-rendered page

Page Element | Humans See | AI Sees (SPA, no SSR) | AI Sees (SSR)
H1 page title | Visible: "Best Project Management Tools for 2026" | Empty: the HTML shell has no <h1> until JS runs | Visible in raw HTML
Product list (10 items) | 10 product cards with prices, ratings, features | Empty <div id="root"></div> | Full list of 10 products in semantic HTML
Comparison table | Feature-by-feature table with check marks and prices | No table at all (never rendered) | Complete <table> with all <tr> and <td> rows
FAQ section | 10 expandable Q&A items | Loading skeleton or nothing | Full questions and answers in HTML
Pricing details | $29/mo with feature breakdown | Placeholder text or no pricing data | Pricing visible in <div> elements with semantic markup
Schema markup (JSON-LD) | Not displayed (lives in <head>) | Visible only if the JSON-LD is in the server's HTML shell rather than injected by JS | Fully visible (per Frase, pages with FAQ schema are 3.2x more likely to appear in AI Overviews)

The "AI Sees (SPA, no SSR)" column tells the story. A SPA without SSR returns a near-empty HTML shell to AI crawlers. The H1, the product list, the comparison table, the FAQ, the pricing: none of it is in the initial response. It appears only after JavaScript runs in a browser.

The "AI Sees (SSR)" column shows the same content with server-side rendering or static generation. Now AI crawlers see exactly what humans see: full HTML with all content present immediately. For human visitors, the page can look and behave the same. The only difference is whether AI can read it.
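Server-side rendering boils down to one property: the server sends complete HTML, content and JSON-LD included, in its first response. Here is a framework-free Python sketch of that idea, assuming hypothetical product and FAQ data ("Tool A" is a placeholder); frameworks such as Next.js or Nuxt produce the same kind of output from components, so this illustrates the result rather than how any framework works internally.

```python
import html
import json


def render_page(title: str, products: list, faqs: list) -> str:
    """Return a complete HTML document with content and JSON-LD inlined.

    products: dicts with "name" and "price"; faqs: (question, answer) pairs.
    """
    items = "\n".join(
        f"    <li>{html.escape(p['name'])}: {html.escape(p['price'])}</li>"
        for p in products
    )
    # FAQPage structured data, serialized on the server so it ships in raw HTML.
    faq_schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in faqs
        ],
    }
    # Escape "<" so text such as "</script>" in the data cannot end the tag early.
    schema_json = json.dumps(faq_schema).replace("<", "\\u003c")
    safe_title = html.escape(title)
    return f"""<!doctype html>
<html>
<head>
  <title>{safe_title}</title>
  <script type="application/ld+json">{schema_json}</script>
</head>
<body>
  <h1>{safe_title}</h1>
  <ul>
{items}
  </ul>
</body>
</html>"""


page = render_page(
    "Best Project Management Tools for 2026",
    [{"name": "Tool A", "price": "$29/mo"}],
    [("Is there a free plan?", "Yes, for up to 5 users.")],
)
print(page)
```

Running View Source (or a crawler fetch) against a page built this way shows the heading, the price, and the FAQPage JSON-LD without any script execution.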

For the comprehensive list of mistakes that hide content from AI -- including JavaScript-only rendering, paywalls, blocked crawlers, and missing schema -- see Ranqo's anti-GEO playbook.

The Stealth Crawling Problem

One AI platform breaks the rules. Cloudflare published research in August 2025 documenting that Perplexity uses stealth crawlers to evade robots.txt directives -- modifying its user-agent to impersonate Google Chrome on macOS, rotating its source ASNs, and crawling sites that have explicitly blocked PerplexityBot.

What this means in practice: blocking PerplexityBot in your robots.txt may not actually prevent Perplexity from accessing your site. The declared crawler honors the directive; an undeclared user-agent doesn't. From an analytics perspective, you may see "direct" or "unknown" bot traffic that is, in fact, Perplexity.

For sites that want Perplexity to cite them, this is arguably good news -- you don't need to do anything special to be visible. For sites that explicitly want to opt out, you may need to enforce blocks at the edge (Cloudflare, WAF rules) rather than via robots.txt alone.
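A user-agent string is self-declared, so an edge rule that trusts it alone can be bypassed by any client that sends a browser user-agent, which is the behavior Cloudflare reported. A sturdier check compares the request's source IP with the ranges an operator publishes for its declared crawlers. Minimal Python sketch using the standard `ipaddress` module; the ranges below are placeholder documentation addresses (RFC 5737), not any operator's real ranges, and `is_verified_crawler` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks. Replace them with
# the ranges each operator publishes for its declared crawlers.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}


def is_verified_crawler(claimed_bot: str, source_ip: str) -> bool:
    """True only if source_ip falls inside a range published for claimed_bot."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        ip in ipaddress.ip_network(cidr)
        for cidr in PUBLISHED_RANGES.get(claimed_bot, [])
    )
```

This confirms that a request claiming to be a declared crawler really comes from that operator's network. It cannot identify undeclared crawling that arrives with a browser user-agent from other networks; that still requires the behavioral and network-level signals an edge provider such as Cloudflare applies.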

How to See What AI Sees on Your Site

You don't need a specialized tool to test what AI crawlers see on your pages. The six methods below range from two-minute checks to full server-log analysis.

How to See What AI Sees: 6 Methods

Practical techniques to verify your site is visible to AI crawlers

Method | What It Shows | Interpretation | Difficulty
View Source (Ctrl+U; Cmd+Option+U in Chrome or Safari on a Mac) | Raw HTML returned by your server | If your content isn't here, AI crawlers cannot see it | Easy
Disable JavaScript in browser | Approximation of what GPTBot and ClaudeBot see | If the page is blank, you have a critical SPA problem | Easy
curl with custom user-agent | What your server returns to a request identifying as GPTBot or ClaudeBot | Run: curl -L -A 'GPTBot' https://yoursite.com | Medium
Server access logs | Real visits from AI crawlers, with status codes (200, 404, etc.) | Check for GPTBot, ClaudeBot, PerplexityBot user agents | Medium
Schema validator (Google Rich Results Test) | Whether structured data is parseable | Confirm the JSON-LD parses correctly; this test renders JavaScript, so also confirm the JSON-LD appears in View Source | Easy
AI prompt: "Summarize [your URL]" | Practical end result of crawler visibility | If the response is generic or wrong, AI is missing context | Easy

The fastest test: open your page in a browser, view the page source (Ctrl+U on Windows and Linux; Cmd+Option+U in Chrome or Safari on a Mac), and look at the raw HTML. If the actual content of the page (headings, paragraphs, product info, prices) isn't in that source, AI crawlers can't see it either. The View Source pane is essentially what GPTBot, ClaudeBot, and PerplexityBot receive.

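The most rigorous test: run curl -L -A "GPTBot" https://yoursite.com/page from a terminal. This requests the page with the GPTBot token as its user-agent (the real crawler sends a longer string that contains it), follows any redirects, and prints the raw HTML your server returns. Compare the output to what the page actually displays in a browser. If they differ, you have a rendering gap. Sites that verify crawlers by IP address may answer your request differently from a genuine GPTBot visit, so treat this as a close approximation.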
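To script the same check across many URLs, here is a standard-library Python sketch of the idea: fetch the raw HTML with a chosen user-agent (urllib follows redirects by default, like `curl -L`) and list which key phrases are missing. The helper names are hypothetical; as with curl, this shows what your server returns to that user-agent string, not a guaranteed copy of what the real crawler receives.

```python
import urllib.request


def fetch_raw_html(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Return the raw HTML the server sends for url to the given User-Agent.

    Redirects are followed; no JavaScript is executed.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_phrases(raw_html: str, phrases: list) -> list:
    """Return the phrases that never appear in the raw HTML."""
    return [phrase for phrase in phrases if phrase not in raw_html]
```

For example, `missing_phrases(fetch_raw_html("https://yoursite.com/page", "GPTBot"), ["$29/mo"])` returns an empty list when the price is in the server's response; the URL and phrase are placeholders for your own page and its key content.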

The most useful long-term test: check your server access logs for AI crawler user-agents. Look for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, and PerplexityBot in the user-agent field. Don't expect to see Google-Extended there: it is a robots.txt control token, and Google fetches pages with its regular crawlers. The HTTP status codes (200, 404, and so on) tell you whether crawlers are succeeding. For the technical audit framework that maps to all of this, see Ranqo's AI readiness audit guide.
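For the log check, a short script can tally AI crawler requests by bot and status code. This Python sketch assumes the Combined Log Format (the `combined` format in Apache and nginx), where the user-agent is the last quoted field; adjust the regular expression if your server logs differently. The token list mirrors the crawler table, minus Google-Extended, which never appears in request user-agents.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

AI_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
             "Claude-User", "Claude-SearchBot", "PerplexityBot", "Perplexity-User"]


def count_ai_hits(lines):
    """Count (bot token, HTTP status) pairs for AI crawler requests."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not Combined Log Format; skip rather than guess
        ua = match.group("ua")
        for token in AI_TOKENS:
            if token in ua:
                counts[(token, match.group("status"))] += 1
                break
    return counts
```

Pass it any iterable of lines, such as an open log file, and call `.most_common()` on the result to see which bots hit which status codes most often. A cluster of 404s or 403s for one bot is usually the first sign of a blocking or routing problem.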

What to Do About It

Five concrete actions, ordered by impact:

1. Enable server-side rendering (SSR) or static generation

If you use React, Vue, or Angular, ensure the framework is configured to render content on the server (Next.js, Nuxt, Remix, SvelteKit) or generate static pages at build time. Verify with View Source -- your content should be in the raw HTML.

2. Audit your robots.txt for accidental blocks

Check explicitly for User-agent: GPTBot, ClaudeBot, ChatGPT-User, OAI-SearchBot, PerplexityBot, and Google-Extended. Make sure you're not blocking the bots you want crawling your site. The default for most CMS platforms is fine, but custom robots.txt files often contain legacy blocks.

3. Add structured data (JSON-LD)

Per Frase's data, pages with FAQ schema are 3.2x more likely to appear in AI Overviews, yet only 12.4% of websites use any structured data. When JSON-LD is included in the server-rendered HTML rather than injected by client-side JavaScript, AI crawlers see it without executing anything.

4. Make critical content visible without JavaScript

Pricing, product specs, comparison tables, and FAQ content should all be present in the initial HTML response, not loaded asynchronously via fetch() after page load. This is the single biggest visibility lever for SaaS and e-commerce sites.

5. Optimize content for what each crawler prioritizes

ChatGPT prioritizes HTML text. Claude prioritizes images. Each platform has distinct preferences. For full platform-specific optimization, see Ranqo's 5 factors that drive AI citations and the 7-step content optimization playbook.
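For action 2, Python's standard `urllib.robotparser` can replay a robots.txt file against each AI user-agent token and report which ones may fetch a given URL. A minimal sketch; the token list and example rules are assumptions, and the standard-library parser applies the first matching rule and ignores path wildcards, which differs from the longest-match behavior in RFC 9309, so confirm edge cases by hand.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
             "PerplexityBot", "Google-Extended", "CCBot"]


def audit_robots(robots_txt: str, url: str) -> dict:
    """Map each AI user-agent token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Example rules: everything allowed except a legacy block on GPTBot.
robots = """User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

for agent, allowed in audit_robots(robots, "https://example.com/pricing").items():
    print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

To audit a live site, download its `/robots.txt` first and pass the text in; any unexpected `BLOCKED` line is a candidate legacy rule.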

The gap between what humans see and what AI sees is the gap between your traffic and your AI citations. Close that gap first; everything else is downstream.

Audit your site for what AI actually sees

Run a 6-dimension AI readiness audit on any page: crawlability, content quality, page speed, AI extractability, citation potential, and authority. The audit identifies JavaScript rendering issues, schema gaps, and crawler access problems automatically. For the deeper context, see why Google rankings don't transfer to AI and the complete llms.txt guide.

Written by

Nisha Kumari

Co-Founder at Ranqo

Nisha Kumari is Co-Founder at Ranqo, where she leads growth strategy and client acquisition. With a background in digital marketing and financial management, she specializes in SEO, Generative Engine Optimization, and helping brands build visibility across AI platforms.

© 2026 Ranqo. All rights reserved.