Ranqo
Guide

The AI Traffic Funnel: Training, Indexing, Agentic, Visit — and What GA4 Misses

GA4 collapses every AI bot visit and every AI-referred human into one rolled-up percentage. That number hides four different intents on four different timescales. This deep dive breaks down the Training → Indexing → Agentic → Visit funnel per AI platform, why client-side analytics misses most of it, and the server-side architecture that doesn't.

Nisha Kumari | May 15, 2026 | 22 min read


Most teams measure “AI traffic” with the client-side analytics they already have: Google Analytics 4, Mixpanel, Plausible. The tool collapses every AI bot visit and every AI-referred human into one rolled-up percentage, usually a small one. That single number is hiding four completely different things. Cloudflare Radar makes the variety obvious. Its per-vendor crawl-to-referral ratios (pages crawled per 1 human sent back) for July 2025 are Google Search 5.4 : 1, PerplexityBot 194.8 : 1, GPTBot 1,091 : 1, and Anthropic 38,065 : 1. Same metric, same network, same period, yet nearly four orders of magnitude apart. That spread isn't a leaderboard; it's the first clue that the rolled-up “AI traffic” number on your dashboard is hiding far more than it shows.

Each vendor runs a different bot mix — bulk training, live-search indexing, agentic in-conversation fetches — through different product surfaces. No single per-vendor ratio can describe what any of these bots is actually doing on your site.

Pick any of those rolled-up numbers and the bot traffic inside it isn't one activity — it's four. Training, indexing, agentic, and human-visit traffic run in parallel per vendor, each on a different timescale and each tied to a different business value. Every analytics dashboard collapses all of that into one number. We unpack each stage with its examples and business value below.

And most analytics stacks let you see none of the four. Google Analytics 4 doesn't see the bot requests at all — bots don't execute the JavaScript tag GA4 is built around. On the human side, cookie-consent studies show 50% to over 60% of users reject cookies when “Reject all” is clearly visible — so GA4 doesn't reliably see the referred humans either. The visits happened. Your server logs prove it. The analytics stack you already pay for treats them as if they never existed.

The single number “AI traffic” is the wrong unit. AI bots make three different kinds of requests — training, indexing, agentic — that have wildly different business value. Server-side capture per AI platform is the only architecture that separates them.

That's why we built Visitor Analytics. Drop one middleware file into your Next.js, Express, Cloudflare Worker, or any backend stack — cURL works too, anything that can make an outbound HTTP call — and every single request to your site gets forwarded to Ranqo's intake endpoint server-side, before any of the failure modes above can fire. We classify each visit's User-Agent against a 35+ AI bot catalog, cross-reference the source IP against the vendor's published range to catch spoofers, and reconstruct a four-stage Training → Indexing → Agentic → Human Visit funnel per AI platform. First bot visit logged within minutes; no JS bundle, no cookie banner, no ad-blocker risk.

The rest of this post is what the data looks like in practice: what each of the four stages actually represents, why GA4 can't see them, what we surface in the dashboard, and the verified Cloudflare, Imperva, Similarweb, and Microsoft Clarity numbers behind the thesis. Every number in the dashboard screenshots below comes from one illustrative sample brand; every external statistic traces to a published source.

Crawl-to-referral ratios — pages crawled per 1 human sent

Cloudflare Radar (latest figures, July 2025), log scale. Note the apples-to-oranges caveat: Anthropic is reported as an aggregate across all of its crawlers, while GPTBot is OpenAI's training bot in isolation. The spread reflects each vendor's different bot mix and product strategy, not a leaderboard — the same vendor would show different ratios per bot if the table broke each one out individually.

Why client-side analytics misses both sides

The standard analytics tag (Google Analytics 4, Plausible loaded client-side, Mixpanel, Heap) is a JavaScript snippet that fires when a real browser parses your HTML and executes its scripts. That works fine for one specific case: a human in a JavaScript-enabled browser who accepts cookies and isn't running an ad blocker. Every other visitor type falls off.

The AI bots are the obvious case. GPTBot, ClaudeBot, PerplexityBot, GeminiBot, Google-Extended, OAI-SearchBot — none of them execute JavaScript. They issue an HTTP GET to your server, read the HTML response, and move on. From GA4's perspective, the visit literally never happened. Cloudflare's 2025 Year in Review puts AI bots at 4.2% of HTML requests across its network — but Imperva's 2025 Bad Bot Report puts total automated traffic at 51% of the open web, with bad bots alone at 37%. That's the first time in a decade automated traffic has surpassed humans.

The less-obvious case is the humans. The same consent studies that show 50% to over 60% cookie rejection also report that websites lose 40-70% of their tracking data points to consent rejection alone, before ad-blocker losses are added on top. Both losses compound on the same visitor: a human who runs an ad blocker and also rejects the consent banner is invisible to GA4 twice over. Server-side intake still sees all of them, because the request has already reached your server before either gate fires.

What each tool sees, by visitor type

100 = full visibility. Client-side JS analytics requires a rendered browser, consent, and no ad-block. Server-side intake only requires the request to reach your server. Visualisation by Ranqo; visitor-loss components anchored on Ignite (50% to 60%+ reject rate), Imperva (51% automated traffic), and Cloudflare (AI bots don't run JS).

A practical example we see in customer accounts every week: a mid-market SaaS site checks GA4, sees a Tuesday with ~700 sessions, and concludes traffic is flat. Their server access log for that same Tuesday shows ~12,000 HTML requests, of which roughly half are AI bots and search crawlers. GA4 isn't broken. It's working as designed for the subset of requests that completed the client-side handshake. Everything outside that subset — bots that don't run JS, humans who blocked the JS, humans whose consent denied the JS — exists, but exists somewhere GA4 cannot reach.

AI traffic isn't one thing — four stages, four intents

Every “AI traffic” article we read while researching this post treats AI bot visits as a monolith. Some report the share of all traffic that's “AI” (Cloudflare's 4.2%). Some report per-vendor share (GPTBot, ClaudeBot, etc.). None of them distinguish the intent behind a given bot request — and the intent is the entire game.

Inside a single vendor, the same brand name produces requests across multiple distinct bots, each tied to a different purpose:

  • Training Crawl. The bot is gathering pages to add to the next version of the model's pretraining set. The visit produces zero short-term traffic and zero short-term citation impact. Its value is deferred: being included in a future training run at all. Examples: GPTBot, ClaudeBot, Google-Extended, Applebot-Extended.
  • Indexing Fetch. The bot is building or refreshing a live search index that the AI consults at answer time. This is where the live-search grounding mode of ChatGPT, Gemini AI Mode, and Perplexity actually retrieves your page. Examples: OAI-SearchBot, GeminiBot, PerplexityBot.
  • Agentic Fetch. The bot is fetching your page during a live user conversation, on behalf of one specific user who just asked the AI something. This is the highest-intent signal in the AI stack: an answer that may cite you is being composed right now. Examples: ChatGPT-User, Perplexity-User, Claude-User.
  • Human Visit. A human clicks through from the AI's answer to your actual site. This is the only stage your standard analytics can even partially see, and although it's the most-discussed stage, it's small next to the crawl volume that feeds it.
Training is a future-value crawl. Indexing is a present-search crawl. Agentic is a now-this-second crawl. Human Visit is the conversion. Most analytics conflate all four; the customer decisions for each are completely different.

The four stages aren't strictly sequential for any single visitor — they're parallel pipelines per vendor that are related but independent. But across a 30-day window for a single brand, you can reconstruct the funnel shape, and what that shape tells you about each AI platform is dramatically more useful than the single rolled-up “AI traffic” number. Here's how we expose it in the dashboard for one sample brand:

Funnel by AI platform

Crawl → Visit: Training Crawl → Indexing → Agentic Fetch → Human Visit

Totals: 5,958 training · 7,396 indexing · 929 agentic · 2,912 visits

| Platform | Status | Training | Indexing | Agentic | Visits |
|---|---|---|---|---|---|
| Gemini | Citing + driving traffic | 1,240 | 3,247 | 124 | 387 |
| ChatGPT | Citing + driving traffic | 2,847 | 2,103 | 412 | 1,247 |
| Perplexity | Citing + driving traffic | 142 | 1,824 | 187 | 542 |
| Claude | Citing + driving traffic | 1,512 | 158 | 198 | 712 |
| Meta AI | Driving early traffic | 217 | 64 | 8 | 24 |

Connect a single crawl to the human visit it produced · Updated every 30s

Five AI platforms, four stages each. Notice the structural differences between vendors that the rolled-up number would hide. Gemini does 3,247 indexing fetches but only 124 agentic ones: Google indexes everything but only invokes live retrieval for AI Mode sessions. Perplexity does minimal training crawling (142 fetches, less than one-tenth of its 1,824 indexing fetches), the lowest training-to-indexing ratio of the group, because Perplexity is architecturally a citation engine, as we covered in our Perplexity playbook; it still logs 187 agentic fetches. Claude is overwhelmingly training-plus-agentic, with only early indexing activity (158 fetches from Anthropic's experimental search crawler) but strong downstream conversion: 198 agentic fetches and 712 referred humans. Meta AI is still in training-heavy, early-stage mode for most accounts. Each story is different. Each story matters.
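The per-platform comparisons in the paragraph above are simple ratios over the funnel table. A small sketch that recomputes them from the sample-brand counts (no other data is assumed) also confirms that the per-platform rows add up to the headline totals. The `FunnelRow` shape and function names are illustrative, not part of Ranqo's API.

```typescript
// Sample-brand counts copied from the funnel table (last 30 days).
interface FunnelRow {
  platform: string;
  training: number;
  indexing: number;
  agentic: number;
  visits: number;
}

const SAMPLE: FunnelRow[] = [
  { platform: 'Gemini', training: 1240, indexing: 3247, agentic: 124, visits: 387 },
  { platform: 'ChatGPT', training: 2847, indexing: 2103, agentic: 412, visits: 1247 },
  { platform: 'Perplexity', training: 142, indexing: 1824, agentic: 187, visits: 542 },
  { platform: 'Claude', training: 1512, indexing: 158, agentic: 198, visits: 712 },
  { platform: 'Meta AI', training: 217, indexing: 64, agentic: 8, visits: 24 },
];

// Training fetches per indexing fetch, lowest ratio first.
export function trainingToIndexing(rows: FunnelRow[]): { platform: string; ratio: number }[] {
  return rows
    .map((r) => ({ platform: r.platform, ratio: r.training / r.indexing }))
    .sort((a, b) => a.ratio - b.ratio);
}

// Column sums, i.e. the "Totals" line above the table.
export function funnelTotals(rows: FunnelRow[]) {
  return rows.reduce(
    (acc, r) => ({
      training: acc.training + r.training,
      indexing: acc.indexing + r.indexing,
      agentic: acc.agentic + r.agentic,
      visits: acc.visits + r.visits,
    }),
    { training: 0, indexing: 0, agentic: 0, visits: 0 },
  );
}
```

Sorting by the ratio puts Perplexity first (about 0.08) and Claude last (about 9.6), matching the reading of the table in the text.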

Stage 1: Training crawls

Training crawls are how your content enters an AI model's parametric memory — the body of knowledge the model has internalised and can answer from without any live retrieval. The two biggest examples are GPTBot (OpenAI) and ClaudeBot (Anthropic). Google's training crawler is Google-Extended, which piggybacks on Googlebot infrastructure but signals a separate purpose.

Training crawl volume has scaled to search-engine-grade numbers in the last 12 months — Vercel measured GPTBot at roughly 13% of Googlebot's monthly request volume by late 2024, and the combined AI crawler footprint at nearly 28% of Googlebot's monthly total. The per-vendor growth curves are covered in detail in our AI crawler control guide; for this post the relevant point is just that the volume is big and worth bucketing accurately.

AI crawler year-over-year request growth

Cloudflare network-wide data (May 2024 → May 2025). PerplexityBot scaled from a near-zero base; GPTBot rose from the #9 crawler to the #3 spot; ClaudeBot is the only major AI crawler that shrank. Capped visually at +1,000% — the actual PerplexityBot growth is +157,490%.

For Visitor Analytics, every training crawl gets bucketed into the Training intent column. The User-Agent makes the classification straightforward, since GPTBot and ClaudeBot are both well-documented and openly declare themselves, but we still cross-reference the source IP against the vendor's published range. (More on why User-Agent alone isn't enough in the verification section.) The right policy decision here is rarely “block GPTBot forever.” It's more often “let GPTBot train on the marketing surface; gate the product docs behind auth.” Whichever way you go, you can only make the decision if you can see the actual training-crawl volume, which is exactly what this column shows.
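User-Agent bucketing can be sketched as a first-match lookup of product tokens. This is a minimal illustration, assuming a hand-picked subset of the bots named in this post rather than Ranqo's 35+ bot catalog. Note that Google-Extended and Applebot-Extended are robots.txt control tokens, not User-Agents that appear on requests (Google and Apple crawl with their regular Googlebot and Applebot agents), so they can't be matched this way.

```typescript
// Minimal User-Agent -> intent classifier (illustrative subset, not Ranqo's catalog).
export type Intent = 'Training' | 'Indexing' | 'Agentic';

interface BotRule {
  token: string; // product token expected somewhere in the User-Agent header
  vendor: string;
  intent: Intent;
}

// First match wins; agentic tokens sit ahead of each vendor's other bots
// in case a UA string ever carries more than one token.
const BOT_RULES: BotRule[] = [
  { token: 'ChatGPT-User', vendor: 'OpenAI', intent: 'Agentic' },
  { token: 'OAI-SearchBot', vendor: 'OpenAI', intent: 'Indexing' },
  { token: 'GPTBot', vendor: 'OpenAI', intent: 'Training' },
  { token: 'Claude-User', vendor: 'Anthropic', intent: 'Agentic' },
  { token: 'ClaudeBot', vendor: 'Anthropic', intent: 'Training' },
  { token: 'Perplexity-User', vendor: 'Perplexity', intent: 'Agentic' },
  { token: 'PerplexityBot', vendor: 'Perplexity', intent: 'Indexing' },
];

export interface BotMatch {
  bot: string;
  vendor: string;
  intent: Intent;
}

export function classifyUserAgent(userAgent: string): BotMatch | null {
  const ua = userAgent.toLowerCase();
  for (const rule of BOT_RULES) {
    if (ua.includes(rule.token.toLowerCase())) {
      return { bot: rule.token, vendor: rule.vendor, intent: rule.intent };
    }
  }
  return null; // not in this list: a human browser or some other crawler
}
```

A User-Agent match is only a claim; the IP-range check in the verification section is what turns it into an identification.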

If you want to dig into the policy side of training-crawl allow-vs-block, we covered that in detail in our AI crawler control guide — robots.txt, llms.txt, and AI.txt handle different layers of the decision. Visitor Analytics is the measurement layer; that guide is the policy layer.

Stage 2: Indexing fetches

Indexing fetches are how AI platforms maintain a live search index that the model consults at answer time. This is the critical layer for any AI that has a “search-grounded” or “browse” mode — ChatGPT with the search toggle on, Gemini in AI Mode, Perplexity end-to-end. The retrieval crawlers here are OAI-SearchBot, GeminiBot, and PerplexityBot. They scrape your pages, build a vector or keyword index, and surface chunks of your content as cited material when a user asks a question your page answers.

Why this stage matters more than training for most B2B marketing teams: a page that's in the live index has a chance of being cited today. A page that's only in the training corpus might surface a year from now, after the next model retraining run, with no guaranteed attribution. The distinction maps cleanly onto OpenAI's architecture, which we broke down in our ChatGPT citation playbook: parametric ChatGPT answers from memory (training-fed); search-grounded ChatGPT answers from the live index (indexing-fed). Different content strategies optimise for different stages.

The Visitor Analytics dashboard separates these. For the sample brand above, ChatGPT shows 2,847 training fetches and 2,103 indexing fetches in the same 30-day window: two different mechanisms and two different optimisations that would collapse into one rolled-up “ChatGPT bot” number if you grouped requests by vendor instead of attributing each bot to its intent.

Stage 3: Agentic fetches — the highest-intent signal

Agentic fetches are the most interesting category in the whole taxonomy, and the one most analytics tools don't even acknowledge as distinct. When a user asks ChatGPT a question and the model decides it needs to fetch a live page to answer, the bot that fires is ChatGPT-User — not GPTBot, not OAI-SearchBot. The User-Agent literally encodes the distinction: this request is happening on behalf of one specific human, right now, in a live session, because that human's question can't be answered from training or index alone.

Translated to business terms: an agentic fetch is roughly one request away from a potential citation. The AI is composing the answer that's about to surface your URL. Whether the human then clicks through or just reads the answer in-flow, the agentic fetch itself is the leading indicator. ChatGPT-User, Perplexity-User, and Claude-User are the bots to watch.

A training crawl is “you might be cited next year.” An indexing fetch is “you might be cited next week.” An agentic fetch is “the answer that may cite you is being typed right now.”

Agentic fetches are a small fraction of total AI bot volume — in the sample brand, just 929 agentic fetches against 5,958 training and 7,396 indexing — but they're the leading indicator the others aren't. A spike in ChatGPT-User on the same day a competitor's launch hits the news is a stronger signal than a thousand new GPTBot crawls. We expose them as their own column for exactly that reason.
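One simple way to turn “a spike in ChatGPT-User” into an alert is to compare each day's agentic count with a trailing baseline. The sketch below uses a trailing mean plus k standard deviations; the 7-day window and k = 3 are illustrative assumptions, not a Ranqo setting, and other detectors (median-based, seasonal) would work as well.

```typescript
// Flags days whose agentic-fetch count jumps well above a trailing baseline.
// window = days of history per comparison; k = how many standard deviations count as a spike.
export function flagSpikes(daily: number[], window = 7, k = 3): number[] {
  const spikes: number[] = [];
  for (let i = window; i < daily.length; i++) {
    const past = daily.slice(i - window, i);
    const mean = past.reduce((sum, x) => sum + x, 0) / window;
    const variance = past.reduce((sum, x) => sum + (x - mean) ** 2, 0) / window;
    // Floor the spread at 1 fetch so a perfectly flat baseline doesn't flag tiny bumps.
    const spread = Math.max(Math.sqrt(variance), 1);
    if (daily[i] > mean + k * spread) spikes.push(i);
  }
  return spikes; // indices into `daily`
}
```

For a ChatGPT-User series of [10, 12, 11, 9, 10, 11, 10, 40, 11], only day 7 (the 40) is flagged; the spike itself then widens the baseline for the following days.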

Stage 4: Human visits — closing the loop

AI referral visits (humans who clicked through from ChatGPT, Claude, Perplexity, Gemini, or another AI surface to land on your site) are a small slice of the AI funnel next to crawl volume, and the easiest to measure. They are also growing fast. Similarweb measured 1.13 billion AI referral visits in June 2025 alone, up 357% year-over-year, with ChatGPT accounting for more than 80% of the share.

The reason AI referrals matter disproportionately to their volume is conversion quality. Digiday reported AI referral traffic converting to sign-ups at 1.66%, versus 0.15% for organic search, 0.13% for direct, and 0.46% for paid social. That's roughly 11× the organic-search conversion rate. A visitor who arrives via “ChatGPT said you should look at this” has already passed through a consultation step that organic search visitors haven't.

Sign-up conversion rate by traffic source

Digiday (2025): share of incoming visits that complete a sign-up. AI referrals are still a small share of overall traffic, but the visitors who do click through convert at ~11× the rate of organic search.

AI referrals are also a partial substitute for the organic search collapse smaller publishers are now reporting. Chartbeat data shows small publishers (1K-10K daily page views) losing 60% of their search referral traffic over a two-year window. Medium publishers lost 47%. Large publishers lost 22%. The arithmetic asymmetry between organic-traffic loss and AI-referral gain is brutal at the moment — most sites lose 5× more organic clicks than they gain AI clicks — but the AI clicks they do gain convert ~11× better, so the revenue picture isn't as bad as the visit picture. Either way, you need to measure both halves to know which trade you're making. As we covered in our AI Visibility vs SEO pillar, the era of measuring visibility through one channel is over.
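The “lose 5× the clicks, convert 11× better” trade can be made concrete with the Digiday rates quoted above. This sketch assumes the lost organic visitors would have converted at the average organic rate and treats sign-ups as a stand-in for revenue; both are simplifications, and the function names are illustrative.

```typescript
// Sign-up conversion rates quoted above (Digiday, 2025).
const ORGANIC_SIGNUP_RATE = 0.0015; // 0.15%
const AI_REFERRAL_SIGNUP_RATE = 0.0166; // 1.66%

// Net sign-ups when each AI referral visit gained comes with
// `organicLostPerAiGained` organic visits lost (default: the rough 5:1 figure above).
export function netSignups(aiVisitsGained: number, organicLostPerAiGained = 5): number {
  const signupsGained = aiVisitsGained * AI_REFERRAL_SIGNUP_RATE;
  const signupsLost = aiVisitsGained * organicLostPerAiGained * ORGANIC_SIGNUP_RATE;
  return signupsGained - signupsLost;
}

// Organic visits you could lose per AI visit gained before net sign-ups turn negative.
export const breakEvenRatio = AI_REFERRAL_SIGNUP_RATE / ORGANIC_SIGNUP_RATE;
```

Under these assumptions, gaining 1,000 AI visits while losing 5,000 organic ones nets about +9 sign-ups (16.6 gained, 7.5 lost), and the trade only turns negative past roughly 11 organic visits lost per AI visit gained.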

Visitor Analytics attributes every AI referral visit to the referring AI platform. A Referer from chatgpt.com or chat.openai.com maps to ChatGPT; claude.ai maps to Claude; perplexity.ai to Perplexity; and so on across the catalog. The visit row in the dashboard carries both the channel (Human · AI Referral) and the specific platform, so a ChatGPT-cited surge looks different from a Perplexity-cited surge, even though in GA4 both can end up under “Direct” or “(other)”, or vanish entirely when the tag never fires. Browsers don't strip the Referer on a cross-origin HTTPS-to-HTTPS navigation by default: under the default strict-origin-when-cross-origin referrer policy they trim it to the bare origin (for example https://chatgpt.com/), which is still enough to identify the platform.
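Because the Referer usually arrives as a bare origin, platform attribution reduces to a hostname lookup. A minimal sketch, assuming an illustrative hostname table (extend it for the rest of your catalog); matching walks up the domain so www.perplexity.ai resolves to perplexity.ai without letting look-alikes such as evilchatgpt.com through.

```typescript
// Illustrative hostname -> platform table, not Ranqo's full catalog.
const REFERRER_PLATFORMS = new Map<string, string>([
  ['chatgpt.com', 'ChatGPT'],
  ['chat.openai.com', 'ChatGPT'],
  ['claude.ai', 'Claude'],
  ['perplexity.ai', 'Perplexity'],
  ['gemini.google.com', 'Gemini'],
  ['copilot.microsoft.com', 'Copilot'],
]);

export function platformFromReferer(referer: string): string | null {
  if (!referer) return null;
  let host: string;
  try {
    host = new URL(referer).hostname.toLowerCase();
  } catch {
    return null; // malformed header
  }
  // Try the full host, then each parent domain (never a bare TLD).
  const labels = host.split('.');
  for (let i = 0; i < labels.length - 1; i++) {
    const platform = REFERRER_PLATFORMS.get(labels.slice(i).join('.'));
    if (platform) return platform;
  }
  return null; // not an AI platform: fall through to the other channels
}
```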

Verifying bot identity: User-Agent alone isn't enough

Every bot we've named so far has a well-documented User-Agent string. That makes them easy to identify — and easy to impersonate. Cloudflare's own engineering team is on record stating that “user agent headers alone are easily spoofed and are therefore insufficient for reliable identification,” and that IP-range logic is “brittle” because crawler IPs change over time and may be shared across products. They're proposing cryptographic verification (HTTP Message Signatures, RFC 9421) as the long-term fix, but that requires adoption by every vendor and isn't standard practice in 2026.

In practice, the strong signal today is the combination of User-Agent + source IP cross-referenced against the vendor's published IP range. OpenAI publishes the range used by GPTBot, ChatGPT-User, and OAI-SearchBot. Anthropic publishes the range used by ClaudeBot. Google has long published its crawler ranges. A request that claims to be GPTBot from an IP outside OpenAI's published range is, by definition, not GPTBot — it might be a security scanner with a copy-pasted User-Agent, a competitive analytics tool, or a bad actor probing your robots.txt for blocked paths.
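The User-Agent + source-IP check boils down to a CIDR membership test. This is an IPv4-only sketch; the ranges below are RFC 5737 documentation blocks standing in for a vendor's published list, which in practice you would load from the vendor and refresh regularly. IPv6 handling is left out for brevity.

```typescript
// Parse dotted-quad IPv4 into an unsigned integer (0 .. 2^32 - 1), or null if invalid.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.trim().split('.');
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// True if `ip` falls inside `cidr` (e.g. "192.0.2.0/24").
// Arithmetic instead of bitwise ops avoids JavaScript's signed 32-bit pitfalls.
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split('/');
  const ipN = ipv4ToInt(ip);
  const baseN = ipv4ToInt(base);
  const bits = Number(bitsStr);
  if (ipN === null || baseN === null || !Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits); // addresses per block
  return Math.floor(ipN / blockSize) === Math.floor(baseN / blockSize);
}

// Placeholder ranges (documentation blocks), NOT the vendors' real lists.
const PUBLISHED_RANGES = new Map<string, string[]>([
  ['GPTBot', ['192.0.2.0/24']],
  ['ClaudeBot', ['198.51.100.0/25']],
]);

// A request claiming to be `bot` is verified only if its IP sits in a published range.
export function isVerified(bot: string, ip: string): boolean {
  const ranges = PUBLISHED_RANGES.get(bot) ?? [];
  return ranges.some((cidr) => inCidr(ip, cidr));
}
```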

That's the column the Visitor Analytics dashboard surfaces as “Verified” — the share of each bot's visits whose source IP matched the vendor's published range. In the sample brand the numbers look like:

Agents detected · IP-range verified

Per-bot breakdown with IP-range verification · last 30 days · 35+ AI bots tracked

| Agent | Vendor | Category | Visits | Verified | Last seen |
|---|---|---|---|---|---|
| GeminiBot | Google | Indexing | 3,247 | 99.1% | 4m ago |
| GPTBot | OpenAI | Training | 2,847 | 98.4% | 12m ago |
| PerplexityBot | Perplexity | Indexing | 1,824 | 91.7% | 23m ago |
| ClaudeBot | Anthropic | Training | 1,512 | 94.2% | 41m ago |
| ChatGPT-User | OpenAI | Agentic | 412 | 100.0% | 2m ago |

User-Agent matched · source IP cross-referenced · Anti-spoofing on

GPTBot at 98.4% verified is healthy. ChatGPT-User at 100% is expected: agentic fetches are routed through OpenAI's published range. PerplexityBot at 91.7% is the marginally interesting number: a small share of “PerplexityBot” traffic actually came from IPs Perplexity doesn't publish, which is consistent with Cloudflare's August 2025 finding that Perplexity at times runs stealth crawlers from undeclared IPs (a separate story we covered in the crawler control guide). Whatever the cause, you want this number above 90% for the big four AI vendors; a persistently low verification rate signals that something is impersonating the bot, not that something is broken in your install.
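The “Verified” column is a per-bot share, and the 90% rule of thumb above becomes a simple flag. A sketch, assuming each visit has already been classified and IP-checked; the `BotVisit` shape and the default threshold are illustrative.

```typescript
// One classified bot request: the claimed bot and whether its IP matched a published range.
interface BotVisit {
  bot: string;
  verified: boolean;
}

interface VerificationRow {
  bot: string;
  share: number; // verified visits / total visits
  belowThreshold: boolean; // true when the share is under the alert threshold
}

export function verificationReport(visits: BotVisit[], threshold = 0.9): VerificationRow[] {
  const tallies = new Map<string, { total: number; verified: number }>();
  for (const v of visits) {
    const t = tallies.get(v.bot) ?? { total: 0, verified: 0 };
    t.total += 1;
    if (v.verified) t.verified += 1;
    tallies.set(v.bot, t);
  }
  return Array.from(tallies, ([bot, t]) => {
    const share = t.verified / t.total;
    return { bot, share, belowThreshold: share < threshold };
  });
}
```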

How we built Visitor Analytics

The architecture is deliberately boring on the customer side. You drop one middleware file into your stack, set an environment variable, and you're done. The middleware fires a fire-and-forget GET to Ranqo's intake endpoint with the request URL, User-Agent, Referer, and client IP — and then immediately calls next(). Zero added latency on your response path. The Ranqo side does the rate-limiting, classification, IP-range verification, and roll-up.

The Next.js install looks like this (we ship similar snippets for Express, Hono, Fastify, Koa, Cloudflare Workers, Django, and raw cURL):

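middleware.ts
```typescript
import { NextResponse } from 'next/server';
import type { NextFetchEvent, NextRequest } from 'next/server';

export function middleware(req: NextRequest, event: NextFetchEvent) {
  // Count full document GETs only: skip HEAD/OPTIONS, <Link> prefetches,
  // and RSC payload requests (client-side App Router navigations also send RSC: 1).
  if (req.method !== 'GET') return NextResponse.next();
  if (req.headers.get('next-router-prefetch')) return NextResponse.next();
  if (req.headers.get('rsc') === '1') return NextResponse.next();
  const purpose = req.headers.get('sec-purpose');
  if (purpose && purpose.startsWith('prefetch')) return NextResponse.next();

  // Without a key there is nothing to send; never let analytics break the page.
  const websiteKey = process.env.RANQO_SITE_KEY;
  if (!websiteKey) return NextResponse.next();

  // x-forwarded-for may be a chain ("client, proxy1, proxy2"); keep the first hop.
  // On Cloudflare, cf-connecting-ip holds the client IP directly.
  const forwardedFor = req.headers.get('x-forwarded-for') ?? '';
  const ip =
    req.headers.get('cf-connecting-ip') ?? forwardedFor.split(',')[0]?.trim() ?? '';

  const params = new URLSearchParams({
    url:        req.url,
    userAgent:  req.headers.get('user-agent') ?? '',
    ref:        req.headers.get('referer')    ?? '',
    ip,
    websiteKey,
  });

  // Fire-and-forget: the fetch is never awaited, so the response isn't delayed.
  // waitUntil keeps the runtime alive until the promise settles.
  event.waitUntil(
    fetch(`https://app.ranqo.ai/api/v1/intake/pageview?${params}`).catch(() => {}),
  );

  return NextResponse.next();
}

// Keep static assets out of the middleware entirely.
export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};
```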

Three things in the snippet are worth calling out, because they are where other server-side analytics integrations most often ship broken:

  • Prefetch / RSC filtering. In a Next.js App Router site, <Link> components prefetch their targets (by default when they enter the viewport in production builds), and each prefetch request has the same URL as a real page view but isn't one. If you forget the next-router-prefetch and RSC: 1 filters, your visit counts inflate 5-10× and become useless. We learned this in our own dashboard before we shipped Site Tracking externally: the first week's data was almost entirely fake.
  • Fire-and-forget. The fetch(...).catch(() => {}) pattern means no matter how slow Ranqo's intake endpoint is (target p95 < 50ms), or even if it's down entirely, the customer's request path isn't affected. We always return 200 from the intake endpoint regardless of whether the payload was valid — invalid keys and rate-limited requests get {"ok": false} but never an error status. There is no retry pressure on the customer side because there is no failure surface.
  • Real client IP. Server-side middleware has access to x-forwarded-for (or cf-connecting-ip on Cloudflare), which carries the originating client IP as recorded by your CDN or proxy; x-forwarded-for can hold a comma-separated chain, and the first entry is the client. That IP is the one we cross-reference against published vendor ranges for verification. A client-side JS tag could never see it without a server round-trip, and by then the bot has already left and the data isn't recoverable.
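The intake contract described in the list above (always HTTP 200, with {"ok": false} for bad keys or rate-limited calls) can be sketched as a pure function. Everything here is a hypothetical stand-in for illustration: the in-memory key set, the naive per-key cap with no time window, and the array used as a queue are not Ranqo's implementation.

```typescript
type IntakeResult = { status: 200; body: { ok: boolean; reason?: string } };

interface QueuedEvent {
  url: string;
  userAgent: string;
  ref: string;
  ip: string;
  websiteKey: string;
}

// Hypothetical stand-ins for a key store, a rate limiter, and a job queue.
const VALID_KEYS = new Set(['demo-site-key']);
const MAX_EVENTS_PER_KEY = 3; // naive per-key cap, no time window, so the behaviour is easy to see
const eventCounts = new Map<string, number>();
export const queue: QueuedEvent[] = [];

// Always answers 200: callers never see an error status, so they never retry.
export function handleIntake(query: URLSearchParams): IntakeResult {
  const websiteKey = query.get('websiteKey') ?? '';
  if (!VALID_KEYS.has(websiteKey)) {
    return { status: 200, body: { ok: false, reason: 'invalid_key' } };
  }
  const count = (eventCounts.get(websiteKey) ?? 0) + 1;
  eventCounts.set(websiteKey, count);
  if (count > MAX_EVENTS_PER_KEY) {
    return { status: 200, body: { ok: false, reason: 'rate_limited' } };
  }
  // Classification and IP verification happen later, off the request path.
  queue.push({
    url: query.get('url') ?? '',
    userAgent: query.get('userAgent') ?? '',
    ref: query.get('ref') ?? '',
    ip: query.get('ip') ?? '',
    websiteKey,
  });
  return { status: 200, body: { ok: true } };
}
```

Keeping the handler this thin is what makes the fire-and-forget pattern safe: the expensive work sits behind the queue, and the status code carries no failure signal for the caller to react to.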

Behind the intake endpoint, the data path is HTTP GET → Postgres row within ~30 seconds (Inngest worker pulls off the queue, classifies the User-Agent against the 35+ AI bot catalog, looks up the IP range, writes the row). The dashboard polls the Postgres rollup and surfaces the KPIs, agents table, funnel, and live feed you've seen throughout this post.

What this looks like in the dashboard

The dashboard is one page under /b/[brand]/ai-traffic. Top of the page is the KPI strip you've now seen pieces of — Human Visits · Agent Visits · Verified Bots · Pages Crawled over the selected window (default last 30 days):

Site Tracking

Server-side capture of humans and AI bots: every visit, regardless of JS

Last 30 days

| Metric | Value | Detail |
|---|---|---|
| Human Visits | 12,847 | across 5 channels |
| Agent Visits | 14,283 | 35+ AI bots tracked |
| Verified Bots | 67% | by published IP range |
| Pages Crawled | 89 | unique paths hit |

The KPIs roll up across all six dashboard cards. Underneath: the channel-level breakdown of where humans came from (Direct / Search / Social / AI Platforms / Email / Other), the AI-platform-specific bar list (Claude leads → ChatGPT → Perplexity → Gemini → Grok → Copilot → DeepSeek → Meta AI), the Agents Detected table you saw earlier, the Funnel by AI Platform table, the Top Crawled Paths bar (which URLs the bots actually hit), and the Live Feed.

The Live Feed is the surface most customers leave open as a monitor — every visit streams in within a few minutes, tagged by intent, country, and verification status:

Live Feed

Auto-refreshes every 30s

Most recent visits across your site: bots, humans, country, intent

| Visitor | Path | Country | Type | Seen |
|---|---|---|---|---|
| Chrome | /blog/ai-visibility-new-seo-2026 | US | Human · AI Referral | now |
| ClaudeBot | /blog/how-to-get-cited-by-chatgpt | | AI bot | 8s |
| ChatGPT-User | /pricing | US | Agentic | 19s |
| Chrome | /signup | DE | Human · AI Referral | 32s |
| PerplexityBot | /sitemap.xml | | AI bot | 47s |

Server-side · No client JS · No cookies · 35+ AI bots tracked

What we deliberately don't ship in Visitor Analytics: a closed-loop attribution model that ties an AI referral visit to a downstream conversion (a paid sign-up, a contract close). That kind of attribution requires identity stitching across your authenticated session, your CRM, and your billing system, and we don't have access to those. As we discussed in our broader piece on what AI crawlers see, the right place to wire closed-loop attribution is your own analytics stack — Mixpanel, Segment, or a data warehouse — which can join Ranqo's AI-referral channel with your conversion events. The honest framing: Ranqo measures the top of the funnel (the AI side of the visit); you join it to your bottom-of-funnel (the conversion side) in whichever system already owns your conversion data.

The honest summary

We didn't build Visitor Analytics because GA4 is broken. GA4 is the right tool for measuring a consented, JS-enabled human session. That subset is just getting smaller (automated traffic at 51% of all web traffic, AI bots at 4.2% of HTML requests and growing, 50%+ of humans rejecting cookies), and the four-stage funnel that lives outside it is where the AI-citation decisions actually get made. Attributing every bot request to one of four intents (Training, Indexing, Agentic, Visit) per AI platform, verified against the vendor's published IP range, gives you the decision unit that a flat “AI bots = X% of traffic” number doesn't.

4 stages

Training → Indexing → Agentic → Human Visit, attributed per AI platform. Most analytics show only the last one (and even that unreliably). The four-stage funnel makes every bot request visible against the citation cycle it actually belongs to.

Closed-loop conversion attribution still lives in your own analytics stack. Cryptographic bot verification still depends on vendor adoption of HTTP Message Signatures, which isn't there yet. Stealth crawlers will keep evading detection. The Visitor Analytics product is honest about all three. What it does well is the per-bot-per-stage attribution that's structurally unavailable anywhere else. If you want to see your AI traffic the way it actually moves — not the way a rolled-up number describes it — install the middleware and come look at your funnel.

See your AI traffic, before your next deploy

Drop the middleware into your Next.js, Express, or Cloudflare Worker setup. First bot visit logged within minutes — classified by intent, verified by IP range, and reconciled into the four-stage funnel above. No JS bundle, no cookie banner, no ad-blocker risk. If you want to see what AI platforms actually do with your pages once they crawl them, that's covered in the companion guide.


Written by

Nisha Kumari

Co-Founder at Ranqo

Nisha Kumari is Co-Founder at Ranqo, where she leads growth strategy and client acquisition. With a background in digital marketing and financial management, she specializes in SEO, Generative Engine Optimization, and helping brands build visibility across AI platforms.



Related articles

Guide

AI.txt vs Llms.txt vs Robots.txt: The Complete AI Crawler Control Guide for 2026

Most articles about AI crawlers ask the wrong question -- whether to block them. The strategic question is which crawlers should train your model, which should retrieve from you for citation, and where licensing replaces both. This guide covers the three control files (robots.txt, llms.txt, AI.txt), the AI crawler taxonomy, the crawl-to-referral economics that should drive your decisions (ClaudeBot 20,583:1 vs PerplexityBot 194.8:1), the Perplexity stealth-crawling case study, and industry-specific decision frameworks. Every claim is verified against published sources.

May 4, 2026 | 21 min read
Guide

What AI Actually Sees When It Crawls Your Site: A Live Walkthrough

Your site looks great in a browser. But AI crawlers see only raw HTML -- no JavaScript, no rendered components, no dynamic content. This is a live walkthrough of exactly what GPTBot, ClaudeBot, and PerplexityBot fetch when they visit, with verified data on every claim and a 6-method test you can run today.

Apr 26, 2026 | 12 min read
Guide

llms.txt: The Complete Guide to the New Standard for AI Crawlers

llms.txt is a proposed web standard that lets you publish a curated map of your site for large language models. 10.13% of domains have already adopted it -- but does it actually move AI citations? This guide covers the spec, the data, the major adopters, and an honest answer on whether to implement.

Apr 25, 2026 | 10 min read
Ranqo

Ranqo is the AI visibility platform that helps brands track, analyze, and improve their presence across ChatGPT, Claude, Perplexity, Gemini & Grok.

Product

  • Search Visibility
  • Prompt Intelligence
  • Competitor Benchmarking
  • Source Analytics
  • Page Optimization
  • Content Lab
  • Action Center
  • Visitor Analytics

Company

  • About
  • Pricing
  • Book a demo
  • Contact

Legal

  • Privacy
  • Terms
  • Cookies

Resources

  • Blog
  • Research
  • Compare
  • All Free Tools
  • AI Visibility Checker
  • AI Readiness Score
  • AI Content Grader
  • AI Crawler Inspector
  • LLMs.txt Generator
  • Robots.txt Generator

© 2026 Ranqo. All rights reserved.