Guide

The AI Traffic Funnel: Training, Indexing, Agentic, Visit — and What GA4 Misses

"AI traffic" is a single number on most dashboards. It hides four very different things — training crawls, live-search indexing, agentic in-conversation fetches, and the human visits at the bottom of the funnel — each with completely different business value. Cloudflare's per-vendor crawl-to-referral ratios span three orders of magnitude (Google 5.4:1, PerplexityBot 194.8:1, GPTBot 1,091:1, Anthropic 38,065:1), not because some vendors are wasteful, but because each runs a different bot mix. This is the architecture we just shipped to measure all four stages per AI platform — server-side intake, IP-range bot verification, crawl-to-visit funnel attribution — and what the data looks like in practice, anchored on verified Cloudflare, Imperva, Similarweb, and Microsoft Clarity numbers.

Nisha Kumari | May 15, 2026 | 21 min read

Most teams measure “AI traffic” with the client-side analytics they already have — Google Analytics 4, Mixpanel, Plausible. The tool collapses every AI bot visit and every AI-referred human into a single rolled-up percentage of total traffic, usually a small one. That single number is hiding four completely different things. Cloudflare Radar publishes per-vendor crawl-to-referral ratios (pages crawled per 1 human sent back) that hint at the variety sitting underneath the rollup. In the July 2025 table: Google Search 5.4 : 1, PerplexityBot 194.8 : 1, GPTBot 1,091 : 1, Anthropic (aggregate across all its crawlers) 38,065 : 1, down from 286,930 : 1 in January — a 7.5× move in six months. Same metric, same period, same network. Three orders of magnitude apart. That spread isn't a leaderboard; it's the first clue that the rolled-up “AI traffic” number on your own dashboard is hiding far more than it shows.

A few things are happening underneath those ratios. Each vendor splits its bot fleet across different activities — bulk training-data collection, live-search indexing, agentic in-conversation fetches — at completely different ratios, and runs them through completely different product surfaces. Cloudflare's table also isn't comparing apples to apples: it reports Anthropic as one aggregate across all of its crawlers, while GPTBot is just OpenAI's training bot specifically — OpenAI's other bots (OAI-SearchBot, ChatGPT-User) would likely pull an “OpenAI aggregate” substantially lower if it were reported. PerplexityBot in the table is the company's indexing crawler in isolation, which is different from Perplexity-User (agentic). Different vendors, different bot mixes, different product strategies. The takeaway isn't which vendor is “efficient” or “wasteful” — it's that no single per-vendor ratio can describe what any of these bots is actually doing on your site.

Pick any of those rolled-up numbers and the bot traffic inside it isn't one activity — it's four. Some requests are training-data collection (ClaudeBot, GPTBot, Google-Extended, Applebot-Extended) — value that shows up months from now in the next model release. Some are live-search indexing (OAI-SearchBot, GeminiBot, PerplexityBot, Claude-SearchBot) — value that shows up in tomorrow's answers. Some are agentic fetches (ChatGPT-User, Perplexity-User, Claude-User) happening during a real AI conversation about to cite you right now. And some are actual human visits referred from an AI surface after the answer was read. Four intents, four timescales, four completely different business values. These four pipelines run in parallel, not in sequence — a newer brand often gets agentic fetches and human visits with no training or indexing crawl in the same window, because ChatGPT-User and Claude-User can fetch a page live without any prior crawl. Every analytics dashboard collapses all of that into one number.

And most analytics stacks let you see none of the four. Google Analytics 4 doesn't see the bot requests at all — bots don't execute the JavaScript tag GA4 is built around. On the human side, cookie-consent studies show 50% to over 60% of users reject cookies when “Reject all” is clearly visible — so GA4 doesn't reliably see the referred humans either. The visits happened. Your server logs prove it. The analytics stack you already pay for treats them as if they never existed.

The single number “AI traffic” is the wrong unit. AI bots make three different kinds of requests — training, indexing, agentic — that have wildly different business value. Server-side capture per AI platform is the only architecture that separates them.

That's why we built Visitor Analytics. Drop one middleware file into your Next.js, Express, Cloudflare Worker, or any backend stack — cURL works too, anything that can make an outbound HTTP call — and every single request to your site gets forwarded to Ranqo's intake endpoint server-side, before any of the failure modes above can fire. We classify each visit's User-Agent against a 35+ AI bot catalog, cross-reference the source IP against the vendor's published range to catch spoofers, and reconstruct a four-stage Training → Indexing → Agentic → Human Visit funnel per AI platform. First bot visit logged within minutes; no JS bundle, no cookie banner, no ad-blocker risk.

The rest of this post is what the data looks like in practice — what each of the four stages actually represents, why GA4 can't see them, what we surface in the dashboard, and the verified Cloudflare, Imperva, Similarweb, and Microsoft Clarity numbers behind the thesis. Every number in the dashboard screenshots below comes from a single illustrative sample brand; every external statistic traces to a published source.

Crawl-to-referral ratios — pages crawled per 1 human sent

Cloudflare Radar (latest figures, July 2025), log scale. Note the apples-to-oranges caveat: Anthropic is reported as an aggregate across all of its crawlers, while GPTBot is OpenAI's training bot in isolation. The spread reflects each vendor's different bot mix and product strategy, not a leaderboard — the same vendor would show different ratios per bot if the table broke each one out individually.

AI traffic isn't one thing — four stages, four intents

Every “AI traffic” article we read while researching this post treats AI bot visits as a monolith. Some report the share of all traffic that's “AI” (Cloudflare's 4.2%). Some report per-vendor share (GPTBot, ClaudeBot, etc.). None of them distinguish the intent behind a given bot request — and the intent is the entire game.

Inside a single vendor, the same brand name produces requests across multiple distinct bots, each tied to a different purpose:

Training Crawl. The bot is gathering pages to add to the next version of the model's pretraining set. The visit produces zero short-term traffic and zero short-term citation impact. Its value is future: showing up at all in a future training run. Examples: GPTBot, ClaudeBot, Google-Extended, Applebot-Extended.
Indexing Fetch. The bot is building or refreshing a live search index that the AI consults at answer time. This is where the live-search grounding mode of ChatGPT, Gemini AI Mode, and Perplexity actually retrieves your page. Examples: OAI-SearchBot, GeminiBot, PerplexityBot.
Agentic Fetch. The bot is fetching your page during a live user conversation, on behalf of one specific user who just asked the AI something. This is the highest-intent signal in the AI stack — an answer is being composed right now that may cite you. Examples: ChatGPT-User, Perplexity-User, Claude-User.
Human Visit. A human clicks through from the AI's answer to your actual site. This is the only stage your standard analytics can even partially see — and it's the most-discussed but smallest number in the funnel.
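The split above can be sketched as a lookup from User-Agent token to platform and stage. This is a minimal illustration, not Ranqo's classifier: `BOT_CATALOG` and `classifyUserAgent` are hypothetical names, and the catalog is a small subset of the bots named in this post. Google-Extended and Applebot-Extended are left out on purpose, because they are robots.txt control tokens rather than strings that appear in a request's User-Agent header, so a UA matcher never sees them. GeminiBot is included as this post names it; check the exact token against Google's crawler documentation before relying on it. Human visits, the fourth stage, are identified by Referer rather than by User-Agent.

```typescript
// The three bot stages of the funnel (human visits are detected separately).
type Stage = 'training' | 'indexing' | 'agentic';

interface BotEntry {
  token: string;    // substring expected somewhere in the User-Agent header
  platform: string; // AI platform the bot reports into on the dashboard
  stage: Stage;
}

// Hypothetical, illustrative subset of a bot catalog (not a full 35+ list).
// No token is a substring of another, so match order doesn't matter here.
const BOT_CATALOG: BotEntry[] = [
  { token: 'GPTBot',           platform: 'ChatGPT',    stage: 'training' },
  { token: 'ClaudeBot',        platform: 'Claude',     stage: 'training' },
  { token: 'OAI-SearchBot',    platform: 'ChatGPT',    stage: 'indexing' },
  { token: 'PerplexityBot',    platform: 'Perplexity', stage: 'indexing' },
  { token: 'Claude-SearchBot', platform: 'Claude',     stage: 'indexing' },
  { token: 'GeminiBot',        platform: 'Gemini',     stage: 'indexing' }, // as named in the text; verify
  { token: 'ChatGPT-User',     platform: 'ChatGPT',    stage: 'agentic' },
  { token: 'Perplexity-User',  platform: 'Perplexity', stage: 'agentic' },
  { token: 'Claude-User',      platform: 'Claude',     stage: 'agentic' },
];

// Returns the matching catalog entry, or null for browsers and unknown bots.
function classifyUserAgent(userAgent: string): BotEntry | null {
  const ua = userAgent.toLowerCase();
  for (const entry of BOT_CATALOG) {
    if (ua.includes(entry.token.toLowerCase())) return entry;
  }
  return null;
}

const gptbotUserAgent =
  'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot';
const match = classifyUserAgent(gptbotUserAgent);
console.log(match?.platform, match?.stage); // ChatGPT training
```

A User-Agent match only says what a request claims to be; the IP-range check in the verification section is what decides whether to believe it.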

Training is a future-value crawl. Indexing is a present-search crawl. Agentic is a now-this-second crawl. Human Visit is the conversion. Most analytics conflate all four; the customer decisions for each are completely different.

The four stages aren't strictly sequential for any single visitor — they're parallel pipelines per vendor that are related but independent. But across a 30-day window for a single brand, you can reconstruct the funnel shape, and what that shape tells you about each AI platform is dramatically more useful than the single rolled-up “AI traffic” number. Here's how we expose it in the dashboard for one sample brand:

Funnel by AI platform · Crawl → Visit
Training Crawl → Indexing → Agentic Fetch → Human Visit
Totals: 5,958 training · 7,396 indexing · 929 agentic · 2,912 visits

Platform | Status | Training | Indexing | Agentic | Visits
Gemini | Citing + driving traffic | 1,240 | 3,247 | 124 | 387
ChatGPT | Citing + driving traffic | 2,847 | 2,103 | 412 | 1,247
Perplexity | Citing + driving traffic | 142 | 1,824 | 187 | 542
Claude | Citing + driving traffic | 1,512 | 158 | 198 | 712
Meta AI | Driving early traffic | 217 | | |

Connect a single crawl to the human visit it produced · Updated every 30s

Five AI platforms, four stages each. Notice the structural differences between vendors that the rolled-up number would hide: Gemini does 3,247 indexing fetches but only 124 agentic ones — Google indexes everything but only invokes live retrieval for AI Mode sessions. Perplexity does minimal training crawling (142 fetches, less than one-tenth of its 1,824 indexing fetches), the lowest training-to-indexing ratio of the group, because Perplexity is architecturally a citation engine, as we covered in our Perplexity playbook; it still logs 187 agentic fetches. Claude is overwhelmingly training-plus-agentic with only early indexing activity (158 fetches from Anthropic's experimental search crawler) but strong downstream conversion — 198 agentic fetches and 712 referred humans. Meta AI is still in training-heavy early-stage mode for most accounts. Each story is different. Each story matters.

Stage 1: Training crawls

Training crawls are how your content enters an AI model's parametric memory — the body of knowledge the model has internalised and can answer from without any live retrieval. The two biggest examples are GPTBot (OpenAI) and ClaudeBot (Anthropic). Google's equivalent is Google-Extended, which is a robots.txt control token rather than a separate crawler: the fetching is done by Googlebot's infrastructure, and the token signals whether that content may also be used to train Gemini models.

Training crawl volume has shifted dramatically in the last 12 months. Cloudflare measured GPTBot growing +305% year-over-year between May 2024 and May 2025 — its rank among all web crawlers jumped from #9 to #3 and its share rose from 2.2% to 7.7%. PerplexityBot grew by a staggering +157,490%, from a tiny base. ClaudeBot was the only major AI crawler to shrink: request volume fell 46%, and its share dropped from 11.7% to 5.4%. Vercel reported GPTBot generating 569 million requests across its network in a single month (data window ending November 2024), roughly 13% of Googlebot's 4.5 billion over the same period. The combined AI crawler footprint across Vercel's network (GPTBot, ClaudeBot, AppleBot, PerplexityBot) hit nearly 1.3 billion monthly requests, about 28% of Googlebot's monthly volume. The “GPTBot is becoming a real search-engine-scale crawler” story is now numerically literal.

AI crawler year-over-year request growth

Cloudflare network-wide data (May 2024 → May 2025). PerplexityBot scaled from a near-zero base; GPTBot rose from the #9 crawler to the #3 spot; ClaudeBot is the only major AI crawler that shrank. Capped visually at +1,000% — the actual PerplexityBot growth is +157,490%.

For Visitor Analytics, every training crawl gets bucketed into the Training intent column. The User-Agent makes the classification straightforward — GPTBot and ClaudeBot are both well-documented and openly declare themselves — but we still cross-reference the source IP against the vendor's published range. (More on why User-Agent alone isn't enough in the verification section.) The right policy decision here is rarely “block GPTBot forever.” It's more often “let GPTBot train on the marketing surface; gate the product docs behind auth.” Whichever way you go, you can only make the decision if you can see the actual training-crawl volume — which is exactly what this column shows.

If you want to dig into the policy side of training-crawl allow-vs-block, we covered that in detail in our AI crawler control guide — robots.txt, llms.txt, and AI.txt handle different layers of the decision. Visitor Analytics is the measurement layer; that guide is the policy layer.

Stage 2: Indexing fetches

Indexing fetches are how AI platforms maintain a live search index that the model consults at answer time. This is the critical layer for any AI that has a “search-grounded” or “browse” mode — ChatGPT with the search toggle on, Gemini in AI Mode, Perplexity end-to-end. The retrieval crawlers here are OAI-SearchBot, GeminiBot, and PerplexityBot. They scrape your pages, build a vector or keyword index, and surface chunks of your content as cited material when a user asks a question your page answers.

Why this stage matters more than training for most B2B marketing teams: a page that's in the live index has a chance of being cited today. A page that's only in the training corpus might surface a year from now, after the next model retraining run, with no guaranteed attribution. The distinction maps cleanly onto OpenAI's architecture, which we broke down in our ChatGPT citation playbook: parametric ChatGPT answers from memory (training-fed); search-grounded ChatGPT answers from the live index (indexing-fed). Different content strategies optimise for different stages.

The Visitor Analytics dashboard separates these — for the sample brand above, ChatGPT shows 2,847 training fetches and 2,103 indexing fetches in the same 30-day window. Two different mechanisms, two different optimisations, one rolled-up “ChatGPT bot” number if you grouped requests by vendor instead of by intent.

Stage 3: Agentic fetches — the highest-intent signal

Agentic fetches are the most interesting category in the whole taxonomy, and the one most analytics tools don't even acknowledge as distinct. When a user asks ChatGPT a question and the model decides it needs to fetch a live page to answer, the bot that fires is ChatGPT-User — not GPTBot, not OAI-SearchBot. The User-Agent literally encodes the distinction: this request is happening on behalf of one specific human, right now, in a live session, because that human's question can't be answered from training or index alone.

Translated to business terms: an agentic fetch is roughly one request away from a potential citation. The AI is composing the answer that's about to surface your URL. Whether the human then clicks through or just reads the answer in-flow, the agentic fetch itself is the leading indicator. ChatGPT-User, Perplexity-User, and Claude-User are the bots to watch.

A training crawl is “you might be cited next year.” An indexing fetch is “you might be cited next week.” An agentic fetch is “the answer that may cite you is being typed right now.”

Agentic fetches are a small fraction of total AI bot volume — in the sample brand, just 929 agentic fetches against 5,958 training and 7,396 indexing — but they're the leading indicator the others aren't. A spike in ChatGPT-User on the same day a competitor's launch hits the news is a stronger signal than a thousand new GPTBot crawls. We expose them as their own column for exactly that reason.
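One way to turn “a spike in ChatGPT-User” into a concrete check is to compare today's count with a trailing baseline. This is a minimal sketch, assuming you already have daily ChatGPT-User fetch counts; `isAgenticSpike`, the 3-standard-deviation default, and the 7-day minimum history are illustrative choices, not how the dashboard defines a spike.

```typescript
// Flags the latest day as a spike when it exceeds mean + k * stddev of the
// preceding days. `dailyCounts` is oldest-first; the last element is "today".
function isAgenticSpike(dailyCounts: number[], k = 3, minHistory = 7): boolean {
  if (dailyCounts.length < minHistory + 1) return false; // not enough baseline yet

  const history = dailyCounts.slice(0, -1);
  const today = dailyCounts[dailyCounts.length - 1];

  const mean = history.reduce((sum, n) => sum + n, 0) / history.length;
  const variance = history.reduce((sum, n) => sum + (n - mean) ** 2, 0) / history.length;
  const stddev = Math.sqrt(variance);

  // A floor of 1 stops a perfectly flat baseline from flagging every +1 as a spike.
  return today > mean + k * Math.max(stddev, 1);
}

// Hypothetical daily ChatGPT-User fetch counts: a steady baseline, then a jump.
const chatgptUserDaily = [28, 31, 30, 27, 33, 29, 32, 30, 74];
console.log(isAgenticSpike(chatgptUserDaily)); // true
```

A z-score against a trailing window is the simplest choice that adapts to each site's own baseline; weekly seasonality would call for comparing against the same weekday instead.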

Stage 4: Human visits — closing the loop

AI referral visits — humans who clicked through from ChatGPT, Claude, Perplexity, Gemini, or another AI surface to land on your site — are the smallest part of the AI funnel and the easiest to measure. They are also growing fast. Similarweb measured 1.13 billion AI referral visits in June 2025 alone — up +357% year-over-year — with ChatGPT accounting for north of 80% of the share.

The reason AI referrals matter disproportionately to their volume is conversion quality. Digiday reported AI referral traffic converting to sign-ups at 1.66% — versus 0.15% for organic search, 0.13% for direct, and 0.46% for paid social. That's roughly 11× the organic-search conversion rate. A visitor who arrives via “ChatGPT said you should look at this” has already passed through a consultation step that organic search visitors haven't.

Sign-up conversion rate by traffic source

Digiday (2025) — share of incoming visits that complete a sign-up. AI referrals are still a small share of overall traffic, but the visitors who do click through convert at ~11× the rate of organic search.

AI referrals are also a partial substitute for the organic search collapse smaller publishers are now reporting. Chartbeat data shows small publishers (1K-10K daily page views) losing 60% of their search referral traffic over a two-year window. Medium publishers lost 47%. Large publishers lost 22%. The arithmetic asymmetry between organic-traffic loss and AI-referral gain is brutal at the moment — most sites lose 5× more organic clicks than they gain AI clicks — but the AI clicks they do gain convert ~11× better, so the revenue picture isn't as bad as the visit picture. Either way, you need to measure both halves to know which trade you're making. As we covered in our AI Visibility vs SEO pillar, the era of measuring visibility through one channel is over.
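A worked version of that trade, taking two inputs at face value (both are assumptions for any individual site): the Digiday sign-up rates above, and exactly 5 organic visits lost for every AI-referred visit gained. For N AI-referred visits gained:

```latex
\begin{aligned}
\Delta\text{sign-ups} &= N \times 1.66\% \;-\; 5N \times 0.15\% \\
&= N\,(0.0166 - 0.0075) = 0.0091\,N \\
N = 1{,}000:\quad \Delta\text{sign-ups} &= 16.6 - 7.5 = +9.1
\end{aligned}
```

On those inputs the site ends up 4,000 visits down per 1,000 AI referrals but about 9 sign-ups up. A lower AI conversion rate, or an organic loss steeper than roughly 11 visits per AI visit, flips the sign, which is why both halves have to be measured on your own traffic.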

Visitor Analytics attributes every AI referral visit by the referring AI platform. The Referer header from chat.openai.com (or chatgpt.com) maps to ChatGPT; claude.ai maps to Claude; perplexity.ai to Perplexity; and so on across the catalog. The visit row in the dashboard carries both the channel (Human · AI Referral) and the specific platform, so a ChatGPT-cited surge looks different from a Perplexity-cited surge, even though both can end up under “Direct” or “(other)” in GA4 whenever the referrer is lost (native apps and links marked rel="noreferrer" send no Referer at all, and browsers' default strict-origin-when-cross-origin policy sends only the origin, never the full URL, on cross-origin navigation).
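The Referer-to-platform mapping can be sketched as a hostname lookup. `AI_REFERRERS` and `aiPlatformFromReferer` are hypothetical names, and the hostname list is an assumption based on each product's public domain, not Ranqo's catalog. Matching on the hostname alone still works when the browser sends only the origin.

```typescript
// Hypothetical Referer-hostname → AI platform table; extend to match your catalog.
const AI_REFERRERS: Record<string, string> = {
  'chatgpt.com': 'ChatGPT',
  'chat.openai.com': 'ChatGPT',
  'claude.ai': 'Claude',
  'perplexity.ai': 'Perplexity',
  'gemini.google.com': 'Gemini',
  'copilot.microsoft.com': 'Copilot',
};

// Returns the AI platform for a Referer header value, or null if it isn't one.
// Matches the exact host or any subdomain (www.perplexity.ai → Perplexity),
// but not look-alikes such as notclaude.ai.
function aiPlatformFromReferer(referer: string): string | null {
  let host: string;
  try {
    host = new URL(referer).hostname.toLowerCase();
  } catch {
    return null; // empty or malformed Referer
  }
  for (const [domain, platform] of Object.entries(AI_REFERRERS)) {
    if (host === domain || host.endsWith(`.${domain}`)) return platform;
  }
  return null;
}

console.log(aiPlatformFromReferer('https://www.perplexity.ai/')); // Perplexity
console.log(aiPlatformFromReferer('https://www.google.com/'));    // null
```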

Verifying bot identity: User-Agent alone isn't enough

Every bot we've named so far has a well-documented User-Agent string. That makes them easy to identify — and easy to impersonate. Cloudflare's own engineering team is on record stating that “user agent headers alone are easily spoofed and are therefore insufficient for reliable identification,” and that IP-range logic is “brittle” because crawler IPs change over time and may be shared across products. They're proposing cryptographic verification (HTTP Message Signatures, RFC 9421) as the long-term fix, but that requires adoption by every vendor and isn't standard practice in 2026.

In practice, the strong signal today is the combination of User-Agent + source IP cross-referenced against the vendor's published IP range. OpenAI publishes the IP ranges used by GPTBot, ChatGPT-User, and OAI-SearchBot. Anthropic publishes the range used by ClaudeBot. Google has long published its crawler ranges. A request that claims to be GPTBot from an IP outside OpenAI's published range is, by definition, not GPTBot — it might be a security scanner with a copy-pasted User-Agent, a competitive analytics tool, or a bad actor probing your robots.txt for blocked paths.

That's the column the Visitor Analytics dashboard surfaces as “Verified” — the share of each bot's visits whose source IP matched the vendor's published range. In the sample brand the numbers look like:

Agents detected · IP-range verified
Per-bot breakdown with IP-range verification · last 30 days · 35+ AI bots tracked
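The IP-range check described above can be sketched in a few lines. This is an IPv4-only sketch under stated assumptions: `PUBLISHED_RANGES`, `verifyBot`, and `inCidr` are hypothetical names, and the CIDR blocks are RFC 5737 documentation ranges used as placeholders, not any vendor's real addresses. A production version would load each vendor's published list, refresh it as ranges change, and handle IPv6.

```typescript
// Parses a dotted-quad IPv4 address into an integer in 0..2^32-1; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.trim().split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True if `ip` falls inside a CIDR block such as "192.0.2.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const [base, prefixText] = cidr.split('/');
  const prefix = Number(prefixText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) return false;
  // Two addresses share a block when they fall in the same block-sized bucket.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

type Verification = 'verified' | 'unverified' | 'no-published-range';

// Hypothetical published ranges keyed by bot name. These are RFC 5737
// documentation blocks used as placeholders, NOT real vendor ranges.
const PUBLISHED_RANGES: Record<string, string[]> = {
  GPTBot: ['192.0.2.0/24'],
  ClaudeBot: ['198.51.100.0/25'],
};

function verifyBot(claimedBot: string, ip: string): Verification {
  const ranges = PUBLISHED_RANGES[claimedBot];
  if (!ranges) return 'no-published-range';
  return ranges.some((cidr) => inCidr(ip, cidr)) ? 'verified' : 'unverified';
}

console.log(verifyBot('GPTBot', '192.0.2.44'));  // verified
console.log(verifyBot('GPTBot', '203.0.113.9')); // unverified
```

Integer division by the block size stands in for bitwise masking because JavaScript's bitwise operators work on signed 32-bit values, which makes addresses above 127.255.255.255 awkward to mask.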

How we built Visitor Analytics

The architecture is deliberately boring on the customer side. You drop one middleware file into your stack, set an environment variable, and you're done. The middleware sends a fire-and-forget GET to Ranqo's intake endpoint with the request URL, User-Agent, Referer, and client IP — and then immediately calls next(). Zero added latency on your response path. The Ranqo side does the rate-limiting, classification, IP-range verification, and roll-up.

The Next.js install looks like this (we ship similar snippets for Express, Hono, Fastify, Koa, Cloudflare Workers, Django, and raw cURL):

middleware.ts

import { NextResponse } from 'next/server';
import type { NextFetchEvent, NextRequest } from 'next/server';

export function middleware(req: NextRequest, event: NextFetchEvent) {
  // Skip non-document fetches: prefetch, RSC, HEAD, OPTIONS
  if (req.method !== 'GET') return NextResponse.next();
  if (req.headers.get('next-router-prefetch')) return NextResponse.next();
  if (req.headers.get('rsc') === '1') return NextResponse.next();
  const purpose = req.headers.get('sec-purpose');
  if (purpose && purpose.startsWith('prefetch')) return NextResponse.next();

  const params = new URLSearchParams({
    url:        req.url,
    userAgent:  req.headers.get('user-agent')      ?? '',
    ref:        req.headers.get('referer')         ?? '',
    ip:         req.headers.get('x-forwarded-for') ?? '',
    websiteKey: process.env.RANQO_SITE_KEY         ?? '',
  });

  // Fire-and-forget: never await, never block the response. waitUntil only
  // keeps the runtime alive until the call settles, so it isn't cut off early.
  event.waitUntil(
    fetch(`https://app.ranqo.ai/api/v1/intake/pageview?${params}`)
      .catch(() => {})
  );

  return NextResponse.next();
}

// Don't run the middleware for build assets and image-optimisation requests
export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};

Three things in the snippet are worth calling out because they are the failure modes other server-side analytics integrations ship broken:

Prefetch / RSC filtering. In a Next.js App Router site, every <Link> that enters the viewport triggers a prefetch request in production, with the same URL as a real page view but not actually one. If you forget the next-router-prefetch and rsc=1 filters, your visit counts inflate 5-10× and become useless. We learned this in our own dashboard before we shipped Site Tracking externally — the first week's data was almost entirely fake.
Fire-and-forget. The fetch(...).catch(() => {}) pattern means no matter how slow Ranqo's intake endpoint is (target p95 < 50ms), or even if it's down entirely, the customer's request path isn't affected. We always return 200 from the intake endpoint regardless of whether the payload was valid — invalid keys and rate-limited requests get {"ok": false} but never an error status. There is no retry pressure on the customer side because there is no failure surface.
Real client IP. Server-side middleware has access to x-forwarded-for (or cf-connecting-ip on Cloudflare), which, when it is set by a CDN or proxy you trust, carries the real client IP through CDNs and proxies. That IP is the one we cross-reference against published vendor ranges for verification. A client-side JS tag can't read the client IP from inside the browser, and most bots never run the tag at all, so those requests would simply go unrecorded.
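The always-200 contract from the Fire-and-forget point can be sketched as a framework-neutral function. `handleIntake`, `VALID_KEYS`, and the simple counter-based rate limit are hypothetical stand-ins (a real service would use a persistent store and time windows); the point is only that every branch returns status 200 and signals rejection through `ok: false`.

```typescript
// What the intake endpoint returns: always HTTP 200, with `ok` saying
// whether the payload was accepted.
interface IntakeResult {
  status: 200;
  body: { ok: boolean };
}

// Hypothetical valid site keys and a per-key request budget for one window.
const VALID_KEYS = new Set(['site_demo_123']);
const MAX_PER_WINDOW = 1000;
const seenThisWindow = new Map<string, number>();

function handleIntake(query: URLSearchParams): IntakeResult {
  const key = query.get('websiteKey') ?? '';
  const url = query.get('url') ?? '';

  // Invalid key or missing URL: reject quietly, still 200.
  if (!VALID_KEYS.has(key) || url === '') return { status: 200, body: { ok: false } };

  // Rate-limited: also 200, so the caller's fire-and-forget fetch never sees an error.
  const count = (seenThisWindow.get(key) ?? 0) + 1;
  seenThisWindow.set(key, count);
  if (count > MAX_PER_WINDOW) return { status: 200, body: { ok: false } };

  // Accepted: a real service would enqueue the payload for classification here.
  return { status: 200, body: { ok: true } };
}

console.log(handleIntake(new URLSearchParams({ websiteKey: 'bad', url: 'https://example.com/' })));
// { status: 200, body: { ok: false } }
```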

Behind the intake endpoint, the data path is HTTP GET → Postgres row within ~30 seconds (Inngest worker pulls off the queue, classifies the User-Agent against the 35+ AI bot catalog, looks up the IP range, writes the row). The dashboard polls the Postgres rollup and surfaces the KPIs, agents table, funnel, and live feed you've seen throughout this post.
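The per-platform rollup behind the funnel table can be sketched as a group-by over classified rows. The row shape (`VisitRow`) and `rollupFunnel` are hypothetical; against the Postgres table described above this would more likely be a SQL GROUP BY on platform and intent, and the in-memory version here only shows the shape of the aggregation.

```typescript
type Intent = 'training' | 'indexing' | 'agentic' | 'visit';

// One classified row, roughly what the worker writes (field names are assumptions).
interface VisitRow {
  platform: string; // e.g. 'ChatGPT', 'Claude'
  intent: Intent;
}

type FunnelCounts = Record<Intent, number>;

// Rolls classified rows up into per-platform four-stage funnel counts.
function rollupFunnel(rows: VisitRow[]): Map<string, FunnelCounts> {
  const funnel = new Map<string, FunnelCounts>();
  for (const row of rows) {
    let counts = funnel.get(row.platform);
    if (!counts) {
      counts = { training: 0, indexing: 0, agentic: 0, visit: 0 };
      funnel.set(row.platform, counts);
    }
    counts[row.intent] += 1;
  }
  return funnel;
}

const rows: VisitRow[] = [
  { platform: 'ChatGPT', intent: 'training' },
  { platform: 'ChatGPT', intent: 'agentic' },
  { platform: 'ChatGPT', intent: 'visit' },
  { platform: 'Claude', intent: 'training' },
];
console.log(rollupFunnel(rows).get('ChatGPT'));
// { training: 1, indexing: 0, agentic: 1, visit: 1 }
```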

What this looks like in the dashboard

The dashboard is one page under /b/[brand]/ai-traffic. Top of the page is the KPI strip you've now seen pieces of — Human Visits · Agent Visits · Verified Bots · Pages Crawled over the selected window (default last 30 days):

Site Tracking · Last 30 days
Server-side capture of humans and AI bots, every visit, regardless of JS

Human Visits: 12,847 (across 5 channels)
Agent Visits: 14,283 (35+ AI bots tracked)
Verified Bots: 67% (by published IP range)
Pages Crawled: unique paths hit

The KPIs roll up across all six dashboard cards. Underneath: the channel-level breakdown of where humans came from (Direct / Search / Social / AI Platforms / Email / Other), the AI-platform-specific bar list (Claude leads → ChatGPT → Perplexity → Gemini → Grok → Copilot → DeepSeek → Meta AI), the Agents Detected table you saw earlier, the Funnel by AI Platform table, the Top Crawled Paths bar (which URLs the bots actually hit), and the Live Feed.
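The four KPI cards can be sketched as a single pass over classified visits. The `TrackedVisit` shape and `computeKpis` are hypothetical, and two definitions are assumptions: “Verified Bots” is taken as the share of bot visits whose IP matched a published range, and “Pages Crawled” as the number of unique paths hit by bots.

```typescript
interface TrackedVisit {
  kind: 'human' | 'bot';
  path: string;
  ipVerified?: boolean; // only meaningful for bot visits
}

interface KpiStrip {
  humanVisits: number;
  agentVisits: number;
  verifiedBotShare: number; // 0..1, rendered as a percentage on the card
  pagesCrawled: number;     // unique paths hit by bots (assumed definition)
}

function computeKpis(visits: TrackedVisit[]): KpiStrip {
  const bots = visits.filter((v) => v.kind === 'bot');
  const verified = bots.filter((v) => v.ipVerified === true).length;
  return {
    humanVisits: visits.length - bots.length,
    agentVisits: bots.length,
    verifiedBotShare: bots.length === 0 ? 0 : verified / bots.length,
    pagesCrawled: new Set(bots.map((v) => v.path)).size,
  };
}

const sample: TrackedVisit[] = [
  { kind: 'human', path: '/pricing' },
  { kind: 'bot', path: '/pricing', ipVerified: true },
  { kind: 'bot', path: '/blog', ipVerified: true },
  { kind: 'bot', path: '/blog', ipVerified: false },
];
const kpis = computeKpis(sample);
console.log(kpis.verifiedBotShare.toFixed(2)); // 0.67
```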

The Live Feed is the surface most customers leave open as a monitor. Every visit in the last few minutes — bot or human, verified or not, with country and intent classification — streams in at the top of the list. It is the “thing is working” signal at install time and the “interesting things are happening on my site” signal in steady state:

Live Feed · Auto-refreshes every 30s
Most recent visits across your site: bots, humans, country, intent

Chrome | /blog/ai-visibility-new-seo-2026 | US | Human · AI Referral | now
ClaudeBot | /blog/how-to-get-cited-by-chatgpt | AI bot | 8s
ChatGPT-User | /pricing | US | Agentic | 19s
Chrome | /signup | DE | Human · AI Referral | 32s
PerplexityBot | /sitemap.xml | AI bot | 47s

Server-side · No client JS · No cookies · 35+ AI bots tracked

What we deliberately don't ship in Visitor Analytics: a closed-loop attribution model that ties an AI referral visit to a downstream conversion (a paid sign-up, a contract close). That kind of attribution requires identity stitching across your authenticated session, your CRM, and your billing system, and we don't have access to those. As we discussed in our broader piece on what AI crawlers see, the right place to wire closed-loop attribution is your own analytics stack — Mixpanel, Segment, or a data warehouse — which can join Ranqo's AI-referral channel with your conversion events. The honest framing: Ranqo measures the top of the funnel (the AI side of the visit); you join it to your bottom-of-funnel (the conversion side) in whichever system already owns your conversion data.

The honest summary

We didn't build Visitor Analytics because GA4 is broken. GA4 is the right tool for measuring an authenticated, consented, JS-enabled human session. We built it because that subset is getting smaller — bots are now 51% of traffic, AI bots specifically are 4.2% and growing fast, 50% to over 60% of humans reject cookies, and AI referrals are the highest-converting channel to land on a site in 2026. None of those four facts are fully visible inside a JS-tag analytics stack. Server-side intake isn't a replacement; it's the second source of truth that picks up everything the first one structurally can't see.

The new claim we're making with this product is narrower than “see all your traffic.” It's this: attribute every request to one of four stages (Training, Indexing, Agentic, Visit) per AI platform, with bot identity verified against the vendor's published IP range. That funnel is the decision unit. A flat “AI bots = X% of traffic” number tells you nothing about which AI is just hoarding pages vs which is actually sending humans back vs which is in the middle of composing an answer about you right now. The four columns do.

4 stages

Training → Indexing → Agentic → Human Visit, attributed per AI platform. Most analytics show only the last one (and even that unreliably). The four-stage funnel makes every bot request visible against the citation cycle it actually belongs to.

Closed-loop conversion attribution still lives in your own analytics stack. Cryptographic bot verification still depends on vendor adoption of HTTP Message Signatures, which isn't there yet. Stealth crawlers will keep evading detection. The Visitor Analytics product is honest about all three. What it does well is the per-bot-per-stage attribution that's structurally unavailable anywhere else. If you want to see your AI traffic the way it actually moves — not the way a rolled-up number describes it — install the middleware and come look at your funnel.

See your AI traffic, before your next deploy

Drop the middleware into your Next.js, Express, or Cloudflare Worker setup. First bot visit logged within minutes — classified by intent, verified by IP range, and reconciled into the four-stage funnel above. No JS bundle, no cookie banner, no ad-blocker risk. If you want to see what AI platforms actually do with your pages once they crawl them, that's covered in the companion guide.

Start free trial

Written by

Nisha Kumari

Co-Founder at Ranqo

Nisha Kumari is Co-Founder at Ranqo, where she leads growth strategy and client acquisition. With a background in digital marketing and financial management, she specializes in SEO, Generative Engine Optimization, and helping brands build visibility across AI platforms.
