The AI Citation Dictionary: 100 Terms Every Marketer Should Know
100 essential terms across 10 categories -- the canonical reference for AI visibility, GEO, citation behavior, and AI-era marketing measurement. Each definition is concise, structured for AI extraction, and grounded in verified research.
The vocabulary of AI visibility moves faster than any single glossary can keep up with. Terms that didn't exist two years ago -- GEO, query fan-out, llms.txt, brand persistence, parametric memory -- are now the difference between a marketer who understands what's happening and one who doesn't. This dictionary defines 100 essential terms across 10 categories. Every definition is concise enough to extract, structured for AI to cite, and grounded in verified research where applicable.
Use this as a quick reference (jump to any category below) or read straight through for a comprehensive understanding of the field. Each definition links back to verified sources where underlying research informs the terminology. For context on how these concepts fit together, start with Ranqo's complete GEO guide.
10 Categories, 100 Terms
Jump to any category, or read straight through for the complete glossary.
Why a Glossary, and Why Now
AI platforms cite definitional content disproportionately often. When a user asks ChatGPT "What is GEO?" or Perplexity "What does query fan-out mean?", the AI looks for clear, structured term-definition pairs it can quote. Wikipedia is the most-cited single domain for ChatGPT for exactly this reason: it's the canonical reference for definitions across nearly every topic.
Most companies don't realize this opportunity. They write blogs explaining concepts in narrative form, embedding definitions in long paragraphs that AI struggles to extract. A clean term-definition format -- the format you're reading right now -- is among the highest-citation-rate content structures available, especially when paired with schema markup like FAQ schema (which produces a 3.2x AI Overview boost per Frase's research).
The terminology in this dictionary is also where most AI-era marketing conversations break down. Senior leaders hear "mention rate" and "share of voice in AI" without a shared definition. Engineers hear "llms.txt" and ask whether it's a real standard. Marketing teams confuse GEO, AEO, and AISO. A common vocabulary unblocks all of those conversations.
1. Foundational AI Concepts
The base layer of vocabulary -- terms that describe how LLMs actually work under the hood. You can't reason about AI visibility without understanding the difference between parametric memory and retrieval, or what RAG actually does.
- Large Language Model (LLM)
- An AI model trained on massive text datasets to understand and generate human-like language. LLMs power ChatGPT, Claude, Gemini, Perplexity, and Grok.
- Generative AI
- AI systems that produce new content (text, images, code, audio) rather than only classifying or predicting from existing data.
- Transformer
- The neural network architecture that powers all modern LLMs. Introduced in the 2017 paper "Attention Is All You Need."
- Embedding
- A numerical representation of text, images, or other data that captures semantic meaning. AI uses embeddings to find conceptually similar content.
- Vector Database
- A specialized database that stores embeddings and supports similarity search. The infrastructure layer behind retrieval-augmented generation.
- Retrieval-Augmented Generation (RAG)
- An architecture where AI retrieves external sources before generating a response, then cites those sources. The basis for AI citation.
- Fine-Tuning
- Adapting a pre-trained LLM to a specific domain or task by continuing training on a focused dataset. Distinct from prompting.
- Training Data
- The corpus of text and other content an LLM learns from during training. Cutoff dates determine what the model "knows" parametrically.
- Parametric Memory
- Information an LLM learned during training and stores in its weights. Distinct from real-time retrieval -- it has a knowledge cutoff.
- Hallucination
- When an LLM produces a confident but incorrect or fabricated statement. A leading reason AI platforms increasingly rely on real-time retrieval and citations.
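To make "embedding" and "vector database" concrete, here's a minimal Python sketch of the similarity math those systems run. The three-dimensional vectors are hypothetical stand-ins invented for illustration; real embeddings come from a model API and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "GEO" and "AI visibility" point in similar
# directions; "coffee" points elsewhere in the space.
geo = [0.9, 0.1, 0.2]
ai_visibility = [0.8, 0.2, 0.3]
coffee = [0.1, 0.9, 0.1]

# A vector database answers "which stored vectors are closest to this one?"
# using exactly this kind of comparison, at scale.
print(cosine_similarity(geo, ai_visibility) > cosine_similarity(geo, coffee))  # True
```

This nearest-neighbor lookup over embeddings is the retrieval step in RAG: the user's question is embedded, the closest stored passages are fetched, and the LLM generates its answer (and citations) from them.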
2. Search Paradigms
The vocabulary of how search itself is changing. SEO is no longer the only game -- and the GEO/AEO distinction matters when you're scoping work and explaining strategy. The term "GEO" itself was formalized in the Princeton/Georgia Tech KDD 2024 paper.
- SEO (Search Engine Optimization)
- The practice of optimizing content to rank in traditional search engines like Google and Bing.
- GEO (Generative Engine Optimization)
- The practice of optimizing content so AI platforms cite it in their responses. Term formalized in Princeton/Georgia Tech KDD 2024 research.
- AEO (Answer Engine Optimization)
- Broader umbrella covering AI answers, featured snippets, and voice assistants. Often used interchangeably with GEO.
- AISO (AI Search Optimization)
- Synonym for GEO/AEO, emphasizing optimization specifically for AI-powered search experiences.
- AI Search
- Search experiences where AI synthesizes a direct answer from multiple sources, typically with citations, instead of returning a list of links.
- Generative Search
- Search powered by generative AI that composes answers in natural language. Examples: Google AI Overviews, ChatGPT Search, Perplexity.
- Conversational Search
- Multi-turn search dialogue where the AI maintains context across follow-up questions. Replaces single-query search behavior.
- Zero-Click Search
- When a user gets their answer directly from the search results page (or AI overview) without clicking through to any website.
- AI Overview
- Google's AI-generated summary appearing above traditional search results. Now appearing on roughly a quarter of all Google searches.
- Featured Snippet
- A direct answer extracted from a webpage and displayed at the top of Google search results. The traditional precursor to AI Overviews.
3. AI Platforms
The AI assistants and search experiences your buyers use. Market share data here is current to First Page Sage's April 2026 report. For a deeper breakdown of how each platform selects sources, see Ranqo's platform-specific playbook.
- ChatGPT
- OpenAI's AI chatbot. As of April 2026, holds 60.2% AI chatbot market share (First Page Sage). Its search results overlap 87% with Bing's.
- Claude
- Anthropic's AI assistant, known for nuanced, balanced analysis with high disclaimer rates. ~5% market share.
- Perplexity
- AI-powered answer engine with strong inline citations and real-time web crawl. Maintains an independent index from Google/Bing.
- Gemini
- Google's AI assistant. Inherits Google's infrastructure and Knowledge Graph. ~15% AI chatbot market share (April 2026).
- Grok
- xAI's chatbot integrated with X/Twitter for real-time social signal access. Differentiated by current discourse data.
- Microsoft Copilot
- Microsoft's AI assistant powered by OpenAI models. Built into Bing, Office 365, and Windows.
- ChatGPT Search (SearchGPT)
- OpenAI's search-optimized ChatGPT mode. Built on Bing's index, with 87% citation match to Bing top results.
- Google AI Mode
- Google's full conversational AI search experience, distinct from AI Overviews. Replaces traditional results with conversational responses.
- DeepSeek
- Open-source Chinese LLM that gained significant adoption in 2025. Increasingly used in cost-sensitive RAG applications.
- Generative Engine
- Umbrella term for any AI system that synthesizes answers from multiple sources -- ChatGPT, Perplexity, Gemini, Claude, Copilot all qualify.
4. AI Crawlers & Bots
The user agents that actually visit your site to feed AI systems. Most platforms run multiple bots with distinct purposes (training vs. real-time vs. search index). For everything they fetch (and don't execute), see Ranqo's walkthrough of what AI sees when it crawls.
- GPTBot
- OpenAI's crawler for training data collection. Does not execute JavaScript. Respects robots.txt.
- ChatGPT-User
- OpenAI bot triggered when a ChatGPT user actively requests web content. Makes 3.6x more requests than Googlebot per Search Engine Journal data.
- OAI-SearchBot
- OpenAI's crawler for ChatGPT Search index. Distinct from GPTBot (training) and ChatGPT-User (real-time fetches).
- ClaudeBot
- Anthropic's training data crawler. Part of a three-bot system alongside Claude-User and Claude-SearchBot.
- Claude-User
- Anthropic bot used for user-triggered web fetches inside Claude. Requires explicit user request to activate.
- Claude-SearchBot
- Anthropic's search infrastructure crawler that determines what Claude can cite in its answers.
- PerplexityBot
- Perplexity's primary indexing crawler. Cloudflare documented inconsistent robots.txt compliance in August 2025.
- Google-Extended
- Google's robots.txt token governing whether crawled content trains Gemini. The fetching is done by Googlebot, so it inherits full JavaScript rendering -- unique among the AI crawlers listed here.
- CCBot
- Common Crawl's bot. Its web archive serves as training data for many LLMs, including earlier GPT models.
- Bingbot
- Microsoft's search crawler. Critical for ChatGPT Search visibility because ChatGPT relies on Bing's index, not Google's.
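You can test which of these bots your robots.txt actually admits without deploying anything, using only Python's standard library. The robots.txt content and URLs below are invented examples -- swap in your own file to audit a real site.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: GPTBot is blocked from one directory,
# PerplexityBot is blocked entirely, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow:
"""

AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in AI_CRAWLERS:
    allowed = parser.can_fetch(bot, "https://example.com/blog/geo-guide")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Running a check like this during an audit catches the most expensive AI visibility mistake: a blanket Disallow left over from a staging config that silently removes the site from entire platforms.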
5. Citation Mechanics
How AI platforms select, attribute, and reference sources. This is the layer most marketers don't understand -- query fan-out alone explains why optimizing for one keyword rarely transfers to AI visibility. For the disconnect with Google rankings, see Ranqo's research on the Google-AI gap.
- Citation
- When an AI platform references a specific URL or source as the origin of information in its response.
- Source Attribution
- The practice of naming the source of information within an AI response. Distinct from a hyperlinked citation.
- Inline Citation
- Source references embedded directly in the AI's response text. Perplexity does this for 95% of claims; ChatGPT often does not.
- Citation Half-Life
- The time period over which a citation's value decays. AI visibility typically declines 30-60 days before measurable performance drops.
- Brand Persistence
- Whether a brand cited in one AI response continues to appear in subsequent responses. Only ~30% of brands persist between consecutive responses.
- Mention Rate
- The percentage of relevant AI queries in your category that include your brand. The foundational AI visibility KPI.
- Position
- Where your brand is mentioned within an AI response (first, second, third). An average position of 1.2 is excellent (DerivateX B2B SaaS data).
- Share of Voice (SoV)
- Your brand's share of total mentions in AI responses for category-level queries. The AI equivalent of search market share.
- Query Fan-Out
- AI technique of breaking one user question into multiple sub-queries to retrieve diverse sources. Often produces 8+ sub-queries per ChatGPT prompt.
- Grounding
- The process of backing AI responses with verifiable external sources rather than relying solely on parametric memory.
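Mention rate and share of voice are simple enough to compute by hand once you have a sample of AI responses for your category queries. A minimal sketch, using invented brand names and response texts ("BrandRadar" is a made-up competitor):

```python
responses = [
    "For GEO tracking, Ranqo and BrandRadar are common picks.",
    "Popular options include BrandRadar.",
    "Ranqo is one tool teams use for AI visibility.",
    "There are several vendors in this space.",
]

def mention_rate(brand, responses):
    """Share of responses that mention the brand at least once."""
    return sum(brand.lower() in r.lower() for r in responses) / len(responses)

def share_of_voice(brand, brands, responses):
    """Brand's share of total brand mentions across all responses."""
    counts = {b: sum(r.lower().count(b.lower()) for r in responses) for b in brands}
    total = sum(counts.values())
    return counts[brand] / total if total else 0.0

brands = ["Ranqo", "BrandRadar"]
print(mention_rate("Ranqo", responses))            # 0.5 (2 of 4 responses)
print(share_of_voice("Ranqo", brands, responses))  # 0.5 (2 of 4 total mentions)
```

The hard part in practice isn't this arithmetic -- it's sampling: because only ~30% of brands persist between consecutive responses, a single query run tells you little, and each metric needs repeated runs over time.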
6. Content Optimization
Tactical content terminology, with citation impact data attached where research exists. For the full 7-step implementation framework, see Ranqo's optimization playbook.
- Answer-First Formatting
- Placing the direct answer in the first 40-60 words of each section. Onely measured a 140% ChatGPT citation increase from this technique alone.
- Listicle
- A numbered or bulleted list-format article. Listicles account for 21.9% of all AI citations -- the largest share of any content format.
- Comparison Content
- "X vs Y" pages comparing products or options head-to-head. Achieves 45-60% citation rates -- the highest of any single format.
- Pillar Content
- A comprehensive, definitive resource that covers a topic thoroughly and links to subtopics. Designed to be the canonical reference.
- Hub Page
- A central page that organizes and links to related content. Helps AI understand topical relationships across your site.
- Content Depth
- The substantive thoroughness of content. Articles of 1,500+ words receive 4.7x more AI citations (Hashmeta).
- Content Freshness
- How recently content has been updated. Pages updated within 30 days receive 3.2x more AI citations (rank.bot).
- Definition Statement
- An explicit "[X] is [definition]" sentence pattern. AI extracts these directly when answering "what is" queries.
- Third-Party Mention
- When external sites (review platforms, press, analyst reports) reference your brand. Brands are 6.5x more likely to be cited via third-party sources.
- Original Research
- Proprietary surveys, analyses, or experiments your brand publishes. Adding original statistics increases AI visibility by 41% (Princeton GEO).
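A definition statement is easy to extract mechanically -- which is exactly why AI favors it. Here's a toy Python sketch of the "[X] is [definition]" pattern at work; the regex is deliberately minimal for illustration, not production-grade.

```python
import re

text = (
    "GEO is the practice of optimizing content so AI platforms cite it. "
    "Many teams start with audits. "
    "Query fan-out is the technique of splitting one question into sub-queries."
)

# Capture "<Term> is <definition>." where Term opens a sentence.
pattern = re.compile(r"(?:^|(?<=\. ))([A-Z][\w -]*?) is ([^.]+)\.")

for term, definition in pattern.findall(text):
    print(f"{term}: {definition}")
```

The first and third sentences match because they lead with an explicit term-definition pair; the middle sentence, written in narrative form, yields nothing -- a small-scale version of why definitions buried in long paragraphs are hard for AI to extract.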
7. Technical Standards
File formats, schemas, and protocols that AI crawlers respect (or are expected to). For the complete llms.txt analysis, including verified adoption data, see Ranqo's llms.txt complete guide.
- robots.txt
- A plain-text file telling crawlers which URLs to avoid. Critical for AI: blocking GPTBot or ClaudeBot here removes you from those platforms.
- sitemap.xml
- An XML file listing all the URLs on your site, helping search and AI crawlers discover content efficiently.
- Schema Markup
- Structured data added to your HTML to help AI and search engines understand content meaning. Implemented as JSON-LD.
- JSON-LD
- JavaScript Object Notation for Linked Data. The recommended format for adding schema markup to web pages.
- Structured Data
- Standardized format for providing information about a page and classifying its content. Only 12.4% of websites use it.
- FAQ Schema
- Structured data type for question-and-answer content. Pages with FAQ schema are 3.2x more likely to appear in AI Overviews (Frase).
- HowTo Schema
- Schema type for step-by-step instructional content. Helps AI extract sequential steps cleanly.
- Article Schema
- Schema type for article content, with fields for author, datePublished, dateModified, and headline.
- llms.txt
- A proposed standard (authored by Jeremy Howard, September 2024): a plain-text file at /llms.txt that gives LLMs a curated map of your site. ~10.13% adoption.
- llms-full.txt
- Companion to llms.txt that includes concatenated full page content, allowing LLMs to load core content without separate requests.
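Here's what FAQ schema looks like in practice: a short Python sketch that emits FAQPage JSON-LD for a glossary page. The questions and answers are placeholders drawn from this dictionary.

```python
import json

faqs = [
    ("What is GEO?",
     "GEO (Generative Engine Optimization) is the practice of optimizing "
     "content so AI platforms cite it in their responses."),
    ("What is query fan-out?",
     "Query fan-out is an AI technique that splits one user question into "
     "multiple sub-queries to retrieve diverse sources."),
]

# Schema.org FAQPage structure: a list of Question entities,
# each carrying an acceptedAnswer.
schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": q,
            "acceptedAnswer": {"@type": "Answer", "text": a},
        }
        for q, a in faqs
    ],
}

# Embed the output inside <script type="application/ld+json">...</script>.
print(json.dumps(schema, indent=2))
```

Generating the JSON-LD from the same data that renders the visible Q&A keeps the markup and the page content in sync -- a mismatch between the two is a common reason structured data gets ignored.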
8. Crawling & Rendering
The technical concepts that determine what AI can actually read on your site. The single most important fact in this category: AI crawlers do not execute JavaScript -- a finding Vercel verified across millions of crawler requests.
- Server-Side Rendering (SSR)
- Generating HTML on the server before sending it to the browser. Critical for AI visibility -- AI crawlers don't execute JavaScript.
- Client-Side Rendering (CSR)
- Generating page content via JavaScript after the browser receives an empty HTML shell. Invisible to AI crawlers without SSR.
- Static Site Generation (SSG)
- Pre-building HTML pages at build time. Functionally equivalent to SSR for AI visibility -- content is in raw HTML.
- Hydration
- The process where JavaScript activates a server-rendered page in the browser. Adds interactivity without affecting AI visibility.
- JavaScript Execution
- The process of running JS code in a browser. AI crawlers do NOT do this -- 500M+ GPTBot fetches showed zero JS execution (Passionfruit).
- Crawl Budget
- The number of pages a crawler will visit on your site within a given time period. AI crawler budgets are growing rapidly.
- Indexation
- Whether a page has been added to a search engine or AI platform's index of known content.
- Core Web Vitals (CWV)
- Google's page performance metrics (LCP, INP, CLS -- INP replaced FID in 2024). Acts as a gate for AI: severe failures hurt visibility, but going from good to great has minimal impact.
- First Contentful Paint (FCP)
- How quickly the first content appears on screen. Pages with FCP under 0.4s receive 3x more AI citations (ZipTie).
- Mobile-First Indexing
- Search and AI systems primarily evaluate the mobile version of your site, not desktop. Content parity across viewports matters.
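You can approximate what an AI crawler "sees" by taking the raw HTML and keeping only the text outside scripts and styles -- since AI crawlers fetch HTML but never execute JavaScript. A standard-library sketch, with two invented page snippets (a CSR shell vs. an SSR page):

```python
from html.parser import HTMLParser

class VisibleTextParser(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def crawler_visible_text(html):
    p = VisibleTextParser()
    p.feed(html)
    return " ".join(p.chunks)

# A client-side-rendered shell: all content arrives later via app.js.
csr_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
# A server-rendered page: content is present in the raw HTML.
ssr_page = "<html><body><h1>What is GEO?</h1><p>GEO is the practice of...</p></body></html>"

print(repr(crawler_visible_text(csr_shell)))  # '' -- nothing for an AI crawler to cite
print(crawler_visible_text(ssr_page))
```

The empty result for the CSR shell is the whole SSR argument in one line: if the text isn't in the HTML the server sends, it doesn't exist for GPTBot, ClaudeBot, or PerplexityBot.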
10. Measurement & Analytics
The KPIs and metrics for tracking AI visibility over time. Traditional SEO metrics don't cover this surface -- mention rate, share of voice in AI, sentiment, and citation source tracking are the new dashboard. For the audit framework that maps to all of these, see Ranqo's AI readiness audit guide.
- AI Visibility
- The umbrella metric for how present your brand is across AI platforms. Composed of mention rate, position, sentiment, and SoV.
- AI Referral Traffic
- Traffic to your site originating from AI platforms (ChatGPT, Perplexity, etc.). Grew 156% YoY through 2025.
- Click-Through Rate (CTR)
- The percentage of impressions that result in a click. Organic CTR for AI Overview queries dropped 61% in 15 months (Seer Interactive).
- Conversion Rate
- The percentage of visitors who complete a desired action. AI-referred traffic converts 4.4x better than organic search (Semrush).
- Sentiment Analysis
- Analyzing the tone (positive, neutral, negative) of how AI platforms describe your brand. Negative sentiment is worse than no mention.
- Citation Source Tracking
- Identifying which third-party sites AI platforms cite when discussing your category. Reveals optimization targets.
- AI Mention Tracking
- The practice of monitoring brand mentions across AI platforms over time. Required because AI responses are volatile.
- Brand Sentiment
- The aggregate tone of brand mentions across AI responses. Tracked at the platform level since each platform has different patterns.
- Attribution
- Determining which marketing channel drove a conversion. Increasingly difficult as AI traffic appears as "direct" or generic referrers.
- Citation Decay
- The reduction in AI citations over time without ongoing optimization. Visibility decays 30-60 days before performance metrics show it.
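A first pass at AI referral tracking is just referrer classification. The sketch below buckets hits by hostname; the hostname list is illustrative and incomplete, real referrer strings vary by platform and over time, and much AI traffic arrives with no referrer at all -- the attribution problem noted above.

```python
from urllib.parse import urlparse

# Illustrative, incomplete mapping of referrer hosts to AI platforms.
AI_REFERRER_HOSTS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
}

def classify_referrer(referrer):
    """Return the AI platform for a referrer URL, or None if not AI."""
    host = urlparse(referrer).netloc.lower()
    return AI_REFERRER_HOSTS.get(host)

hits = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=geo",
    "https://www.google.com/",
    "",  # no referrer: lands in "direct", where much AI traffic hides
]
counts = {}
for h in hits:
    platform = classify_referrer(h)
    if platform:
        counts[platform] = counts.get(platform, 0) + 1
print(counts)  # {'ChatGPT': 1, 'Perplexity': 1}
```

Even this rough bucketing makes the 156% YoY growth in AI referral traffic visible in your own analytics, and it's the denominator you need before computing AI-referred conversion rates.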
How to Use This Dictionary
For marketers: bookmark this page. When a vendor, agency, or consultant uses unfamiliar AI vocabulary, look it up here first. The categories are organized roughly by abstraction level -- foundational concepts at the top, measurement at the bottom.
For content teams: this dictionary itself is an example of citation-optimized content. Each term and definition is in a <dl> structure with <dt> and <dd> elements -- the HTML pattern AI extracts most reliably. Apply the same structure to your own glossary, FAQ, and definitional pages.
For executives: if your team can't define mention rate, share of voice in AI, and brand persistence, you have a measurement gap. Those three metrics are the foundation of any credible AI visibility KPI dashboard.
For agencies: share this with clients during onboarding. Pre-empting the terminology questions saves weeks of explanation and accelerates strategy alignment.
A common vocabulary is the cheapest, fastest way to align an organization on AI visibility strategy. Definitions before tactics; tactics before tools.
Track every metric in this dictionary
Mention rate, share of voice, position, sentiment, citation sources -- across ChatGPT, Claude, Perplexity, Gemini, and Grok. For deeper context, also see the 15 anti-GEO mistakes to avoid and the AI readiness audit framework.
Written by
Nisha Kumari
Nisha Kumari is Co-Founder at Ranqo, where she leads growth strategy and client acquisition. With a background in digital marketing and financial management, she specializes in SEO, Generative Engine Optimization, and helping brands build visibility across AI platforms.