One of the common objections to AEO investment is that it can't be measured. This is partly true and mostly wrong. You can't measure AI citations the way you measure organic search traffic — there's no equivalent of Google Search Console that shows you every time an AI cited your brand. But you can build a structured testing methodology that gives you a reliable picture of your brand's AI citation frequency and how it changes over time.

The measurement challenge is real: AI outputs are non-deterministic, platform-specific, and context-dependent. The same query submitted twice may yield different citations. What shows up on Perplexity may not show up on ChatGPT. None of this is tracked in any analytics system by default. But the solution is structured sampling, not resignation.

Building a query set

The foundation of any AEO measurement system is a structured set of queries — the specific questions buyers in your category are likely to ask AI systems. These are not keyword lists; they're full natural-language questions that reflect how buyers actually research and evaluate options.

A representative query set for a B2B SaaS brand might include:

  • Category definition queries: "What is [category]?" "How does [category] work?"
  • Comparison queries: "Best [category] platforms for [use case]" "How does X compare to Y?"
  • Evaluation queries: "What should I look for in [category] software?" "How do I choose a [category] vendor?"
  • Problem-framing queries: "How do companies solve [problem]?" "What tools do teams use for [use case]?"
  • Brand-direct queries: "[Your brand name] — what is it?" "[Your brand] reviews" "[Your brand] alternatives"

The goal is 20–50 queries that cover the realistic range of how buyers discover and evaluate your category. More than 50 is rarely necessary for a B2B category; fewer than 20 gives you too thin a sample to detect meaningful patterns.
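One way to keep the query set organized by cluster, so results can later be analyzed per cluster, is a simple keyed structure. This is a minimal sketch: the brand "Acme Flow" and the "workflow automation" category are hypothetical placeholders, not examples from the text.

```python
# Query set grouped by the clusters described above. Brand and category
# names here are hypothetical stand-ins.
QUERY_SET = {
    "category_definition": [
        "What is workflow automation?",
        "How does workflow automation work?",
    ],
    "comparison": [
        "Best workflow automation platforms for small teams",
        "How does Acme Flow compare to its main competitors?",
    ],
    "evaluation": [
        "What should I look for in workflow automation software?",
        "How do I choose a workflow automation vendor?",
    ],
    "problem_framing": [
        "How do companies reduce manual data entry?",
        "What tools do teams use for process handoffs?",
    ],
    "brand_direct": [
        "Acme Flow reviews",
        "Acme Flow alternatives",
    ],
}

def flatten(query_set):
    """Yield (cluster, query) pairs in a stable order for test runs."""
    for cluster, queries in query_set.items():
        for query in queries:
            yield cluster, query
```

Keeping the cluster label attached to each query makes it trivial to answer questions like "are competitors pulling ahead in comparison queries specifically" once results accumulate.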

Running the tests

For each query, you submit it to each platform you're tracking (typically Perplexity, ChatGPT with and without browsing, and Google AI Overviews) and record: whether your brand was mentioned, where in the response it appeared, and which sources were cited. This should be done in a fresh session for each query — not in a continuing conversation context where prior messages affect the output.
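The submission step itself is manual or per-platform API work, but the scoring of each response can be sketched. The bucket names below follow the taxonomy in this section; the coarse string-matching heuristic is an assumption of this sketch, and distinguishing primary from secondary mentions still takes a human read of where the brand appears in the answer.

```python
# Platforms tracked per the text; identifiers are this sketch's own labels.
PLATFORMS = ["perplexity", "chatgpt", "chatgpt_browsing", "google_ai_overviews"]

def score_response(response_text, cited_sources, brand):
    """Classify one fresh-session response into coarse buckets:
    named_in_text, citation_only, or absent. Simple case-insensitive
    substring matching; a human pass refines primary vs. secondary."""
    b = brand.lower()
    if b in response_text.lower():
        return "named_in_text"
    if any(b in source.lower() for source in cited_sources):
        return "citation_only"
    return "absent"
```

A fresh session per query matters because conversational context shifts what the model retrieves and cites; scoring assumes each response was generated cold.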

Record the results in a simple spreadsheet. Column headers: date, query, platform, brand mentioned (yes/no), position in response (primary mention / secondary / citation only / absent), competitor mentions. Over time this becomes a quantitative record of your brand's AI citation presence.

Run the full query set at regular intervals — monthly is sufficient for most brands, though quarterly is better than nothing. The value isn't in any single snapshot; it's in the trend. Are more queries mentioning your brand? Is your position in responses improving (moving from citation-only to named in the synthesized text)? Are competitor brands pulling ahead in specific query clusters?
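Once a few monthly runs are logged, the trend question ("are more queries mentioning your brand?") reduces to a mention rate per month. A sketch over the row format above:

```python
from collections import defaultdict

def mention_rate_by_month(rows):
    """rows: dicts with 'date' (YYYY-MM-DD) and 'brand_mentioned'
    ('yes'/'no'). Returns {'YYYY-MM': fraction of runs with a mention}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in rows:
        month = r["date"][:7]
        totals[month] += 1
        if r["brand_mentioned"] == "yes":
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in totals}
```

The same grouping applied to the position column (citation-only versus named in text) tracks the second trend the text calls out: whether mentions are moving up in prominence, not just in frequency.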

Platform-specific patterns

Because the platforms behave differently, it's important to analyze results platform by platform. Your brand might appear consistently in Perplexity citations while being absent from ChatGPT's parametric knowledge, which suggests a training data gap rather than a retrieval problem: live retrieval is finding your content, but the model didn't learn about your brand. Or vice versa, which suggests your editorial presence predates recent model versions but your current content isn't well-structured for real-time retrieval.

These patterns point to different remediation strategies. Understanding the architectural differences between platforms is what allows you to interpret measurement results correctly rather than treating all AI citation as a single undifferentiated metric.
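The two asymmetries above can be flagged automatically from the log. In this sketch the 0.25 gap threshold is illustrative, not calibrated, and the platform labels are this sketch's own identifiers:

```python
def rate_by_platform(rows):
    """rows: dicts with 'platform' and 'brand_mentioned' ('yes'/'no').
    Returns {platform: mention rate}."""
    hits, totals = {}, {}
    for r in rows:
        p = r["platform"]
        totals[p] = totals.get(p, 0) + 1
        if r["brand_mentioned"] == "yes":
            hits[p] = hits.get(p, 0) + 1
    return {p: hits.get(p, 0) / totals[p] for p in totals}

def diagnose(rates, gap=0.25):
    """Flag the two asymmetries discussed above; gap is an
    illustrative threshold."""
    perplexity = rates.get("perplexity", 0.0)
    chatgpt = rates.get("chatgpt", 0.0)
    if perplexity - chatgpt >= gap:
        return "likely training-data gap: retrievable live, weak in parametric knowledge"
    if chatgpt - perplexity >= gap:
        return "likely retrieval gap: known to the model, not well-structured for live retrieval"
    return "no strong platform asymmetry"
```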

What good looks like

For most B2B SaaS brands entering AEO measurement for the first time, the baseline is sobering. Fewer than 20% of queries produce a brand mention. For category-definition queries, the brand rarely appears at all — those answers tend to cite industry publications and established reference sites, not vendor content.

Realistic improvement targets over a 12-month active AEO program: moving from sub-20% mention frequency to 40–60% on category and comparison queries; moving from citation-only mentions to named mentions in synthesized text on evaluation queries; beginning to appear on category-definition queries as a referenced example rather than a generic non-answer.

These aren't dramatic numbers, but they represent a meaningful shift in brand presence during a period when most competitors aren't measuring at all. The gap between brands that are and aren't doing this work is already forming, and measurement is the first prerequisite for closing it.

Connecting measurement to effort

Measurement without a change program produces anxiety, not improvement. The query set you build for measurement should also drive your content and editorial priorities — the queries where you're absent most consistently are the highest-value targets for new content, schema markup, or editorial coverage efforts.
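Turning the log into a content priority list is a ranking by absence rate. A minimal sketch over the same row format used throughout:

```python
from collections import Counter

def absence_priorities(rows, top_n=5):
    """Rank queries by how often the brand was absent across all runs
    and platforms; the most consistently absent queries are the
    highest-value targets for new content or editorial work."""
    absent = Counter()
    seen = Counter()
    for r in rows:
        seen[r["query"]] += 1
        if r["brand_mentioned"] == "no":
            absent[r["query"]] += 1
    ranked = sorted(seen, key=lambda q: absent[q] / seen[q], reverse=True)
    return ranked[:top_n]
```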

Running AEO campaigns that build genuine citation presence without measuring the baseline and tracking change over time is the equivalent of running paid search without looking at conversion data. The work matters, but you can't optimize what you can't see.