
AI Mode Query Fan-Out Analyzer.

A Streamlit app that scores your content against the way Google AI Mode actually reformulates queries. Paste a URL and a seed query. The tool generates up to 20 realistic query variants, embeds the variants and every passage on the page, and tells you which passages are covered and which are gaps.


AI Mode Query Fan-Out Analyzer main interface, showing the Generated Queries list and a Content Similarity Analysis table with overall similarity, max passage similarity, and average passage similarity scores.
The main surface — a generated query fan-out and a ranked similarity table per URL.

Why I Built It

Clients kept asking the same question: "Is this page going to show up in AI Overviews?" The honest answer used to be, "Publish it and find out in a month." That's expensive. Content takes weeks to produce, and waiting for AI to re-crawl and decide costs real money.

AI Search doesn't rank URLs the way classic SEO did — it pulls passages. A page either has passages that match the retrieval query's intent or it doesn't. If I could simulate the retrieval before publishing, I could tell you which passages to keep, rewrite, or add.

That's what this does.

The Nine Query Variant Types

When a user types a query into AI Mode, the model doesn't just answer that one string — it internally reformulates. A seed query fans out into a family of related queries, and the system pulls passages that satisfy any of them. To simulate that, I prompt Gemini to generate up to 20 variants across nine types. The model picks the types that make sense for the seed — not every type fires for every query.

The first variant is always the original query, exactly as typed.

  1. Equivalent. Rephrasings of the same question.
    "did roger moore drive an aston martin" → "what car did roger moore drive"
  2. Follow-up. Logical next questions that build on the original.
    "did da vinci paint mona lisa" → "who commissioned da vinci to paint mona lisa"
  3. Conversational follow-up. How people actually talk to AI Mode after getting a first answer. The topic stays in the query for semantic match.
    "solar panels" → "are solar panels worth it?" / "how long do solar panels last?"
  4. Generalization. Broader version of the question.
    "best Italian restaurants in Manhattan" → "best restaurants in New York City"
  5. Specification. More detailed or specific version.
    "climate change" → "climate change effects on coastal cities"
  6. Canonicalization. Slang or informal phrasing turned into standard terms.
    "how to get rid of belly fat fast" → "abdominal fat reduction methods"
  7. Entailment. Consequences, prerequisites, or implied facts.
    "solar panel installation" → "solar panel maintenance requirements"
  8. Clarification. Disambiguation when the seed query has multiple meanings.
    "apple" → "apple fruit nutrition" or "apple iphone features"
  9. Related entity. Closely related people, concepts, or products.
    "iPhone 15 features" → "smartphone comparison 2024"

I force at least two or three conversational follow-ups in every run, because that's where AI Mode actually lives. Static keyword SEO gets you the equivalent and specification buckets. AI Search is where the other seven types matter.

How It Works Under the Hood

Six stages. The first three are setup. The last three are the analysis.

1. Scrape the Page

Three scraping modes, in order of strength: a plain HTTP fetch parsed with BeautifulSoup, trafilatura's article extraction, and a selenium-stealth headless browser for JavaScript-heavy pages that block simple requests.

2. Chunk Into Passages

Two granularities: sentence-level, where each sentence is scored on its own, and passage-level, where paragraph-sized chunks are the unit of analysis.

Passage-based mode also supports a sliding sentence-overlap window, so a passage bleeds a sentence or two into its neighbors — useful when a single retrieval-worthy idea crosses a paragraph break.
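The sliding overlap window described above can be sketched as follows. This is a minimal illustration, not the app's actual implementation; the regex sentence splitter is a stand-in for the nltk tokenizer the stack mentions, and the function names are hypothetical.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Simple stand-in for nltk's sent_tokenize.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_with_overlap(paragraphs: list[str], overlap: int = 1) -> list[str]:
    """Each passage is a paragraph plus `overlap` trailing sentences from the
    previous paragraph and `overlap` leading sentences from the next, so an
    idea that crosses a paragraph break still lands inside one passage."""
    sents = [split_sentences(p) for p in paragraphs]
    passages = []
    for i, current in enumerate(sents):
        before = sents[i - 1][-overlap:] if i > 0 else []
        after = sents[i + 1][:overlap] if i + 1 < len(sents) else []
        passages.append(" ".join(before + current + after))
    return passages
```

With `overlap=1`, every passage bleeds exactly one sentence into each neighbor, which is usually enough to capture a retrieval-worthy idea split across a break.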

3. Generate the Query Fan-Out

Gemini receives the prompt described above with the seed query. It returns a Python-parseable list of query strings. Nothing fancy — no chain of thought, no voting. It's a one-shot call.
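A sketch of the prompt construction and response parsing for that one-shot call. The prompt wording and helper names here are my own illustration, not the app's exact prompt; the parsing step assumes only what the text states, that the model returns a Python-parseable list of strings.

```python
import ast
import re

VARIANT_TYPES = [  # the nine types described above
    "equivalent", "follow-up", "conversational follow-up",
    "generalization", "specification", "canonicalization",
    "entailment", "clarification", "related entity",
]

def build_prompt(seed: str, n: int = 7) -> str:
    types = ", ".join(VARIANT_TYPES)
    return (
        f"Generate up to {n} query variants for the seed query '{seed}'. "
        f"Use whichever of these types fit: {types}. Include at least two "
        "conversational follow-ups. The first variant must be the seed "
        "query exactly as typed. Return ONLY a Python list of strings."
    )

def parse_fanout(response_text: str) -> list[str]:
    # Models sometimes wrap the list in prose or a code fence,
    # so grab the first [...] span before literal_eval.
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if not match:
        raise ValueError("no list found in model response")
    queries = ast.literal_eval(match.group(0))
    return [q for q in queries if isinstance(q, str)]
```

`ast.literal_eval` accepts only literals, so a malformed or adversarial response raises instead of executing anything.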

4. Embed Everything

Eight embedding models are available, spanning local sentence-transformers models, Gemini embeddings, and the OpenAI embedding family. Pick one for the whole run.

Embeddings are cached by SHA-256 of (model name + text), so re-runs on the same content don't pay the API cost twice.
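The caching scheme can be sketched like this. The exact key format (how model name and text are joined) is an assumption; the text only specifies SHA-256 over both.

```python
import hashlib

class EmbeddingCache:
    """In-memory cache keyed by SHA-256 of model name + text, so the same
    passage embedded with the same model never pays the API cost twice."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # (model, text) -> vector
        self.store: dict[str, list[float]] = {}
        self.misses = 0

    def _key(self, model: str, text: str) -> str:
        # Separator is an assumed detail, not taken from the app.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model: str, text: str) -> list[float]:
        k = self._key(model, text)
        if k not in self.store:
            self.misses += 1
            self.store[k] = self.embed_fn(model, text)
        return self.store[k]
```

Hashing the model name alongside the text matters: the same passage embedded by two different models must occupy two cache slots.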

5. Compute Pairwise Cosine Similarity

Every passage is compared to every query. For a page with 40 passages and 7 queries, that's 280 comparisons — each a dot product of normalized vectors. Fast even on CPU.
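The all-pairs comparison reduces to a single matrix multiply once the vectors are normalized. A minimal NumPy sketch (the app's stack lists scikit-learn for this, which does the same thing internally):

```python
import numpy as np

def similarity_matrix(passage_vecs, query_vecs):
    """Pairwise cosine similarity: L2-normalize the rows, then one matrix
    multiply yields every passage x query dot product at once."""
    P = np.array(passage_vecs, dtype=float)
    Q = np.array(query_vecs, dtype=float)
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    Q /= np.linalg.norm(Q, axis=1, keepdims=True)
    return P @ Q.T  # shape: (n_passages, n_queries)
```

For 40 passages and 7 queries this is a 40×7 result, 280 dot products in one BLAS call, which is why it stays fast on CPU.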

6. Highlight and Score

Three bands, keyed to empirical thresholds I've used in client work: green for covered (similarity ≥ 0.75), amber for partial coverage (0.60–0.75), and red for gaps (< 0.60).

The UI renders the source page's HTML with passages color-coded inline, plus a ranked table of passages × queries.
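The banding logic is a two-threshold cutoff. A minimal sketch, using the 0.75 and 0.60 thresholds from the captions below as defaults:

```python
def band(score: float, covered: float = 0.75, gap: float = 0.60) -> str:
    """Map a cosine similarity score to a display band."""
    if score >= covered:
        return "green"   # covered
    if score < gap:
        return "red"     # gap
    return "amber"       # partial match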

Detailed passage analysis for the query 'benefits of server-side rendering' — the source page rendered with green passages (covered) and red passages (gaps) highlighted inline, alongside a Top 5 Passages list ranked by similarity score.
Passage analysis for a single query. Green = covered (≥0.75). Red = gap (<0.60). The right column ranks the top passages by score.

Inputs and Outputs

Inputs. A seed query. Number of variants (3–20, seven is the sweet spot). Input mode (URL list, pasted text, or persona-prompt ranking). Scraping method. Analysis granularity. Embedding model.

Outputs. Inline highlighted HTML of the source page (green / amber / red passages you can read in place). Ranked passage × query table. A gap report for passages below 0.60 against every query. Optional Gemini-generated SEO recommendations. And a prompt-ranking mode if you're choosing between candidate prompts for an AI app.
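The gap report criterion (a passage below 0.60 against every query) can be sketched as a row-max test over the similarity matrix. Function and parameter names here are illustrative, not the app's actual API:

```python
import numpy as np

def gap_report(sim, passages, threshold: float = 0.60):
    """A passage is a gap when its best score against EVERY query stays
    below the threshold, i.e. its row max is under the cutoff."""
    sim = np.asarray(sim, dtype=float)
    gaps = []
    for i, row in enumerate(sim):
        best = float(row.max())
        if best < threshold:
            gaps.append((passages[i], best))
    return sorted(gaps, key=lambda t: t[1])  # worst gaps first
```

Using the row max (not the mean) matters: one strong query match is enough to make a passage retrievable, so only passages that miss everything count as gaps.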

Passage similarity heatmap showing 20+ queries (rows) scored against 51 passages (columns) in a green-to-yellow gradient. Brighter cells indicate stronger matches between specific query and passage pairs.
Passage × query heatmap. One row per query, one column per passage. Bright cells are retrievable matches; darker cells are gaps.
AI-Powered SEO Recommendations dashboard showing average similarity of 82.73 percent, 0 content gaps, 20 strong matches, and structured recommendations across Content Gaps and Semantic Expansion, Content Structure for AI Extraction, Semantic Coverage, and AI Search Optimization.
The optional SEO recommendations view — a Gemini-generated rewrite list organized by section, grounded in the actual similarity scores above.

Stack

Python, Streamlit, no database. The whole thing runs in a single process: sentence-transformers for local embeddings, google-genai for query generation and Gemini embeddings, openai for the OpenAI embedding family, huggingface_hub for gated-model auth, scikit-learn for cosine similarity, plotly for the visualizations, beautifulsoup4 + trafilatura + selenium-stealth for scraping, nltk for sentence splitting.

Deployed on Posit Connect Cloud.

What It Doesn't Do

Worth being direct here:

When to Use It

Try It Now

Live on Posit Connect Cloud. Open full-screen →

Working With Me on This

The Fan-Out Analyzer is free to use. The harder part is interpreting the scores and writing the passages that close the gaps — that's what the AI SEO consulting service is for. If you want me to run it against your site and return a rewrite list, start a conversation.
