Services · Tools

AI SEO Tools.

The tools I've built for AI Search content optimization. Open source, free to use, deployed on Posit Connect Cloud. Featured: the AI Mode Query Fan-Out Analyzer.

AI Mode Query Fan-Out Analyzer

A Streamlit app that scores your content against the way Google AI Mode actually reformulates queries. Paste a URL and a seed query. The tool generates up to 20 realistic query variants, embeds each passage on the page, and tells you which passages are covered and which are gaps.

Launch the App

AI Mode Query Fan-Out Analyzer main interface, showing the Generated Queries list and a Content Similarity Analysis table with overall similarity, max passage similarity, and average passage similarity scores.
The main surface — a generated query fan-out and a ranked similarity table per URL.

Why I Built It

Clients kept asking the same question: "Is this page going to show up in AI Overviews?" The honest answer used to be, "Publish it and find out in a month." That's expensive. Content takes weeks to produce, and waiting for AI to re-crawl and decide costs real money.

AI Search doesn't rank URLs the way classic SEO did — it pulls passages. A page either has passages that match the retrieval query's intent or it doesn't. If I could simulate the retrieval before publishing, I could tell you which passages to keep, rewrite, or add.

That's what this does.

The Nine Query Variant Types

When a user types a query into AI Mode, the model doesn't just answer that one string — it internally reformulates. A seed query fans out into a family of related queries, and the system pulls passages that satisfy any of them. To simulate that, I prompt Gemini to generate up to 20 variants across nine types. The model picks the types that make sense for the seed — not every type fires for every query.

The first variant is always the original query, exactly as typed.

  1. Equivalent. Rephrasings of the same question.
    "did roger moore drive an aston martin" → "what car did roger moore drive"
  2. Follow-up. Logical next questions that build on the original.
    "did da vinci paint mona lisa" → "who commissioned da vinci to paint mona lisa"
  3. Conversational follow-up. How people actually talk to AI Mode after getting a first answer. The topic stays in the query for semantic match.
    "solar panels" → "are solar panels worth it?" / "how long do solar panels last?"
  4. Generalization. Broader version of the question.
    "best Italian restaurants in Manhattan" → "best restaurants in New York City"
  5. Specification. More detailed or specific version.
    "climate change" → "climate change effects on coastal cities"
  6. Canonicalization. Slang or informal phrasing turned into standard terms.
    "how to get rid of belly fat fast" → "abdominal fat reduction methods"
  7. Entailment. Consequences, prerequisites, or implied facts.
    "solar panel installation" → "solar panel maintenance requirements"
  8. Clarification. Disambiguation when the seed query has multiple meanings.
    "apple" → "apple fruit nutrition" or "apple iphone features"
  9. Related entity. Closely related people, concepts, or products.
    "iPhone 15 features" → "smartphone comparison 2024"

I force two or three conversational follow-ups into every run, because that's where AI Mode actually lives. Static keyword SEO gets you the equivalent and specification buckets. AI Search is where the other seven types matter.

How It Works Under the Hood

Six stages. The first three are setup. The last three are the analysis.

1. Scrape the Page

Three scraping modes, in order of strength:

  1. A plain HTTP fetch parsed with beautifulsoup4.
  2. trafilatura's article extraction, for cleaner main-content pulls.
  3. A selenium-stealth headless browser, for JavaScript-heavy or bot-protected pages.

2. Chunk Into Passages

Two granularities:

Passage-based mode also supports a sliding sentence-overlap window, so a passage bleeds a sentence or two into its neighbors — useful when a single retrieval-worthy idea crosses a paragraph break.
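The sliding-window idea can be sketched in a few lines. This is a simplified stand-in: a naive regex splitter replaces the app's nltk sentence tokenizer, and the `size` and `overlap` parameter names are mine, not the app's.

```python
import re

def split_sentences(text):
    # Naive splitter; the app itself uses nltk for sentence splitting.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def chunk_with_overlap(sentences, size=4, overlap=1):
    """Group sentences into passages of `size`, each passage sharing
    `overlap` trailing sentences with the next one, so an idea that
    crosses a paragraph break stays retrievable."""
    step = max(1, size - overlap)
    passages = []
    for start in range(0, len(sentences), step):
        passages.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break
    return passages
```

With `size=4, overlap=1`, a six-sentence passage yields two chunks that share one boundary sentence.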

3. Generate the Query Fan-Out

Gemini receives the prompt described above with the seed query. It returns a Python-parseable list of query strings. Nothing fancy — no chain of thought, no voting. It's a one-shot call.
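That one-shot call can be sketched with the google-genai client. The prompt wording and the `gemini-2.0-flash` model name below are illustrative, not the app's exact values; only the parsing step is fully pinned down by the description above.

```python
import ast

def fanout_prompt(seed_query, n_variants=7):
    # Condensed, illustrative version of the prompt described above.
    return (
        f"Reformulate '{seed_query}' into up to {n_variants} query variants "
        "across nine types: equivalent, follow-up, conversational follow-up, "
        "generalization, specification, canonicalization, entailment, "
        "clarification, related entity. The first variant must be the original "
        "query exactly as typed; include at least two conversational follow-ups. "
        "Return ONLY a Python list of strings."
    )

def parse_variants(raw):
    # The model returns a Python-parseable list; strip stray code fences first.
    cleaned = raw.strip()
    for fence in ("```python", "```"):
        cleaned = cleaned.removeprefix(fence)
    cleaned = cleaned.removesuffix("```").strip()
    return ast.literal_eval(cleaned)

def generate_fanout(seed_query, n_variants=7, model="gemini-2.0-flash"):
    from google import genai  # pip install google-genai
    client = genai.Client()   # reads GEMINI_API_KEY from the environment
    resp = client.models.generate_content(
        model=model, contents=fanout_prompt(seed_query, n_variants)
    )
    return parse_variants(resp.text)
```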

4. Embed Everything

Eight embedding models available. Pick one for the whole run:

Embeddings are cached by SHA-256 of (model name + text), so re-runs on the same content don't pay the API cost twice.
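The caching pattern looks roughly like this. An in-memory dict stands in for whatever store the app actually uses, and the exact key construction (separator, encoding) is an assumption; only the SHA-256-of-model-plus-text idea comes from the description above.

```python
import hashlib

_cache = {}

def cache_key(model_name, text):
    """Key embeddings by SHA-256 of (model name + text), so the same
    text embedded under a different model gets its own entry."""
    return hashlib.sha256(f"{model_name}:{text}".encode("utf-8")).hexdigest()

def embed_cached(model_name, text, embed_fn):
    key = cache_key(model_name, text)
    if key not in _cache:
        _cache[key] = embed_fn(text)  # only pay the API cost on a miss
    return _cache[key]
```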

5. Compute Pairwise Cosine Similarity

Every passage is compared to every query. For a page with 40 passages and 7 queries, that's 280 comparisons — each a dot product of normalized vectors. Fast even on CPU.
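Stage 5 is just a matrix of dot products. A dependency-free sketch (the app uses scikit-learn for cosine similarity; the function name here is mine):

```python
from math import sqrt

def pairwise_cosine(passage_vecs, query_vecs):
    """Score every passage against every query: one dot product of
    unit-normalized vectors per (passage, query) pair.
    40 passages x 7 queries = 280 dot products."""
    def normalize(v):
        n = sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    P = [normalize(p) for p in passage_vecs]
    Q = [normalize(q) for q in query_vecs]
    # Rows: passages, columns: queries.
    return [[sum(a * b for a, b in zip(p, q)) for q in Q] for p in P]
```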

6. Highlight and Score

Three bands, keyed to real empirical thresholds I've used in client work:

  - Green: covered. Max similarity of 0.75 or higher against at least one query.
  - Amber: borderline. Between 0.60 and 0.75; the first candidates for a rewrite.
  - Red: gap. Below 0.60 against every query.

The UI renders the source page's HTML with passages color-coded inline, plus a ranked table of passages × queries.
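The banding itself is plain thresholding on each passage's best score. A sketch using the 0.75 / 0.60 cut-offs from the captions below (function names are mine):

```python
def band(max_similarity):
    """Map a passage's best score across all queries to a highlight band."""
    if max_similarity >= 0.75:
        return "green"  # covered: retrievable for at least one query
    if max_similarity >= 0.60:
        return "amber"  # borderline: first candidate for a rewrite
    return "red"        # gap: misses every query as written

def gap_report(score_matrix, passages):
    """Passages whose best score against *every* query is below 0.60."""
    return [p for p, row in zip(passages, score_matrix) if max(row) < 0.60]
```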

Detailed passage analysis for the query 'benefits of server-side rendering' — the source page rendered with green passages (covered) and red passages (gaps) highlighted inline, alongside a Top 5 Passages list ranked by similarity score.
Passage analysis for a single query. Green = covered (≥0.75). Red = gap (<0.60). The right column ranks the top passages by score.

Inputs and Outputs

Inputs. A seed query. Number of variants (3–20, seven is the sweet spot). Input mode (URL list, pasted text, or persona-prompt ranking). Scraping method. Analysis granularity. Embedding model.

Outputs. Inline highlighted HTML of the source page (green / amber / red passages you can read in place). Ranked passage × query table. A gap report for passages below 0.60 against every query. Optional Gemini-generated SEO recommendations. And a prompt-ranking mode if you're choosing between candidate prompts for an AI app.

Passage similarity heatmap showing 20+ queries (rows) scored against 51 passages (columns) in a green-to-yellow gradient. Brighter cells indicate stronger matches between specific query and passage pairs.
Passage × query heatmap. One row per query, one column per passage. Bright cells are retrievable matches; darker cells are gaps.
AI-Powered SEO Recommendations dashboard showing average similarity of 82.73 percent, 0 content gaps, 20 strong matches, and structured recommendations across Content Gaps and Semantic Expansion, Content Structure for AI Extraction, Semantic Coverage, and AI Search Optimization.
The optional SEO recommendations view — a Gemini-generated rewrite list organized by section, grounded in the actual similarity scores above.

Stack

Python, Streamlit, no database. The whole thing runs in a single process: sentence-transformers for local embeddings, google-genai for query generation and Gemini embeddings, openai for the OpenAI embedding family, huggingface_hub for gated-model auth, scikit-learn for cosine similarity, plotly for the visualizations, beautifulsoup4 + trafilatura + selenium-stealth for scraping, nltk for sentence splitting.

Deployed on Posit Connect Cloud.

What It Doesn't Do

Worth being direct here:

When to Use It

Try It Now

Live on Posit Connect Cloud. Open full-screen →

QueryDrift

The commercial counterpart to the Fan-Out Analyzer. Built with Grant Simmons (ex-Homes.com, The Search Agency). QueryDrift ingests your Google Search Console data, clusters every query in semantic space, and tracks how your topic focus drifts over time — the signal SEO teams lose when AI Overviews start eating clicks.

One score, one cluster map, the topics you own — and the ones slipping away.

Try QueryDrift — Free

QueryDrift dashboard — a QueryDrift Score of 60.8 with Quick Insights on site focus, largest topic, and best-ranking topic, plus an interactive query cluster map plotting hundreds of queries in semantic space.
QueryDrift dashboard — site focus, topic clustering, and semantic drift tracking.

Proprietary SEO Tools

These tools are proprietary and used exclusively on client engagements — not shipped as products. Summaries of what each one does:

Taxonomy Tool

A Next.js and TypeScript application that generates hierarchical e-commerce taxonomies from real site data. It ingests Screaming Frog crawls, Google Search Console queries, GA4 sessions, and Semrush keyword data, then uses Gemini to produce a category tree with meta titles, slugs, and JSON-LD BreadcrumbList markup — scoped to defined customer personas and ready for CMS integration.

AI Search Simulator

A Streamlit application that loads a site, or competitor sites, into a Qdrant vector database using EmbeddingGemma or Gemini embeddings. Once indexed, the collection can be queried the way an AI retrieval system would, content gap audits can be run against a sitemap, and internal-linking suggestions can be generated across the full embedding space. Scraping is handled via Zyte; entity detection via Google Cloud Natural Language.

Media Mix Modeling

A custom Bayesian Media Mix Modeling application, built in-house on similar principles to Google Meridian rather than on top of it. Users upload weekly or daily media spend and revenue, select their channels (paid search, paid social, display, video, affiliate), and the model returns channel attribution with credible intervals, diminishing-returns response curves, budget-allocation optimization, and what-if scenario planning. Designed for the conversation that starts with defending a marketing budget in front of a CFO.

Entity Gap Analysis

A Streamlit tool that extracts entities from both client content and competitor content using Google Cloud Natural Language, scores them against target queries, and renders a relationship graph via networkx. The output: a ranked list of entities competitors are using that the client is not — weighted by query relevance. The result feeds directly into content strategy decisions.


Working With Me on These

The Fan-Out Analyzer is free to use. QueryDrift has a free tier. The harder part is interpreting the scores and writing the passages that close the gaps — that's what the AI SEO consulting service is for. If you want me to run these against your site and return a rewrite list, start a conversation.

← Home