Search Query Data Mining

BigQuery is a Gold Mine of SEO Data

I previously wrote about the benefits of exporting Google Search Console (GSC) data into BigQuery. As Search Console only displays up to 1,000 rows of data within the UI, it’s very limiting for SEOs who want to mine first-party search data. With BigQuery, the only limitation is anonymized search queries, but having access to hundreds of thousands or millions of rows of data is still far better than 1,000.

Use AI to Cluster Search Queries

One of the underrated features of LLM tools like ChatGPT or Gemini is their ability to cluster large amounts of data efficiently. Imagine having all of this search data, with no practical way to manage it manually or through a spreadsheet alone. This is where AI clustering can really help: grouping similar queries together, labeling them with search intent, and so on.

For this example I’m about to show, I want to accomplish two things:

  1. Identify potential search queries for AI search and AI Overviews (AIO)
  2. Cluster these how-to search queries together to help inform content expansion

To do this, I wrote a SQL query against my BigQuery dataset that extracted search queries of four or more words from the past month. Once I had the list, I asked Gemini to cluster these search queries by their likelihood of generating AIO search results. One of these clusters consisted of “how-to” search queries.
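As a rough sketch of the extraction step, the Python below builds a query of that shape and adds two small local helpers. It is not the exact query I ran. The table name is an assumption based on the default Search Console bulk export (`searchconsole.searchdata_site_impression`), `my-project` is a placeholder, and the function names and the how-to pattern are made up for illustration.

```python
import re

# Assumed table: the default dataset/table created by the Search Console
# bulk export. Replace "my-project" with your own project ID.
DEFAULT_TABLE = "my-project.searchconsole.searchdata_site_impression"


def build_long_query_sql(table: str = DEFAULT_TABLE,
                         min_words: int = 4,
                         days: int = 30) -> str:
    """Return BigQuery SQL for non-anonymized queries with >= min_words words."""
    # Word count is approximated by splitting on single spaces, which assumes
    # the stored queries do not contain repeated spaces.
    return f"""
SELECT
  query,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks
FROM `{table}`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)
  AND is_anonymized_query = FALSE
  AND ARRAY_LENGTH(SPLIT(TRIM(query), ' ')) >= {int(min_words)}
GROUP BY query
ORDER BY impressions DESC
""".strip()


def word_count(query: str) -> int:
    """Count whitespace-separated words, mirroring the SQL filter locally."""
    return len(query.split())


def is_how_to(query: str) -> bool:
    """Hypothetical how-to check: the query starts with 'how to' or 'how do'."""
    return re.match(r"\s*how\s+(to|do)\b", query, re.IGNORECASE) is not None
```

The string returned by `build_long_query_sql()` can be pasted into the BigQuery console or passed to a BigQuery client. The two helpers are only there to sanity-check an exported list before handing it to Gemini.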

Taking this a step further, rather than sorting alphabetically and cherry-picking by eye, I asked Gemini to plot these how-to search query clusters in a few charts. The chart that worked best for me was this dendrogram. It gives each theme its own color, then connects it to the other thematic clusters with lines based on semantic relevance.


Dendrogram / Use AI to Cluster Search Queries
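If you want a feel for the grouping behind a chart like this without an LLM, here is a minimal local sketch. To be clear, this is not what Gemini does: it swaps semantic relevance for plain word overlap (Jaccard similarity) with single-linkage merging, and the stopword list, threshold, and sample queries are all hypothetical.

```python
# Filler words ignored when comparing queries (an illustrative list).
STOPWORDS = {"how", "to", "a", "an", "the", "do", "i", "you",
             "for", "in", "of", "on", "with", "and"}


def tokens(query: str) -> frozenset:
    """Lowercase the query and keep only its non-stopword words."""
    return frozenset(w for w in query.lower().split() if w not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Shared words divided by total distinct words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(queries, threshold=0.25):
    """Single-linkage clustering: merge two clusters whenever any pair of
    queries across them has a Jaccard similarity >= threshold."""
    clusters = [[q] for q in queries]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(tokens(a), tokens(b)) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters


# Hypothetical sample queries, not taken from my data.
sample = [
    "how to grow spanish moss indoors",
    "how to water spanish moss",
    "how to clean a terrarium",
    "how to build a terrarium",
]
print(cluster_queries(sample))
# [['how to grow spanish moss indoors', 'how to water spanish moss'],
#  ['how to clean a terrarium', 'how to build a terrarium']]
```

Word overlap misses synonyms and intent, which is exactly where an LLM or embedding-based approach earns its keep, but it is a quick way to check whether the themes in a chart roughly match the raw queries.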

Some SEOs try to force FAQs onto any page possible, but a smarter approach is often to understand the intent behind the search query and then write your content in a way that addresses that intent. Placing an FAQ where it looks out of context is lazy. Take the query “what is Spanish moss?” as an example. I don’t necessarily need to write that verbatim on the page if I simply answer it within the context of my product copy.

Gemini provided me with seven other types of search queries that are likely to result in AI Overviews, and those largely matched what I see from Semrush, but at a larger scale. There are many opportunities like this that BigQuery makes possible. If you haven’t already started collecting your data, I would highly recommend doing so, as Search Console limits us to 16 months of data at a time. Every day that you’re not collecting this data is data lost forever unless Google decides to retroactively give it to us someday (not going to happen).

Need an SEO Analyst? I’m available for consulting opportunities.
