How to use BigQuery for enterprise SEO
On 21 February 2023, Google announced a feature that excited many on the data science side of SEO: the ability to export data from Google Search Console to BigQuery. This has the potential to be game-changing for larger businesses with big URL inventories that want to delve beyond the data immediately available within the Search Console interface.
For example, BigQuery exports allow users to bypass Search Console's 1,000-row export limit, opening the door to far deeper data analysis. From a keyword or query perspective alone, going beyond that limit lets brands uncover content and SEO opportunities in the long-tail queries that are quietly garnering clicks and impressions for the site.
In short, BigQuery integration with Search Console will allow brands to make more sense of their Google Search Console data in the context of their other data and analytics sources. This has the capacity to be very powerful for brands looking to utilise their marketing data to its full potential.
Search Console aside, we touched on the use of BigQuery in modern SEO reporting in our article on transitioning to Google Analytics 4, covering how brands can link BigQuery with Google's new analytics platform. With the additional Search Console integration, it's clear that Google's cross-platform use of BigQuery is something enterprise brands should seriously consider in their data ventures.
With Google Search Console, GA4 and more in mind, let's look at what BigQuery is and how large brands can get the most out of it in their data and wider SEO work.
What is BigQuery and why is it important for enterprise SEO?
BigQuery is Google’s enterprise data warehouse that allows you to store and visualise data with relative ease, either by uploading data in bulk or by enabling continuous data streams through connectors.
As we've mentioned, it already links up with Google Analytics 4 and now Google Search Console, and can display its output across Looker Studio (formerly Data Studio) and Google Sheets, among many other tools. Once data is streaming into the BigQuery warehouse within Google Cloud, selected datasets can be exported and visualised in a Looker Studio or Tableau dashboard with just a few clicks.
For enterprise brands managing large-scale SEO projects, BigQuery provides access to larger and more accurate datasets that can unlock incredibly valuable business insights with relative ease. In Google Analytics, for example, enterprise brands whose data exceeds Google's sampling thresholds (such as over 500,000 sessions across any data period) will be served sampled data that isn't 100% accurate and won't show the full picture.
GA’s paid model, Analytics 360, can overcome these limitations, though its pricing model can be off-putting even for the biggest brands. This is where enabling BigQuery can help.
BigQuery warehouses can also be integrated into the same large-scale technical SEO tools and crawling software, allowing further mass data visualisation opportunities across areas such as log files among many other SEO metrics.
The hot topic at the moment is of course BigQuery’s integration with Google Search Console, so let’s take a look at what businesses can get out of that first.
Unlocking further Search Console insights with BigQuery
For enterprise brands, being able to analyse URLs and related search queries beyond the 1,000-row limit native to Google Search Console carries huge potential for uncovering further SEO opportunities.
To enable BigQuery exports from Google Search Console, head over to Google Cloud (you'll need an account, of course) and navigate to APIs & Services, then Enabled APIs & Services, where you can enable the BigQuery API. You'll then need to grant Search Console permission to export bulk data. Google has published a useful step-by-step guide to this process in its Search Console Help documentation.
Once you’ve completed this process, you’ll then need to head into Search Console where you can copy the project ID over from Google Cloud and then set an export destination before submitting. If the export is successful, you’ll then get an email notification confirming this, and you’ll now be able to export bulk datasets from Google Search Console into BigQuery.
You can then start analysing the data as it's housed in BigQuery and integrate it with a Looker Studio dashboard or a reporting interface of your choosing.
Using SQL statements within BigQuery
Before any data visualisation, you'll need some familiarity with SQL in order to query the data, fetch what you're looking for and export it to your desired report. Those with limited SQL knowledge can explore countless online resources, and Google again provides useful documentation with sample SQL statements to run within BigQuery.
Check out the below query, which aggregates impressions and clicks over the last two weeks, split by whether the search query contains your brand name (you'll need to swap in your own values for placeholders such as 'your_brand'):
SELECT
  regexp_contains(query, 'your_brand') AS contains_brand,
  sum(impressions) AS impressions,
  sum(clicks) AS clicks
FROM searchconsole.searchdata_site_impression
WHERE search_type = 'WEB'
  AND data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()
GROUP BY 1
ORDER BY 2 DESC
Within BigQuery, once you select a dataset and table, enter your desired SQL query and hit “Run”:
Once you’ve got your data, click on “Explore Data” and then “Explore with Looker Studio” for a quick visualisation of your bulk dataset.
Exporting Google Search Console data to BigQuery gives brands far deeper insight into how their websites are performing on Google, particularly for the long-tail search terms that are generating clicks and impressions. For large sites, going beyond the 1,000-row threshold is potentially game-changing when it comes to enhancing your content strategy and getting ahead of your competitors.
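As a sketch of that long-tail analysis, the query below surfaces queries of five or more words ordered by impressions, exactly the kind of terms that tend to fall outside the interface's 1,000-row export. It assumes the default `searchconsole` export dataset and is a starting point rather than a definitive recipe:

```sql
-- Sketch: surface long-tail queries (5+ words) over the last two weeks.
-- Assumes the default Search Console export dataset name (searchconsole).
SELECT
  query,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks
FROM searchconsole.searchdata_site_impression
WHERE search_type = 'WEB'
  AND ARRAY_LENGTH(SPLIT(query, ' ')) >= 5
  AND data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) AND CURRENT_DATE()
GROUP BY query
ORDER BY impressions DESC
LIMIT 5000
```

The word-count filter is a crude long-tail proxy; adjust the threshold (or filter on impressions instead) to suit your site.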
Using BigQuery for Google Analytics 4
While Google Analytics 4 has divided opinion within the industry in terms of the data presented and its UI, among other things, the ability to link it to BigQuery for free (within BigQuery's 10GB free storage tier), something previously available only to Analytics 360 users, is pretty huge.
Indeed, if you’re a big site such as a large ecommerce brand that receives organic traffic way over GA’s sampling thresholds, then getting an accurate picture via BigQuery is a must.
Like with Search Console, BigQuery will house all your data and will allow you to run SQL queries to identify key data stories and export them to your desired reporting dashboards accordingly.
Upon setting up BigQuery in the Google Cloud console, head over to GA4’s admin area and locate “BigQuery Links” underneath the Product Links tab:
Link your BigQuery project ID and select your location before configuring your remaining settings (such as whether you want to export once a day or enable continuous streaming) and away you go.
Once the data has successfully processed, you’ll then get a view in your Google Cloud console not too dissimilar to what our Google Search Console data output looked like.
GA4 data leans heavily on events and users, and as such some of its data formats and schema will differ from what many of us are used to in Universal Analytics. Google have us covered again in this area, so take a look at their help article on the BigQuery export schema from GA4 and start familiarising yourself with how the data is presented and what it means.
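To give a flavour of the difference, GA4 stores event parameters as a repeated field, so a value such as the page URL has to be unnested rather than read from a flat column. A minimal sketch (the project and dataset names are placeholders for your own export):

```sql
-- Sketch: read page_location from GA4's repeated event_params field.
-- `your_project.analytics_123456` is a placeholder for your own export.
SELECT
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `your_project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
LIMIT 100
```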
Like with Search Console, you'll need to write the relevant SQL commands to pinpoint and extract your desired data points. SQL is a relatively easy language to pick up, and if you or your team aren't familiar with it there are plenty of cheat sheets and sample queries out there to play around with. Google have detailed some useful queries in the Google Analytics help centre, such as the command below, which returns the top ten items added to a cart by the highest number of users:
SELECT
  item_name,
  COUNT(DISTINCT user_pseudo_id) AS user_count
-- Replace table name.
FROM `your_project.analytics_123456.events_*`, UNNEST(items)
-- Replace date range.
WHERE _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'
  AND event_name IN ('add_to_cart')
GROUP BY item_name
ORDER BY user_count DESC
LIMIT 10
With BigQuery and GA4, businesses have the potential to gain full access to unsampled data across various fields on an ongoing basis. With the subsequent ability to easily pinpoint key data areas and visualise with ease, marrying up your GA4 data with your Search Console data using BigQuery is really something you should be considering as part of your data and analytics workflows.
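As a hedged sketch of that marrying-up, the query below joins Search Console URL performance with GA4 user counts by landing page. The GA4 dataset name is a placeholder, and it assumes page URLs are stored in a comparable format in both sources:

```sql
-- Sketch: join Search Console URL data with GA4 user counts per page.
-- `your_project.analytics_123456` is a placeholder; URL formats may need
-- normalising (e.g. stripping query strings) before the join matches cleanly.
WITH gsc AS (
  SELECT
    url,
    SUM(impressions) AS gsc_impressions,
    SUM(clicks) AS gsc_clicks
  FROM searchconsole.searchdata_url_impression
  WHERE search_type = 'WEB'
  GROUP BY url
),
ga4 AS (
  SELECT
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location') AS url,
    COUNT(DISTINCT user_pseudo_id) AS users
  FROM `your_project.analytics_123456.events_*`
  WHERE event_name = 'session_start'
  GROUP BY url
)
SELECT gsc.url, gsc_impressions, gsc_clicks, ga4.users
FROM gsc
LEFT JOIN ga4 USING (url)
ORDER BY gsc_clicks DESC
```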
How else can we utilise BigQuery to automate SEO insights?
The linking of BigQuery to both Google Search Console and Google Analytics 4 is a huge opportunity to supercharge analytics and SEO reporting insights.
Given BigQuery's ability to export and visualise data in bulk with ease, there are many other areas of SEO workflow that businesses (particularly those with huge datasets) can look to speed up with it.
Server log analysis
Log file analysis is an often-overlooked part of SEO, but it can yield valuable insights into how Google is crawling and interacting with pages across your website. This can be particularly insightful for large businesses with big URL inventories that want to optimise crawl budget or understand how Googlebot visits commercially valuable sections of the site.
For security or compliance reasons, raw log file data can be hard to get hold of, and Google Search Console's crawl stats report only surfaces a limited sample of requests. Loading your server log exports into BigQuery removes those limitations, letting you explore every Googlebot visit and visualise it accordingly.
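Once logs are in the warehouse, querying them is straightforward. The sketch below counts Googlebot hits per path, assuming a hypothetical `seo.server_logs` table with `user_agent` and `request_path` columns (your own log schema will differ):

```sql
-- Sketch: Googlebot hits per URL path from a log table loaded into BigQuery.
-- Table and column names (seo.server_logs, user_agent, request_path) are
-- illustrative placeholders for your own log schema.
SELECT
  request_path,
  COUNT(*) AS googlebot_hits
FROM seo.server_logs
WHERE user_agent LIKE '%Googlebot%'  -- simple UA match; strict verification
                                     -- would also validate the requesting IP
GROUP BY request_path
ORDER BY googlebot_hits DESC
LIMIT 1000
```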
Keyword ranking tracking at scale
Average position data from Google Search Console can give you a view of your keyword visibility, but there are limits to the granularity of keyword data Search Console provides. Search volumes, exact ranking positions, keyword grouping and SERP features, for example, are all things third-party rank trackers handle at scale, and tools such as Nozzle offer BigQuery integration.
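If you do want to approximate rank tracking from the Search Console export itself, Google's schema documentation notes that positions are stored zero-based in the `sum_top_position` field, so average position is derived as shown in this sketch (dataset name assumes the default export):

```sql
-- Sketch: weekly average position per query from the Search Console export.
-- Positions in sum_top_position are zero-based, hence the +1.
SELECT
  query,
  DATE_TRUNC(data_date, WEEK(MONDAY)) AS week,
  SAFE_DIVIDE(SUM(sum_top_position), SUM(impressions)) + 1 AS avg_position
FROM searchconsole.searchdata_site_impression
WHERE search_type = 'WEB'
GROUP BY query, week
ORDER BY query, week
```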
Enterprise SEO and crawling tools
Some large-scale crawling tools, such as Botify, offer integration with BigQuery's data warehouse, enabling continuous processing of large volumes of data across technical SEO, content research and competitor analysis in the cloud. The relevant SQL commands will let you readily extract useful data points across these areas and visualise them to suit your needs.
We’ve largely covered what tools (Google or otherwise) offer integrations into BigQuery, allowing ease of data processing.
However, BigQuery allows you to create new datasets from scratch and upload data exports from various first-hand sources. If you’ve got data from a site-wide Screaming Frog crawl for example, or indeed a large export on backlink or internal linking data across your site, there are methods to load this data into BigQuery natively and go from there in terms of processing it with SQL. The potential to glean insights at scale and with a good degree of automation and quick visualisation is huge.
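As one example of loading first-hand data, BigQuery's LOAD DATA statement can ingest a CSV from Cloud Storage, such as a Screaming Frog export. The bucket URI, table name and columns below are illustrative placeholders and should be matched to your actual export:

```sql
-- Sketch: load a crawl export CSV from Cloud Storage into a new table.
-- The bucket URI, table name and columns are illustrative placeholders.
LOAD DATA INTO seo.crawl_export (
  address STRING,
  status_code INT64,
  indexability STRING,
  word_count INT64
)
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://your-bucket/screaming_frog_internal_all.csv']
);
```

From there, the table can be queried and joined against your Search Console or GA4 exports like any other dataset.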
Get to grips with BigQuery now
We should all be aware that the sunset of Universal Analytics is just around the corner (1 July 2023).
While some have struggled with its successor GA4, the opening up of BigQuery with the platform (as well as with Search Console) for all does serve as an impetus for businesses to get familiar with the data warehouse sooner rather than later.
Given the sheer depth of data available and the ease of visualisation, it's likely we'll see further BigQuery integrations across other well-known marketing platforms (Google-owned or otherwise), so now is the time to start using it.