This feature is only available in Analytics 360, part of Google Marketing Platform. Learn more about Google Marketing Platform.
This article contains examples of how to construct queries of the Analytics data you export to BigQuery. We have made available a sample dataset so you can practice with some of the queries in this article.
In this article:

- Query optimization
- Tips and best practices
- Basic query examples
- Advanced query examples
  - Products purchased by customers who purchased product A (Classic Ecommerce)
  - Products purchased by customers who purchased product A (Enhanced Ecommerce)
  - Average number of user interactions before purchase
  - Percentage of stock sold per product
  - Profitability of each product
  - Real profitability of each product
Query optimization
Every query you run contributes to your monthly data-processing allowance. If you select extraneous fields, you increase the amount of data that needs to be processed, and as a result, you use more of your monthly allowance than necessary. Optimised queries make efficient use of your monthly data-processing allowance.
Learn more about pricing.
Select only what you need
When you formulate a query, select only the relevant fields within the SELECT statement. By omitting extraneous fields, you reduce the amount of data, and therefore the time, needed to process the query.
Example: avoid the use of the wildcard operator
| Bad form: using wildcard operator |
| --- |
| SELECT * |

| Better form: using field names to avoid unnecessary processing |
| --- |
| SELECT field1, field2 |
Permit caching
Where possible, avoid using functions as fields. Functions (such as NOW() or TODAY()) return variable results, which prevents queries from being cached and therefore returned more quickly. Instead, use specific times and dates.
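For example, here is a minimal sketch against the public sample dataset (the table name is an assumption for illustration; run each query separately): the first query filters on a literal date and is eligible for caching on repeat runs, while the second uses CURRENT_DATE() and is not.

```sql
#standardSQL
-- Cacheable: literal table name and literal date filter
SELECT COUNT(*) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE date = '20170801'
```

```sql
#standardSQL
-- Not cacheable: CURRENT_DATE() returns a different value over time
SELECT COUNT(*) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE date = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
```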
Use intermediate tables for commonly used subqueries
If you find that you are repeatedly using a specific query as a subquery, you can save that query as an intermediate table by clicking Save as Table above the query results. You can then reference that table in the FROM section of your query, which reduces both the amount of data that has to be processed and the time required for processing.
| using an intermediate table |
| --- |
| SELECT field1, field2 |
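As an illustrative sketch (the project, dataset, and field names here are placeholders, not from the original article), once the subquery results are saved, you query the intermediate table directly instead of recomputing the subquery:

```sql
#standardSQL
-- Reference the saved intermediate table in FROM
SELECT field1, field2
FROM `myproject.mydataset.intermediate_table`
WHERE field1 IS NOT NULL
```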
Query debugging
BigQuery debugs your code as you construct it. In the composition window, debugging is indicated just below the query. Debugging is also available through the API with the dryRun flag.
Valid queries have a green indicator that you can click to see the amount of data processed by the query. This feature gives you an opportunity to optimize your data before you run the query so that you can avoid unnecessary data processing.
Invalid queries have a red indicator that you can click to see information about the error, and to find the line and column where the error occurs. For example, if a GROUP BY statement is left blank, the error is pinpointed to that line and column.
Tips and best practices
Using the sample dataset
The following examples use the Google Analytics sample dataset.
To use the queries on your own data, simply replace the project and dataset names in the examples with your own project and dataset names.
Using Standard SQL vs. Legacy SQL
BigQuery supports two SQL dialects: Standard SQL and Legacy SQL.
Migrating to Standard SQL explains the differences between the two dialects.
Standard SQL is now the preferred SQL dialect for querying data stored in BigQuery.
See Enabling Standard SQL for information about enabling Standard SQL in the BigQuery UI, CLI, API, or whichever interface you are using.
The easiest way to get started is to include the comment #standardSQL at the top of your Standard SQL queries, as shown in the following examples.
Google Analytics 360 data is exported into a new table every day. With Legacy SQL, to query multiple tables at once, you can comma-separate the table names, use the TABLE_DATE_RANGE table wildcard function, or use multiple comma-separated TABLE_DATE_RANGE functions, as in the following examples.
Query multiple tables
The following examples show you Standard SQL and Legacy SQL queries for the same data.
3 days
Standard SQL
| 3 Days using UNION ALL |
| --- |
| #standardSQL |
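The query body above is truncated; as a hedged sketch of the UNION ALL pattern against the public sample dataset (counting sessions per day is an illustrative choice of metric):

```sql
#standardSQL
SELECT date, SUM(totals.visits) AS sessions
FROM (
  SELECT date, totals FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` UNION ALL
  SELECT date, totals FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731` UNION ALL
  SELECT date, totals FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170730`
)
GROUP BY date
ORDER BY date
```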
Legacy SQL
| 3 days using comma-separated table names |
| --- |
| SELECT |
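Likewise, a hedged sketch of the same three days in Legacy SQL, where comma-separated table names are treated as a union:

```sql
SELECT date, SUM(totals.visits) AS sessions
FROM
  [bigquery-public-data:google_analytics_sample.ga_sessions_20170801],
  [bigquery-public-data:google_analytics_sample.ga_sessions_20170731],
  [bigquery-public-data:google_analytics_sample.ga_sessions_20170730]
GROUP BY date
ORDER BY date
```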
Last 1095 Days
Standard SQL
| Last 1095 Days using _TABLE_SUFFIX |
| --- |
| #standardSQL |
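A hedged sketch of the _TABLE_SUFFIX pattern against the public sample dataset (note the trade-off with caching, since CURRENT_DATE() changes every day):

```sql
#standardSQL
SELECT SUM(totals.visits) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1095 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
```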
Legacy SQL
| Last 1095 days using TABLE_DATE_RANGE |
| --- |
| SELECT |
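And a hedged sketch of the Legacy SQL equivalent using TABLE_DATE_RANGE:

```sql
SELECT SUM(totals.visits) AS sessions
FROM TABLE_DATE_RANGE([bigquery-public-data:google_analytics_sample.ga_sessions_],
  DATE_ADD(CURRENT_TIMESTAMP(), -1095, 'DAY'),
  DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))
```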
Last 36 Months
Standard SQL
| Last 36 Months using _TABLE_SUFFIX |
| --- |
| #standardSQL |

Legacy SQL

| Last 36 months using TABLE_DATE_RANGE |
| --- |
| SELECT |

Last 3 years

Standard SQL

| Last 3 Years using _TABLE_SUFFIX |
| --- |
| #standardSQL |

Legacy SQL

| Last 3 years using TABLE_DATE_RANGE |
| --- |
| SELECT |
Specific date range
Standard SQL
| Specific date range using _TABLE_SUFFIX |
| --- |
| #standardSQL |
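A hedged sketch of the same pattern with a fixed date range (which also keeps the query cacheable):

```sql
#standardSQL
SELECT SUM(totals.visits) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
```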
Legacy SQL
| Specific date range using TABLE_DATE_RANGE |
| --- |
| SELECT |
Last 3 years plus today’s data (intraday)
Standard SQL
| Last 3 years plus today’s data (intraday) using UNION ALL & _TABLE_SUFFIX |
| --- |
| Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table. |
| #standardSQL |

Legacy SQL

| Last 3 years plus today’s data (intraday) using multiple TABLE_DATE_RANGE |
| --- |
| Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table. |
| SELECT |
Basic query examples
This section explains how to construct basic queries using metrics and dimensions from example Analytics data.
Total [metric] per [dimension]?
Below are example scripts for the question: What is the total number of transactions generated per device browser in July 2017?
Standard SQL
| Total transactions per device browser in July 2017 |
| --- |
| #standardSQL |
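The query body above is truncated; a hedged sketch of how it might read against the public sample dataset:

```sql
#standardSQL
SELECT
  device.browser,
  SUM(totals.transactions) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY device.browser
ORDER BY total_transactions DESC
```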
Legacy SQL
| Total transactions per device browser in July 2017 |
| --- |
| SELECT |
Average bounce rate per [dimension]?
The real bounce rate is defined as the percentage of visits with a single pageview. Below are example scripts for the question: What was the real bounce rate per traffic source?
Standard SQL
| Bounce rate per traffic source in July 2017 |
| --- |
| #standardSQL |
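A hedged sketch against the public sample dataset, treating totals.bounces as the single-pageview indicator:

```sql
#standardSQL
SELECT
  source,
  total_visits,
  total_no_of_bounces,
  ( total_no_of_bounces / total_visits ) * 100 AS bounce_rate
FROM (
  SELECT
    trafficSource.source AS source,
    COUNT(trafficSource.source) AS total_visits,
    SUM(totals.bounces) AS total_no_of_bounces
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  GROUP BY source
)
ORDER BY total_visits DESC
```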
Legacy SQL
| Bounce rate per traffic source in July 2017 |
| --- |
| SELECT |
Average number of product pageviews by purchaser type (purchasers vs non-purchasers)
Below are example scripts for the question: What was the average number of product pageviews for users who made a purchase in July 2017?
Standard SQL
| Average number of product pageviews for users who made a purchase in July 2017 |
| --- |
| #standardSQL |
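A hedged sketch against the public sample dataset, using total session pageviews as a simple proxy for product pageviews; flipping the transactions filter to IS NULL gives the non-purchaser variant further below.

```sql
#standardSQL
SELECT
  SUM(total_pageviews) / COUNT(fullVisitorId) AS avg_pageviews_per_purchaser
FROM (
  SELECT
    fullVisitorId,
    SUM(totals.pageviews) AS total_pageviews
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId
)
```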
Legacy SQL
| Average number of product pageviews for users who made a purchase in July 2017 |
| --- |
| SELECT |
Below are example scripts for the question: What was the average number of product pageviews for users who did not make a purchase in July 2017?
Standard SQL
| Average number of product pageviews for users who did not make a purchase in July 2017 |
| --- |
| #standardSQL |

Legacy SQL

| Average number of product pageviews for users who did not make a purchase in July 2017 |
| --- |
| SELECT |
Average number of transactions per purchaser
Below are example scripts for the question: What was the average total transactions per user that made a purchase in July 2017?
Standard SQL
| Average number of transactions per user that made a purchase in July 2017 |
| --- |
| #standardSQL |
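A hedged sketch against the public sample dataset:

```sql
#standardSQL
SELECT
  SUM(total_transactions_per_user) / COUNT(fullVisitorId) AS avg_total_transactions_per_user
FROM (
  SELECT
    fullVisitorId,
    SUM(totals.transactions) AS total_transactions_per_user
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
    AND totals.transactions IS NOT NULL
  GROUP BY fullVisitorId
)
```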
Legacy SQL
| Average number of transactions per user that made a purchase in July 2017 |
| --- |
| SELECT |
Average amount of money spent per session
Below are example scripts for the question: What is the average amount of money spent per session in July 2017?
Standard SQL
| Average amount of money spent per session in July 2017 |
| --- |
| #standardSQL |
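A hedged sketch against the public sample dataset; totals.transactionRevenue is stored in micros, so divide by 1,000,000 to get currency units, and the filter restricts the average to sessions that produced revenue (an assumption about the intended definition):

```sql
#standardSQL
SELECT
  ( SUM(totals.transactionRevenue) / 1e6 ) / SUM(totals.visits) AS avg_revenue_per_session
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  AND totals.transactionRevenue IS NOT NULL
```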
Legacy SQL
| Average amount of money spent per session in July 2017 |
| --- |
| SELECT |
Sequence of hits
Below are example scripts for the question: What is the sequence of pages viewed?
Standard SQL
| Sequence of pages viewed by users in July 2017 |
| --- |
| #standardSQL |
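A hedged sketch against the public sample dataset, flattening the repeated hits field and keeping page hits only:

```sql
#standardSQL
SELECT
  fullVisitorId,
  visitId,
  visitNumber,
  hits.hitNumber AS hitNumber,
  hits.page.pagePath AS pagePath
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  AND hits.type = 'PAGE'
ORDER BY fullVisitorId, visitId, visitNumber, hitNumber
```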
Legacy SQL
| Sequence of pages viewed by users in July 2017 |
| --- |
| SELECT |
In this query, you limit the hit type to PAGE to avoid seeing event or transaction interactions. Each line of the output represents a pageview, and is shown in the default order of the fields in the SELECT statement.
Multiple custom dimensions at the hit or session level
| custom dimension at the hit level |
| --- |
| SELECT fullVisitorId, visitId, hits.hitNumber, hits.time, |

| custom dimension at the session level |
| --- |
| SELECT fullVisitorId, visitId, |
In each query:

- The SELECT statement queries for the relevant dimension and metric fields.
- The MAX function:
  - Returns the custom dimension as a new column. You can repeat the function to return multiple custom dimensions as new columns.
  - WITHIN hits and WITHIN RECORD evaluate the condition inside repeated fields in BigQuery.
  - The condition inside MAX is evaluated for each custom dimension, but for any that are not index=1 (hits) or index=2 (sessions), it returns NULL.
  - Returns the maximum value, which is the value of Custom Dimension 1 for hits or Custom Dimension 2 for sessions, since all other values are NULL.
Advanced query examples
Now that you’re familiar with simple queries, you can construct queries using the advanced functions and features available in BigQuery.
Products purchased by customers who purchased product A (Classic Ecommerce)
Below is a skeleton script for the question: What other products are purchased by customers who purchase product A?
| products purchased by a customer who purchases product A (Classic Ecommerce) |
| --- |
| SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity |
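The skeleton above shows only its first line; here is a hedged Legacy SQL reconstruction of its overall shape (dataset, date, and product name are placeholders), described line by line below:

```sql
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [mydataset.ga_sessions_YYYYMMDD]
WHERE fullVisitorId IN (
  -- unique purchasers of product A
  SELECT fullVisitorId
  FROM [mydataset.ga_sessions_YYYYMMDD]
  WHERE hits.item.productName CONTAINS 'Product Item Name A'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId )
  AND hits.item.productName IS NOT NULL
  AND hits.item.productName != 'Product Item Name A'
GROUP BY other_purchased_products
ORDER BY quantity DESC
```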
- In the first line, you select all of the other items purchased by a user, and the aggregate function COUNT() is used to calculate the quantity of each other item purchased. The result is displayed in a field labelled quantity, with the associated item in the product field labelled other_purchased_products.
- In the subquery, you select only the unique users (fullVisitorId) who have carried out transactions (totals.transactions >= 1) and during a transaction purchased product A (WHERE hits.item.productName CONTAINS 'Product Item Name A').

The rules (WHERE and AND statements) in the top-level query disregard values in hits.item.productName that are null or that match product A itself.
Here is an example of the query: If a customer purchases 'Brighton Metallic Pens - Set of 4', what other products did they purchase?

| products purchased by a customer who purchases 'Brighton Metallic Pens (Set of 4)' on 24th June 2013 |
| --- |
| SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity |
In Dremel/BigQuery, using WHERE expr IN triggers a JOIN, and size restrictions apply; specifically, the size of the right side of the JOIN (in this case the number of visitors) needs to be less than 8 MB. In Dremel, this is called a broadcast JOIN. When the size exceeds 8 MB, you need to trigger a shuffled JOIN, which can be done using the JOIN EACH syntax. Unfortunately, it can't be done using IN, but the same query can be rewritten with a JOIN.
Products purchased by customers who purchased product A (Enhanced Ecommerce)
This is similar to the previous skeleton query, but works for Enhanced Ecommerce. It also makes use of TABLE_DATE_RANGE to query data over multiple days.
| products purchased by a customer who purchases product A (Enhanced Ecommerce) |
| --- |
| SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity |
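Again the body is truncated; a hedged Legacy SQL reconstruction of the shape, using TABLE_DATE_RANGE and the Enhanced Ecommerce hits.product fields (dataset, dates, and SKU are placeholders):

```sql
SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity
FROM TABLE_DATE_RANGE([mydataset.ga_sessions_], TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31'))
WHERE fullVisitorId IN (
  -- unique purchasers of product A (by SKU)
  SELECT fullVisitorId
  FROM TABLE_DATE_RANGE([mydataset.ga_sessions_], TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31'))
  WHERE hits.product.productSKU CONTAINS 'Product SKU A'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId )
  AND hits.product.productSKU IS NOT NULL
  AND hits.product.productSKU != 'Product SKU A'
GROUP BY other_purchased_products
ORDER BY quantity DESC
```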
Average number of user interactions before purchase
This is an example of a query that uses a JOIN() ... ON clause and depends only on Analytics data.
Below is a skeleton script for the question: What is the average number of user interactions before a purchase?
| number of user interactions before a purchase |
| --- |
| SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number |
- The first line carries out the main mathematical operation to find the average user interactions per product, and this query creates a join between two subqueries called ‘Alias_Name_1’ and ‘Alias_Name_2’.
- ‘Alias_Name_1’ is used to produce a field that uses the aggregate function SUM() to sum all the hit numbers recorded for a product.
- ‘Alias_Name_2’ is used to find the number of hits made by users per product, using the COUNT() function.
- The last line shows the common field (hits.item.productSku) shared between the two datasets on the join.
Here is an example of the query: On 10th September 2013, what was the average number of user interactions before a purchase?

| number of user interactions on 10th September 2013 before a purchase |
| --- |
| SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number |
Percentage of stock sold per product
This is an example of a query which is not only dependent on Analytics data, but also on non-Analytics data. By combining both data sets, you can begin to understand user behavior at a more segmented level. You can import non-Analytics data into BigQuery, but keep in mind that this will contribute to your monthly data storage charge.
Below is a skeleton script for the question: What percentage of stock was sold per product?
| percentage of stock sold per product |
| --- |
| SELECT AnalyticsImport.product_data_20130728.productId, ((( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold )) * 100 ) AS percentage_of_stock_sold |
- The first line outputs two fields: one field containing all the product IDs, and one that is a mathematical operation showing the percentage of stock sold for that product ID.
- Since this query depends on two data sets, you need to use the JOIN() ... ON function. This command joins the rows from the two data sets based on the common field between them. In this case, the two data sets are ‘Imported_DataSet’ and ‘Alias_Name’.
- ‘Imported_DataSet’ is the non-Analytics data. This is the data set that contains the metric field for how much stock is left (Imported_DataSet.’stock_left_field’) and the product-ID dimension field (Imported_DataSet.’productId_field’).
- ‘Alias_Name’ is the name assigned to the data returned by the subquery. This subquery uses Analytics data to find the total quantity of items sold per product.
- The last line uses the ON statement to show the common field between the two data sets, on which the two datasets are joined.
Many of the variables in this query have their dataset name attached to them as prefixes (e.g., Imported_DataSet.’productId_field’, Alias_Name.quantity_sold). This is to clarify which field you are selecting, and to make explicit which dataset it belongs to.
Here is an example of the query: What percentage of stock was sold per product on 28th July 2013?

| percentage of stock sold per product on 28th July 2013 |
| --- |
| SELECT AnalyticsImport.product_data_20130728.productId, ( ( ( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold ) ) * 100 ) AS percentage_of_stock_sold |
Profitability of each product
Below is a skeleton script for the question: What is the profitability of each product?
| profit by product |
| --- |
| SELECT Alias_Name.hits.item.productSku, ( Imported_DataSet.’product profit field’ * Alias_Name.quantity ) AS profit |
- The first line contains the mathematical operation to calculate the total profit made on each product.
- One subquery uses the non-Analytics data, which supplies how much profit is made when a product is sold.
- The other subquery is the Analytics data subquery, which is joined with the non-Analytics data. It calculates the quantity of items sold per product.
- The last line uses the ON statement to clarify the field that the two data sets share. In this case, it is the product ID.
Here is an example of the query: What was the profitability of each product on 28th July 2013?

| profit by product on 28th July 2013 |
| --- |
| SELECT two.hits.item.productSku, ((AnalyticsImport.product_data_20130728.productprice-AnalyticsImport.product_data_20130728.productcost)*two.quantity) AS profit |

Profit is calculated by finding the difference between the price the product is sold at and the cost of manufacturing it. This information is stored in the non-Analytics dataset.
Real profitability of each product (taking into account refunds)
Below is a skeleton script for the question: What is the real profitability of each product?
| real profit by product |
| --- |
| SELECT Alias_Name.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit |
- This is a very similar query to What is the profitability of each product? The only differences are in the non-Analytics dataset subquery and in the mathematical operation that calculates the real profit in the first line.
- In the non-Analytics dataset subquery, you also calculate the total amount of money spent on refunds (in the SELECT statement of that subquery).
- You then carry out a mathematical operation in the first line to find the real profit by subtracting the revenue spent on refunds from the gross profit.

For more information on the query, please see the section on the profitability of each product.
Here is an example of the query: What was the real profitability of each product on 28th July 2013?

| real profit by product on 28th July 2013 |
| --- |
| SELECT two.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit |
The real profit takes into account the profitability of a product after taking refunded products into consideration. To calculate the total refund revenue for a product:
total refund revenue for a product = ( the price of the product + the refund delivery price for the product ) * the quantity of products refunded