This feature is only available in Analytics 360, part of Google Marketing Platform. Learn more about Google Marketing Platform.
This article contains examples of how to construct queries of the Analytics data you export to BigQuery. We have made available a sample dataset so you can practice with some of the queries in this article.
In this article:

- Query optimization
- Tips and best practices
- Basic query examples
- Advanced query examples
  - Products purchased by customers who purchased product A (Classic Ecommerce)
  - Products purchased by customers who purchased product A (Enhanced Ecommerce)
  - Average number of user interactions before purchase
  - Percentage of stock sold per product
  - Profitability of each product
  - Real profitability of each product
Query optimization
Every query you run contributes to your monthly data-processing allowance. If you select extraneous fields, you increase the amount of data that needs to be processed, and as a result, you use more of your monthly allowance than necessary. Optimised queries make efficient use of your monthly data-processing allowance.
Learn more about pricing.
Select only what you need
When you formulate a query, select only the relevant fields within the SELECT statement. By omitting extraneous fields, you reduce the amount of data, and therefore the time, needed to process the query.
Example: avoid the use of the wildcard operator
| Bad form: using wildcard operator |
| --- |
| SELECT * |

| Better form: using field names to avoid unnecessary processing |
| --- |
| SELECT field1, field2 |
Permit caching
Where possible, avoid using functions as fields. Functions (such as NOW() or TODAY()) return variable results, which prevents queries from being cached and therefore returned more quickly. Instead, use specific times and dates.
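For example, here is a minimal sketch against the public sample dataset (the table name is an assumption for illustration; run each query separately): the first query filters on a literal date and is eligible for caching on repeat runs, while the second uses CURRENT_DATE() and is not.

```sql
#standardSQL
-- Cacheable: literal table name and literal date filter
SELECT COUNT(*) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE date = '20170801'
```

```sql
#standardSQL
-- Not cacheable: CURRENT_DATE() returns a different value over time
SELECT COUNT(*) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE date = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
```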
Use intermediate tables for commonly used subqueries
If you find that you are repeatedly using a specific query as a subquery, you can save that query as an intermediate table by clicking Save as Table above the query results. You can then reference that table in the FROM section of your query, which reduces both the amount of data that has to be processed and the time required for processing.
| using an intermediate table |
| --- |
| SELECT field1, field2 |
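As an illustrative sketch (the project, dataset, and field names here are placeholders, not from the original article), once the subquery results are saved, you query the intermediate table directly instead of recomputing the subquery:

```sql
#standardSQL
-- Reference the saved intermediate table in FROM
SELECT field1, field2
FROM `myproject.mydataset.intermediate_table`
WHERE field1 IS NOT NULL
```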
Query debugging
BigQuery debugs your code as you construct it. In the composition window, debugging is indicated just below the query. Debugging is also available through the API with the dryRun flag.
Valid queries have a green indicator that you can click to see the amount of data processed by the query. This feature gives you an opportunity to optimize your data before you run the query so that you can avoid unnecessary data processing.
Invalid queries have a red indicator that you can click to see information about the error, and to find the line and column where the error occurs. For example, if a GROUP BY statement is left blank, the error is pinpointed to that line and column.
Tips and best practices
Using the sample dataset
The following examples use the Google Analytics sample dataset.
To use the queries on your own data, simply replace the project and dataset names in the examples with your own project and dataset names.
Using Standard SQL vs. Legacy SQL
BigQuery supports two SQL dialects: Standard SQL and Legacy SQL.
Migrating to Standard SQL explains the differences between the two dialects.
Standard SQL is now the preferred SQL dialect for querying data stored in BigQuery.
See Enabling Standard SQL for information about enabling Standard SQL in the BigQuery UI, CLI, API, or whichever interface you are using.
The easiest way to get started is to include the comment #standardSQL at the top of your Standard SQL queries, as shown in the following examples.
Google Analytics 360 data is exported into a new table every day. With Legacy SQL, to query multiple tables at once, you can comma-separate the table names, use the TABLE_DATE_RANGE table wildcard function, or use multiple comma-separated TABLE_DATE_RANGE functions, as in the following examples.
Query multiple tables
The following examples show you Standard SQL and Legacy SQL queries for the same data.
3 days
Standard SQL
| 3 Days using UNION ALL |
| --- |
| #standardSQL |
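The query body above is truncated; as a hedged sketch of the UNION ALL pattern against the public sample dataset (counting sessions per day is an illustrative choice of metric):

```sql
#standardSQL
SELECT date, SUM(totals.visits) AS sessions
FROM (
  SELECT date, totals FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` UNION ALL
  SELECT date, totals FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731` UNION ALL
  SELECT date, totals FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170730`
)
GROUP BY date
ORDER BY date
```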
Legacy SQL
| 3 days using comma-separated table names |
| --- |
| SELECT |
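Likewise, a hedged sketch of the same three days in Legacy SQL, where comma-separated table names are treated as a union:

```sql
SELECT date, SUM(totals.visits) AS sessions
FROM
  [bigquery-public-data:google_analytics_sample.ga_sessions_20170801],
  [bigquery-public-data:google_analytics_sample.ga_sessions_20170731],
  [bigquery-public-data:google_analytics_sample.ga_sessions_20170730]
GROUP BY date
ORDER BY date
```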
Last 1095 Days
Standard SQL
| Last 1095 Days using _TABLE_SUFFIX |
| --- |
| #standardSQL |
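A hedged sketch of the _TABLE_SUFFIX pattern against the public sample dataset (note the trade-off with caching, since CURRENT_DATE() changes every day):

```sql
#standardSQL
SELECT SUM(totals.visits) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1095 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
```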
Legacy SQL
| Last 1095 days using TABLE_DATE_RANGE |
| --- |
| SELECT |
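And a hedged sketch of the Legacy SQL equivalent using TABLE_DATE_RANGE:

```sql
SELECT SUM(totals.visits) AS sessions
FROM TABLE_DATE_RANGE([bigquery-public-data:google_analytics_sample.ga_sessions_],
  DATE_ADD(CURRENT_TIMESTAMP(), -1095, 'DAY'),
  DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))
```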
Last 36 Months
Standard SQL
| Last 36 Months using _TABLE_SUFFIX |
| --- |
| #standardSQL |

Legacy SQL

| Last 36 months using TABLE_DATE_RANGE |
| --- |
| SELECT |

Last 3 years

Standard SQL

| Last 3 Years using _TABLE_SUFFIX |
| --- |
| #standardSQL |

Legacy SQL

| Last 3 years using TABLE_DATE_RANGE |
| --- |
| SELECT |
Specific date range
Standard SQL
| Specific date range using _TABLE_SUFFIX |
| --- |
| #standardSQL |
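A hedged sketch of the same pattern with a fixed date range (which also keeps the query cacheable):

```sql
#standardSQL
SELECT SUM(totals.visits) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
```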
Legacy SQL
| Specific date range using TABLE_DATE_RANGE |
| --- |
| SELECT |
Last 3 years plus today’s data (intraday)
Standard SQL
| Last 3 years plus today’s data (intraday) using UNION ALL & _TABLE_SUFFIX |
| --- |
| Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table. |
| #standardSQL |

Legacy SQL

| Last 3 years plus today’s data (intraday) using multiple TABLE_DATE_RANGE |
| --- |
| Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table. |
| SELECT |
Basic query examples
This section explains how to construct basic queries using metrics and dimensions from example Analytics data.
Total [metric] per [dimension]?
Below are example scripts for the question: What is the total number of transactions generated per device browser in July 2017?
Standard SQL
| Total transactions per device browser in July 2017 |
| --- |
| #standardSQL |
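The query body above is truncated; a hedged sketch of how it might read against the public sample dataset:

```sql
#standardSQL
SELECT
  device.browser,
  SUM(totals.transactions) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY device.browser
ORDER BY total_transactions DESC
```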
Legacy SQL
| Total transactions per device browser in July 2017 |
| --- |
| SELECT |
Average bounce rate per [dimension]?
The real bounce rate is defined as the percentage of visits with a single pageview. Below are example scripts for the question: What was the real bounce rate per traffic source?
Standard SQL
| Bounce rate per traffic source in July 2017 |
| --- |
| #standardSQL |
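A hedged sketch against the public sample dataset, treating totals.bounces as the single-pageview indicator:

```sql
#standardSQL
SELECT
  source,
  total_visits,
  total_no_of_bounces,
  ( total_no_of_bounces / total_visits ) * 100 AS bounce_rate
FROM (
  SELECT
    trafficSource.source AS source,
    COUNT(trafficSource.source) AS total_visits,
    SUM(totals.bounces) AS total_no_of_bounces
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  GROUP BY source
)
ORDER BY total_visits DESC
```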
Legacy SQL
| Bounce rate per traffic source in July 2017 |
| --- |
| SELECT |
Average number of product pageviews by purchaser type (purchasers vs non-purchasers)
Below are example scripts for the question: What was the average number of product pageviews for users who made a purchase in July 2017?
Standard SQL
| Average number of product pageviews for users who made a purchase in July 2017 |
| --- |
| #standardSQL |
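A hedged sketch against the public sample dataset, using total session pageviews as a simple proxy for product pageviews; flipping the transactions filter to IS NULL gives the non-purchaser variant further below.

```sql
#standardSQL
SELECT
  SUM(total_pageviews) / COUNT(fullVisitorId) AS avg_pageviews_per_purchaser
FROM (
  SELECT
    fullVisitorId,
    SUM(totals.pageviews) AS total_pageviews
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId
)
```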
Legacy SQL
| Average number of product pageviews for users who made a purchase in July 2017 |
| --- |
| SELECT |
Below are example scripts for the question: What was the average number of product pageviews for users who did not make a purchase in July 2017?
Standard SQL
| Average number of product pageviews for users who did not make a purchase in July 2017 |
| --- |
| #standardSQL |

Legacy SQL

| Average number of product pageviews for users who did not make a purchase in July 2017 |
| --- |
| SELECT |
Average number of transactions per purchaser
Below are example scripts for the question: What was the average total transactions per user that made a purchase in July 2017?
Standard SQL
| Average number of transactions per user that made a purchase in July 2017 |
| --- |
| #standardSQL |
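A hedged sketch against the public sample dataset:

```sql
#standardSQL
SELECT
  SUM(total_transactions_per_user) / COUNT(fullVisitorId) AS avg_total_transactions_per_user
FROM (
  SELECT
    fullVisitorId,
    SUM(totals.transactions) AS total_transactions_per_user
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
    AND totals.transactions IS NOT NULL
  GROUP BY fullVisitorId
)
```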
Legacy SQL
| Average number of transactions per user that made a purchase in July 2017 |
| --- |
| SELECT |
Average amount of money spent per session
Below are example scripts for the question: What is the average amount of money spent per session in July 2017?
Standard SQL
| Average amount of money spent per session in July 2017 |
| --- |
| #standardSQL |
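A hedged sketch against the public sample dataset; totals.transactionRevenue is stored in micros, so divide by 1,000,000 to get currency units, and the filter restricts the average to sessions that produced revenue (an assumption about the intended definition):

```sql
#standardSQL
SELECT
  ( SUM(totals.transactionRevenue) / 1e6 ) / SUM(totals.visits) AS avg_revenue_per_session
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  AND totals.transactionRevenue IS NOT NULL
```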
Legacy SQL
| Average amount of money spent per session in July 2017 |
| --- |
| SELECT |
Sequence of hits
Below are example scripts for the question: What is the sequence of pages viewed?
Standard SQL
| Sequence of pages viewed by users in July 2017 |
| --- |
| #standardSQL |
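A hedged sketch against the public sample dataset, flattening the repeated hits field and keeping page hits only:

```sql
#standardSQL
SELECT
  fullVisitorId,
  visitId,
  visitNumber,
  hits.hitNumber AS hitNumber,
  hits.page.pagePath AS pagePath
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  AND hits.type = 'PAGE'
ORDER BY fullVisitorId, visitId, visitNumber, hitNumber
```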
Legacy SQL
| Sequence of pages viewed by users in July 2017 |
| --- |
| SELECT |
In this query, you limit the hit type to PAGE to avoid seeing event or transaction interactions. Each line of the output represents a pageview, and is shown in the default order of the fields in the SELECT statement.
Multiple custom dimensions at the hit or session level
| custom dimension at the hit level |
| --- |
| SELECT fullVisitorId, visitId, hits.hitNumber, hits.time, |

| custom dimension at the session level |
| --- |
| SELECT fullVisitorId, visitId, |
In each query:

- The SELECT statement queries for the relevant dimension and metric fields.
- The MAX function:
  - Returns the custom dimension as a new column. You can repeat the function to return multiple custom dimensions as new columns.
  - WITHIN hits and WITHIN RECORD evaluate the condition inside repeated fields in BigQuery.
  - The condition inside MAX is evaluated for each custom dimension, but for any that are not index=1 (hits) or index=2 (sessions), it returns NULL.
  - Returns the maximum value, which is the value of Custom Dimension 1 for hits or Custom Dimension 2 for sessions, since all other values are NULL.
Advanced query examples
Now that you’re familiar with simple queries, you can construct queries using the advanced functions and features available in BigQuery.
Products purchased by customers who purchased product A (Classic Ecommerce)
Below is a skeleton script for the question: What other products are purchased by customers who purchase product A?
| products purchased by a customer who purchases product A (Classic Ecommerce) |
| --- |
| SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity |
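The skeleton above shows only its first line; here is a hedged Legacy SQL reconstruction of its overall shape (dataset, date, and product name are placeholders), described line by line below:

```sql
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [mydataset.ga_sessions_YYYYMMDD]
WHERE fullVisitorId IN (
  -- unique purchasers of product A
  SELECT fullVisitorId
  FROM [mydataset.ga_sessions_YYYYMMDD]
  WHERE hits.item.productName CONTAINS 'Product Item Name A'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId )
  AND hits.item.productName IS NOT NULL
  AND hits.item.productName != 'Product Item Name A'
GROUP BY other_purchased_products
ORDER BY quantity DESC
```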
- In the first line, you select all of the other items purchased by a user, and the aggregate function COUNT() is used to calculate the quantity of each other item purchased. The result is displayed in a field labelled quantity, with the associated item in the product field labelled other_purchased_products.
- In the subquery, you select only the unique users (fullVisitorId) who have carried out transactions (totals.transactions >= 1) and during a transaction purchased product A (WHERE hits.item.productName CONTAINS 'Product Item Name A').

The rules (WHERE and AND statements) in the top-level query disregard values in hits.item.productName that are null or that match product A itself.
Here is an example of the query: If a customer purchases 'Brighton Metallic Pens - Set of 4', what other products did they purchase?

| products purchased by a customer who purchases 'Brighton Metallic Pens (Set of 4)' on 24th June 2013 |
| --- |
| SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity |
In Dremel/BigQuery, using WHERE expr IN triggers a JOIN, and size restrictions apply; specifically, the size of the right side of the JOIN (in this case the number of visitors) needs to be less than 8 MB. In Dremel, this is called a broadcast JOIN. When the size exceeds 8 MB, you need to trigger a shuffled JOIN, which can be done using the JOIN EACH syntax. Unfortunately, it can't be done using IN, but the same query can be rewritten with a JOIN.
Products purchased by customers who purchased product A (Enhanced Ecommerce)
This is similar to the previous skeleton query, but works for Enhanced Ecommerce. It also makes use of TABLE_DATE_RANGE to query data over multiple days.
| products purchased by a customer who purchases product A (Enhanced Ecommerce) |
| --- |
| SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity |
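Again the body is truncated; a hedged Legacy SQL reconstruction of the shape, using TABLE_DATE_RANGE and the Enhanced Ecommerce hits.product fields (dataset, dates, and SKU are placeholders):

```sql
SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity
FROM TABLE_DATE_RANGE([mydataset.ga_sessions_], TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31'))
WHERE fullVisitorId IN (
  -- unique purchasers of product A (by SKU)
  SELECT fullVisitorId
  FROM TABLE_DATE_RANGE([mydataset.ga_sessions_], TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31'))
  WHERE hits.product.productSKU CONTAINS 'Product SKU A'
    AND totals.transactions >= 1
  GROUP BY fullVisitorId )
  AND hits.product.productSKU IS NOT NULL
  AND hits.product.productSKU != 'Product SKU A'
GROUP BY other_purchased_products
ORDER BY quantity DESC
```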
Average number of user interactions before purchase
This is an example of a query that uses a JOIN() ... ON clause and depends only on Analytics data.
Below is a skeleton script for the question: What is the average number of user interactions before a purchase?
| number of user interactions before a purchase |
| --- |
| SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number |
- The first line carries out the main mathematical operation to find the average user interactions per product, and this query creates a join between two subqueries called ‘Alias_Name_1’ and ‘Alias_Name_2’.
- ‘Alias_Name_1’ is used to produce a field that uses the aggregate function SUM() to sum all the hit numbers recorded for a product.
- ‘Alias_Name_2’ is used to find the number of hits made by users per product, using the COUNT() function.
- The last line shows the common field (hits.item.productSku) shared between the two datasets on the join.
Here is an example of the query: On 10th September 2013, what was the average number of user interactions before a purchase?

| number of user interactions on 10th September 2013 before a purchase |
| --- |
| SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number |
Percentage of stock sold per product
This is an example of a query which is not only dependent on Analytics data, but also on non-Analytics data. By combining both data sets, you can begin to understand user behavior at a more segmented level. You can import non-Analytics data into BigQuery, but keep in mind that this will contribute to your monthly data storage charge.
Below is a skeleton script for the question: What percentage of stock was sold per product?
| percentage of stock sold per product |
| --- |
| SELECT AnalyticsImport.product_data_20130728.productId, ((( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold )) * 100 ) AS percentage_of_stock_sold |
- The first line outputs two fields: one field containing all the product IDs, and one that is a mathematical operation showing the percentage of stock sold for that product ID.
- Since this query depends on two data sets, you need to use the JOIN() ... ON function. This command joins the rows from the two data sets based on the common field between them. In this case, the two data sets are ‘Imported_DataSet’ and ‘Alias_Name’.
- ‘Imported_DataSet’ is the non-Analytics data. This is the data set that contains the metric field for how much stock is left (Imported_DataSet.’stock_left_field’) and the product-ID dimension field (Imported_DataSet.’productId_field’).
- ‘Alias_Name’ is the name assigned to the data returned by the subquery. This subquery uses Analytics data to find the total quantity of items sold per product.
- The last line uses the ON statement to show the common field between the two data sets, on which the two datasets are joined.
Many of the variables in this query have their dataset name attached to them as prefixes (e.g., Imported_DataSet.’productId_field’, Alias_Name.quantity_sold). This is to clarify which field you are selecting, and to make explicit which dataset it belongs to.
Here is an example of the query: What percentage of stock was sold per product on 28th July 2013?

| percentage of stock sold per product on 28th July 2013 |
| --- |
| SELECT AnalyticsImport.product_data_20130728.productId, ( ( ( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold ) ) * 100 ) AS percentage_of_stock_sold |
Profitability of each product
Below is a skeleton script for the question: What is the profitability of each product?
| profit by product |
| --- |
| SELECT Alias_Name.hits.item.productSku, ( Imported_DataSet.’product profit field’ * Alias_Name.quantity ) AS profit |
- The first line contains the mathematical operation to calculate the total profit made on each product.
- One subquery uses the non-Analytics data, which supplies how much profit is made when a product is sold.
- The other subquery is the Analytics data subquery, which is joined with the non-Analytics data. It calculates the quantity of items sold per product.
- The last line uses the ON statement to clarify the field that the two data sets share. In this case, it is the product ID.
Here is an example of the query: What was the profitability of each product on 28th July 2013?

| profit by product on 28th July 2013 |
| --- |
| SELECT two.hits.item.productSku, ((AnalyticsImport.product_data_20130728.productprice-AnalyticsImport.product_data_20130728.productcost)*two.quantity) AS profit |

Profit is calculated by finding the difference between the price the product is sold at and the cost of manufacturing it. This information is stored in the non-Analytics dataset.
Real profitability of each product (taking into account refunds)
Below is a skeleton script for the question: What is the real profitability of each product?
| real profit by product |
| --- |
| SELECT Alias_Name.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit |
- This is a very similar query to What is the profitability of each product? The only differences are in the non-Analytics dataset subquery and in the mathematical operation that calculates the real profit in the first line.
- In the non-Analytics dataset subquery, you also calculate the total amount of money spent on refunds (in the SELECT statement of that subquery).
- You then carry out a mathematical operation in the first line to find the real profit by subtracting the revenue spent on refunds from the gross profit.

For more information on the query, please see the section on the profitability of each product.
Here is an example of the query: What was the real profitability of each product on 28th July 2013?

| real profit by product on 28th July 2013 |
| --- |
| SELECT two.hits.item.productSku, (gross_profit-total_refund_revenue) AS real_profit |
The real profit takes into account the profitability of a product after taking refunded products into consideration. To calculate the total refund revenue for a product:
total refund revenue for a product = ( the price of the product + the refund delivery price for the product ) * the quantity of products refunded