Using Tableau and Google Analytics to analyse the drivers of growth in online retail

February 8th, 2012 by Yali

The Google-export-to-csv blog post series

At Keplar, we often find ourselves using web analytics data as one source of data to help us understand how our clients business (particularly in online retail) have performed in the last 3-5 years, and what has driven changes in that performance. We tend to perform that analysis by extracting the data out of the web analytics software (normally Google Analytics) so that we can easily visualise it in a way that makes it easy to spot the drivers of growth and hone in on the causal factors responsible for any changes in performance. In this blog post, we run through the steps to perform this analysis quickly, because they are steps that any online retailer (or in fact web business in general) would want to perform.

Three very common questions for an online retailer to ask are:

  1. How have sales and traffic grown in my online store over time?
  2. What has driven that growth?
  3. What can I do to increase growth in a cost-effective way?

Web analytics data can be helpful in answering the first two questions – by identifying:

  1. Where the people who visited and bought from a website came from
  2. How different sources of traffic have changed their contribution over the time period in question

These can then form a basis for answering the third question.

A plot like this clearly shows the relative importance of different channels in driving traffic growth

Unfortunately, Google Analytics does not provide an easy way to visualise how relative contributions of different traffic sources have changed over time via its web interface. Luckily for us, Google does however make it easy to grab the relevant data from the Google Analytics API and ultimately generate the above visualisation. In this blog post, I will show how to do perform this analysis, using Google-Analytics-Export-to-CSV to extract the data, and Tableau to quickly graph and drill into the results.

I also hope to demonstrate what we call train-of-thought-analysis – where a fast business intelligence tool such as Tableau is used to answer questions which suggest new questions, which can in turn be answered by follow-on analysis, in particular by drilling in on subsets of the data.

The steps presented below were performed for Psychic Bazaar, a startup, specialist retailer in the mind, body and spirit sector. (Many thanks to the folks at Psychic Bazaar for being willing to share their data.) However, the same steps can be applied to any online shop (or indeed, any website) with Google Analytics integrated. And if you don’t have Tableau, you can download a 30 day trial version of the software, and treat this blog post as an introduction to Tableau. Alternatively, comparable analyses should be easy to perform using alternative BI tools (e.g. Microstrategy or Qlikview).

1. Extracting the relevant data out of the Google API

First off, we need to determine what data we want to extract from the Google Analytics API. When writing queries, I use Google’s own Data Feed Query Explorer to test what useful combination of metrics and dimensions can be extracted together.

Starting off with metrics, we want to look at:

  1. Traffic
    • Visits
    • Page views
  2. Sales
    • Transactions
    • Items sold
    • Revenue
  3. Other events
    • Total events (if we’re performing an analysis on a site where an important event is captured using Google’s event tracking)
    • Event “value” if relevant

Note that I did not include visitors in the above list. Visitors is not an additive metric: if there were 1,000 visitors today and 1,000 visitors yesterday, we don’t know how many in total we had over the two days, because we don’t know how many of today’s visitors also visited us yesterday. We may want to experiment with reporting our results over different time periods, and we can’t do this if we include non-additive metrics.

I select the metrics from the dropdown in the Data Feed Query Explorer as below:

We now need to decide what dimensions we want to analyse the results against. We need to include:

  1. Date: because we want to understand how each metric has grown over time
  2. Traffic source/medium: where did each visit come from? (Search? A referring site? Direct?)
    1. Referral path: if the user clicked through to our site from a third-party website, which was it?
    2. Keyword: if the user clicked through from a search engine, which keywords did they type in?
  3. Visitor type: is the user a new or returning visitor?

To run our query, we’re going to use Google-Analytics-Export-to-CSV. (Download and setup instructions can be found here.) We first need to create an XML file containing our query parameters. So, using the metrics and dimensions listed above, I create the following short XML:

<?xml version="1.0" encoding="UTF-8"?>
<orderly xmlns:xlink="http://www.w3.org/1999/xlink">
<configuration>
    <query>
        <tableId>ga:39354421</tableId>
        <dimensions> ga:date,ga:source,ga:medium,ga:referralPath,ga:keyword,ga:visitorType,ga:country</dimensions>
        <metrics>ga:visits,ga:totalEvents,ga:transactions,ga:transactionRevenue,ga:itemQuantity,ga:pageviews</metrics>
        <segments></segments>
        <filters></filters>
        <sort></sort>
        <start-date>2011-11-01</start-date>
        <end-date>2012-02-06</end-date>
        <start-index></start-index>
        <max-results></max-results>
    </query>
    <login>
        <userName>{USERNAME}</userName>
        <password>{PASSWORD}</password>
    </login>
    <app>
        <appName>ga-quick-data-grabber</appName>
        <url>https://www.google.com/analytics/feeds/data</url>
    </app>
</configuration>
</orderly>

The dimensions, metrics and tableId are all as taken from the Google Data Feed Explorer.

The start date is set to 1 November 2011, because the site that we’re performing the analysis for was only launched in November last year. You can take as wide a date range as you like – we recommend taking as wide a date range as you have available in Google Analytics (i.e. 5 years if you have 5 years of data). The nice thing about this approach using Tableau and Google-Analytics-Export-to-CSV is that it scales to large volumes of data without any difficulty, because we’re not relying on either the Google Analytics UI or Microsoft Excel.

The other fields are left blank, except for the username and password fields, where you should enter your credentials for Google Analytics.

We save this XML query file alongside our ga2csv.jar file (which holds the Google-Analytics-Export-to-CSV code) and then perform the extraction by running at the command line:

java -jar ga2csv-0.1.jar growth-analysis.xml growth-analysis.csv

Where growth-analysis.xml is our XML query file (as above), and growth-analysis.csv is the output CSV file that will be created by executing the above command:

Run the query and then check that the new CSV file has been generated; we will be visualising this data in Tableau in the next step.

2. Visualising the data in Tableau: are traffic rates growing and if so, what is driving that growth?

In Tableau the first step is to connect to the data that we’ve just generated. Click on Connect to date and then select Text file, as below:

Locate the output CSV file and click Open. The default text file connection values will all work perfectly for us.

When prompted choose Import all data into Tableau’s Fast Data Engine. This will make slicing and dicing on the data blindingly fast, especially when compared to Google’s web UI or any attempts in Excel. Select a suitable filename to save the data into Tableau’s own data format. You should then be presented with Tableau’s standard interface:

Notice above how Tableau has correctly distinguished the different Measures (listed on the bottom left of the screen) with the Dimensions (listed on the top left) except for ga:date. This is in fact a dimension, so drag it from the Measures box to the Dimensions box.

For starters, let’s look at how our measures change over time. Drag our visits and page views metrics into the Text or Measure Values panes. Then drag the ga:date dimension to the rows tab as below.

We can see the number of visits and page views starting from the day that this website was soft-launched. Note that the ga:date field is not very nicely formatted: Tableau does not know that this is a date field we are dealing with. To tidy things up, let’s create a new dimension called Date, which will be a properly formatted date dimension calculated on the basis of ga:date. Right click on the Dimensions pane and select Create Calculated Field.

In the Name field enter “Date” and in the “Formula” field enter:

DATE(right(str([ga:date]),2)+”/”+mid(str([ga:date]),5,2)+”/”+left(str([ga:date]),4))

The above formula constructs a date dimension in a format Tableau understands out of the ga:date field that we’ve fetched from Google.

Click OK and replace the ga:date field in the Rows tab with the new Date field:

Note how Tableau has now automatically grouped our results by year. We can drill down using the [+] button next to YEAR(Date). In our case, we’re interested in plotting results by day, but you can pick whatever suits you best. The nice thing about Tableau is you can switch between days, weeks, months, quarters and years at will:

Enough with the numbers! Let’s start visualising this data on a graph. On the Show me toolbar, click on the line graph icon:

We can see see that both visit numbers and pageviews have grown daily since launch. A quick glance reveals that the two curves rise in-step, but that on November 30th and December 9th there were spikes in the number of page views relative to the number of visitors. This is something which we should drill into later – perhaps a handful couple of “whales” (i.e. highly engaged customers) visited the site, doing some Christmas shopping? For now we want to stay focused on our questions:

  1. How has our website grown? (In terms of traffic, sales etc)
  2. What has driven that growth?

As it is, we can only see part of our time frame in the window. Let’s compress the view by switching from a view by-day to a view by-week. Select the DAY(date), click the dropdown and select WEEK:

That’s much better: we can now see the entire time period of interest. A couple of things immediately jump out at us:

  1. Whilst the absolute numbers of visits to this new website are still low (only 500-600 visits per week), growth has been strong and consistent, from about 100 per week at launch. So this is five-fold growth in 3 months.
  2. It appears that growth has suddenly plummeted in the most recent week – but this is misleading. The final week only starts on February 5th, so we only have 2 days of data here

So our next question is, what has driven that growth? For the moment let’s focus on visits, by dragging ga:pageviews out of the rows column. Let’s drill into where the visits have come from – be it organic search, CPC, referrer or direct traffic. Drag ga:medium (which captures this split) from the Dimensions pane to the Color pane:

We can now see the number of visits from each of our four traffic sources, graphed on the same plot. It looks like growth has primarily been driven by CPC advertising, although traffic from organic search has also grown consistently over time. The graph is not especially easy to read – click the Area Chart icon in the Show me tool bar for a clearer view (or alternatively, change Mark from Line to Area).

Also, let’s get rid of the last week of data, because it is incomplete, by dragging Date from the Dimensions pane to the Filter pane, selecting Range of Dates and selecting a range that only extends to February 4th:

This is a really important graph. It clearly shows that:

  1. CPC advertising has grown from nothing to driving c.300 visits by week. However, the number of visits from this channel did dip between Christmas and today.
  2. Search engine traffic has grown from about 49 visits per week to 150 (i.e. a threefold increase). Nearly all of that growth was in the second half of the period
  3. The amount of traffic driven from referrers has actually declined

All three of the above observations immediately beg another set of questions:

  1. What drove the growth in visitors via CPC?
  2. What about the growth from organic search?
  3. And why do we see declines in the volume of traffic from 3rd party sites?

In the next section we’ll start to answer the first of those questions.

3. Drilling into CPC traffic: what accounts for the growth?

Let’s drill into just the CPC traffic. Drag ga:medium from the Color pane to the Filters pane and uncheck the other sources of traffic so we’re only looking at CPC traffic:

Now that we’re only looking at CPC traffic, let’s break down the results by keyword. Drag ga:keyword from the Dimensions pane to the Color pane:

This is a bit messy: we can now see our “long tail” of different keywords. Let’s organise them by the numbers of visits that each of them contributed. Right click on the Color pane and select sort. Now sort by Field – where field is ga:visits which are aggregated using sum – these are the default values. Finally click OK:

That is now much clearer! We can immediately that see:

  1. One set of keywords, “buy tarot cards” is responsible for approximately one-third of all visits by keyword. The remainder of visits are driven by the long-tail of keywords
  2. Visits from “buy tarot cards” keywords peaked in mid-December
  3. The length of that long tail has grown over time. Prior to November 20th, these were the only keywords generating any visits. The following week, five groups of keywords generated visits. The following week, the number of keyword combinations again grew. In fact, this retailer has gradually over time increased the number of campaigns and hence keywords. We can see that whilst many keywords individually generate small levels of traffic, collectively they account for two-thirds of the total traffic volumes, and that this growth in keywords accounts for all growth in this traffic after mid-December

Understanding why e.g. the number of visits from “buy tarot cards” declined after mid-December would require taking a look at our AdWords data, to see if any changes were made to the relevant campaigns. For the moment, we have learnt two critical things about the CPC campaigns: that one set of keywords is much more important traffic-wise than the others, but that the growing long tail of other keywords is driving sustained growth in traffic over time.

4. What is the “quality” of our CPC traffic?

Because all CPC traffic has to be paid for, another question we might consider while we look at this data is the quality of the traffic. Let’s compare:

  1. The number of visits by keyword, with
  2. The number of page views by keyword, and
  3. The number of transactions by keyword.

The transactions-per-visits ratio gives us an idea of what percentage of visitors by keyword become buyers: it is a critical conversion ratio. However, because there is a long tail of keywords where the number of visits is very small (<100 for our time period), we need a more sensitive measure of quality of traffic than the transactions:visits ratio – which is only statistically significant if we have 10,000s of visits per keyword combination. Hence, we look at the page views per visit: if this is high, it suggests that users who searched for those keywords tended to engage deeply with the site and spend time exploring it, rather than simply “bounce” away.

To plot this graph, we are no longer taking a temporal view, so drag Date off the Columns tab and replace it with ga:pageviews. We want to plot a bubble graph, so in the Marks pane replace Area with Shape. Thus this graph shows the number of page views plotted against the number of visits for each of our keywords:

All our bubbles are concentrated in the bottom left of the screen, owing to the dominant keyword combination “buy tarot cards” in the top right of the screen.

I’ll exclude that particular result by right clicking on it. Next, to visualise the number of transactions, drag ga:transactions to the Size pane within the Marks pane, so that the size of each bubble reflects the number of transactions completed against each keyword combination.

The resulting graph is really interesting. We’ll explore this type of analysis in more detail in a dedicated blog post later. For the moment, it is worth highlighting some interesting discoveries:

  1. The number of transactions per keyword combination is 0 in many cases. However, this is not surprising, given that the number of visits per keyword combination is less than 20, in most cases. (This analysis is better done with a bigger site and/or over a longer time period.) If a typical conversion rate is 2%, we would need at least 50 visits per keyword combination before we can expect a single transaction
  2. One of the best trafficked campaigns (near the top right of the screen) has generated no transactions! The client should keep an eye on this one: if it continues to perform poorly, it should be pulled. Alternatively, it may be that the landing page, or products advertised, are poorly represented, and that some structural change is necessary to get it performing better
  3. The number of page views per visit is roughly four on average, across most campaigns. However for very low-traffic campaigns (towards the bottom left), the number of page views per visit is significantly higher. This is tantalizing: if we can find a way to grow those visitor numbers (e.g. by bidding more for them on AdWords), it is highly likely they will convert well, owing to their high level of engagement with the site
  4. One campaign appears to have significantly fewer page views per visit than the others. Something is wrong and needs to be fixed: perhaps the landing page, the products advertised, the pricing of those products or the keyword selection is inappropriate. Further investigation will be required here

5. Other analyses

We have only just got started, but I hope that the steps set out above illustrate how quickly Tableau can throw up interesting insights from the Google Analytics data, and provides the tools to drill into that data in more detail to answer new questions as they arise. I also hope to have shown the value of some visualisations that web analysts share less frequently, because they cannot be produced in Google Analytics directly.

There is a wealth of other questions which we could begin to answer using this same data set:

  1. What accounts for the growth in organic search traffic over the time period? Is it growing brand awareness (more people searching for “Psychic Bazaar” on Google) or people searching for product-related terms such as specific tarot decks?
  2. What accounts for the decline in referrer volume over time? Which third party websites have inbound links to Psychic Bazaar? Why has traffic from them declined? Are they in blog posts, Twitter links or other transient content? If so, what can be done to grow the number of inbound links?
  3. Has growth in visitors driven actual sales growth? Are visitors from some traffic sources more likely to buy than others?  Have conversion rates improved over time – and if so, why, for example has site content and navigation been improved? Or has the effort to grow traffic resulted in lower quality traffic, and hence lower conversion rates, over time?

In the same way that we used Tableau to drill into our CPC data, we could use it to drill into our organic search data or referrer data to answer the first two questions.

To answer the third, we would need to look out some of the other metrics we extracted from Google: rather than look at just visits and pageviews, also look at transactions and revenue.

Lastly, of course, there is a wealth of questions we could answer by pulling different slices of data out of Google Analytics. To give just some examples:

  1. What is our return on ad spend, over time by keyword?
  2. Does engagement in the site vary by time of the day? Are people more likely to make purchases at certain times rather than others?
  3. How do conversion rates vary by product?
  4. How do page views vary by product? Are some products buried “too deep” within the site to get attract many visitors?
  5. How do conversion rates vary by country?
  6. Where in the buying funnel are customers most likely to get “stuck”?

We’ll be exploring some of these and more questions in future blog posts.

Leave a Reply