Introducing SnowPlow: the world’s most powerful web analytics platform

February 21st, 2012 by Yali

Download the SnowPlow brochure

What is SnowPlow?

SnowPlow does three things:

  1. Identifies users, and tracks the way they engage with one or more websites
  2. Stores the associated data in a scalable “clickstream” data warehouse
  3. Makes it possible to leverage a big data toolset (e.g. Hadoop, Pig, Hive) to analyse that data

That sounds like a web analytics tool. There are many of them. (And many of them are free, including Google Analytics.) Why build a new one?

We built SnowPlow for our own web analytics purposes. We recognise that we are unusual web analysts, for a number of reasons, but we find the solutions that are currently available frustrating in several ways:

1. We want access to atomic, customer-level and event-level data

Read the rest of this entry »

Approaches to developing your company’s analytics capability in a big data world

February 10th, 2012 by Yali

Whilst there is a lot to love about big data, it also gives CIOs and business folks reason to moan: using it means developing expertise in new approaches to analytics, new technologies and new business processes. However, for companies that have not, to date, successfully implemented a data warehouse, big data offers one huge reason to smile: it makes it possible to develop a data warehousing platform in an incremental, step-by-step fashion, with much lower initial costs than the “big bang” approach that was the orthodoxy in the pre-big data world.

In this blog post, we explain why that is the case, and outline what we believe is the best approach for companies looking to build out an internal analytics capability that takes advantage of big data technologies like Hadoop. Read the rest of this entry »

Using Tableau and Google Analytics to analyse the drivers of growth in online retail

February 8th, 2012 by Yali

At Keplar, we often use web analytics data as one source to help us understand how our clients’ businesses (particularly in online retail) have performed in the last 3-5 years, and what has driven changes in that performance. We tend to perform that analysis by extracting the data out of the web analytics software (normally Google Analytics) so that we can visualise it in a way that makes it easy to spot the drivers of growth and home in on the causal factors responsible for any changes in performance. In this blog post, we run through the steps to perform this analysis quickly, because they are steps that any online retailer (or in fact web business in general) would want to perform.

Three very common questions for an online retailer to ask are:

  1. How have sales and traffic grown in my online store over time?
  2. What has driven that growth?
  3. What can I do to increase growth in a cost-effective way?

Web analytics data can be helpful in answering the first two questions – by identifying:

  1. Where the people who visited and bought from a website came from
  2. How different sources of traffic have changed their contribution over the time period in question

These can then form a basis for answering the third question.

A plot like this clearly shows the relative importance of different channels in driving traffic growth

Unfortunately, the Google Analytics web interface does not provide an easy way to visualise how the relative contributions of different traffic sources have changed over time. Luckily for us, Google does make it easy to grab the relevant data from the Google Analytics API and ultimately generate the above visualisation. In this blog post, I will show how to perform this analysis, using Google-Analytics-Export-to-CSV to extract the data, and Tableau to quickly graph and drill into the results.
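For readers who want to sanity-check the numbers before opening a BI tool, the "relative contribution" of a channel is simply its share of total visits in each period. The following is a minimal sketch in Python, not part of the original workflow. The column names (month, source, visits) and the sample figures are made up for illustration and are not taken from the Psychic Bazaar data.

```python
import csv
import io
from collections import defaultdict


def channel_shares(csv_text):
    """Return {month: {source: share of that month's visits}}.

    Assumes a CSV with hypothetical columns: month, source, visits.
    """
    visits = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        visits[row["month"]][row["source"]] += int(row["visits"])

    shares = {}
    for month, by_source in visits.items():
        total = sum(by_source.values())
        # Guard against a month whose visits all happen to be zero
        shares[month] = (
            {source: count / total for source, count in by_source.items()}
            if total
            else {}
        )
    return shares


# Made-up sample data, purely for illustration
SAMPLE = """month,source,visits
2011-01,organic,60
2011-01,paid search,40
2011-02,organic,90
2011-02,paid search,60
2011-02,email,50
"""

shares = channel_shares(SAMPLE)
for month in sorted(shares):
    print(month, {source: round(share, 2) for source, share in shares[month].items()})
```

In this invented sample, organic visits grow from 60 to 90 while organic's share falls from 60% to 45%, which is the kind of shift a stacked plot of channel contributions makes visible.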

I also hope to demonstrate what we call train-of-thought-analysis – where a fast business intelligence tool such as Tableau is used to answer questions which suggest new questions, which can in turn be answered by follow-on analysis, in particular by drilling in on subsets of the data.

The steps presented below were performed for Psychic Bazaar, a startup specialist retailer in the mind, body and spirit sector. (Many thanks to the folks at Psychic Bazaar for being willing to share their data.) However, the same steps can be applied to any online shop (or indeed, any website) with Google Analytics integrated. And if you don’t have Tableau, you can download a 30-day trial version of the software, and treat this blog post as an introduction to Tableau. Alternatively, comparable analyses should be easy to perform using other BI tools (e.g. Microstrategy or Qlikview).

Read the rest of this entry »

Installing Google-Analytics-export-to-CSV

January 31st, 2012 by Yali

Google-Analytics-export-to-CSV is a straightforward, command-line tool for getting data out of Google Analytics (via the API) and into a CSV file, so you can open it in your favourite analytics program.

For an introduction, see here. For instructions on how to use the tool to run queries and extract data, see here. The program can be downloaded here. It is packaged as a ZIP file. It only needs to be unzipped before it can be used at the command line.

The program requires Java to run. If you do not have a Java runtime environment installed, you will need to install one. The following is a step-by-step guide to installing Google-Analytics-export-to-CSV (incl. Java if necessary) and running your first query:

Read the rest of this entry »

Using Google-Analytics-export-to-CSV: a step-by-step guide

January 31st, 2012 by Yali

Google-Analytics-Export-to-CSV is a free (open source) command-line tool that makes it easy to pull large volumes of data out of Google Analytics and process it in your favourite analytics tool, including Tableau, R or (even) Excel. For an introduction see here. To download it click here.

Extracting data from Google Analytics is a simple, three-step process:

  1. Develop your query
  2. Run the query
  3. Import the results into an analytics tool

The three steps are described in more detail below.

Read the rest of this entry »

Introducing Google-Analytics-export-to-CSV: a fast, simple way to get your Google Analytics data into your favourite analytics programme

January 31st, 2012 by Yali

Download Google-Analytics-export-to-CSV, a free (open source), quick and simple tool for pulling data out of Google Analytics via the API.

Google Analytics contains a wealth of interesting data. Often, however, it makes sense to take the data out of GA and analyse it in a separate tool, e.g. Tableau, R or Excel. There are a number of reasons why this is sometimes desirable:

  1. A number of analyses are hard / clunky to do in Google Analytics via the web UI. (Indeed a number are impossible.)
  2. Whilst it is generally impossible to join data in Google Analytics with other data sources (e.g. CRM systems), it is often desirable to compare graphs alongside others generated from different data sources. This is much easier if both sets of data are available in the same analytics tool.

Because the tool uses Google’s Data Export API, it can extract much larger volumes of more detailed data than is possible using the web UI: up to 7 dimensions and 10 metrics with each pull. Further, if the query you run returns more than 10,000 lines of data (the limit returned by a single API call), the tool automatically makes extra calls to fetch the additional data and add it to your CSV, so the CSV has all the data you require.
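The paging behaviour described above can be illustrated with a short sketch. This is not the tool's actual source (which runs on Java), just one plausible way to express the loop in Python. Here fetch_page is a hypothetical stand-in for a single API call that takes a 1-based start index and a maximum row count, and the fake API below returns invented rows so the example runs without credentials.

```python
import csv
import io

PAGE_LIMIT = 10_000  # per-call row cap described in the text above


def fetch_all_rows(fetch_page, page_limit=PAGE_LIMIT):
    """Call fetch_page(start_index, max_results) until the results run out.

    fetch_page is a hypothetical stand-in for one API request;
    start_index is 1-based.
    """
    rows = []
    start_index = 1
    while True:
        page = fetch_page(start_index, page_limit)
        rows.extend(page)
        if len(page) < page_limit:  # a short or empty page means nothing is left
            break
        start_index += page_limit
    return rows


def rows_to_csv(header, rows):
    """Serialise a header and rows into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def make_fake_api(total_rows):
    """Build an in-memory fake API (illustration only) that records each call."""
    data = [[f"2012-01-{(i % 28) + 1:02d}", i] for i in range(total_rows)]
    calls = []

    def fetch_page(start_index, max_results):
        calls.append(start_index)
        begin = start_index - 1
        return data[begin:begin + max_results]

    return fetch_page, calls


fetch_page, calls = make_fake_api(25_000)
all_rows = fetch_all_rows(fetch_page)
csv_text = rows_to_csv(["date", "visits"], all_rows)
print(len(all_rows), "rows fetched in", len(calls), "calls")
```

Stopping on a short page costs one extra, empty request when the total is an exact multiple of the page size. That is a simple trade-off for not needing to know the total row count in advance.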

Google-Analytics-export-to-CSV is a command-line tool we developed internally at Keplar to make it easy for us to grab data out of our own (or our clients’) Google Analytics accounts, so that we can perform more powerful analyses, faster. We are now making it available to everyone on the internet, for free, as an open source project.

In the next couple of days we plan to make the source code available on Github. In the meantime, if you are a data analyst hungry to get your Google Analytics data out into your favourite analytics tool, you can download it here.

———–

Update: the source code is now available on Github here: https://github.com/datascience/google-analytics-export-to-csv

A compiled version of Google-Analytics-export-to-csv is also available via Github here

Why big data matters to companies in retail and media. (A straightforward guide for business folk)

January 30th, 2012 by Yali

A downloadable version of this blog post (in PDF format) is available here

Introduction

The term “big data” is very much in vogue at the moment. In this white paper, we explore what big data means, what opportunities it presents to companies in the retail and media sectors and outline what companies need to do to take advantage of big data.

The purpose of this white paper is to provide an overview of the opportunities, challenges and success factors around big data. There is a lot to explore in each of the areas that we introduce: we plan to do this in subsequent white papers and blog posts, all of which will be made available on the Keplar website.

OK - so we should up our marketing spend on this customer segment?

A little history

The idea that important decisions in companies should be data-driven is much older than big data. Toyota’s pioneering use of data to drive efficiency in their manufacturing process helped them to steal a march on their American competitors back in the 1970s. Fast forward to Nineties Britain, and Tesco’s pioneering use of customer data collected via their Clubcard scheme (run by Dunnhumby) helped them to establish themselves as the largest supermarket in the UK and in the top ten retailers globally.
Read the rest of this entry »

2011 in retrospect: agile data analytics with Scala

January 4th, 2012 by Alex

Looking back, 2011 was the year in which the team here at Keplar ‘got our hands dirty’ and started writing code to answer some of our clients’ thornier business questions. In the sectors we focus on (online/offline retail, online advertising, digital products), clients often have access to very large data sets, and need help manipulating and analysing this ‘big data’ to understand their business performance, make strategic decisions and build better products and services. This new-found appetite for agile and ‘bottom-up’ analysis and decision-making contrasts strongly with the more ‘top-down’ approach (of business models, focus groups and desk research) traditionally favoured by management consultants.

One of the Keplar bookshelves (the Scala books are out on loan)

Read the rest of this entry »

How publishers can develop and use audience data to drive ad revenue

October 3rd, 2011 by Yali

OpenX has just published the presentation we gave last week, on why and how publishers should develop audience data. The presentation is shown below:

This is a summary of the “Data=dollars” white paper we wrote for OpenX, published a few weeks ago.

There are a number of exciting opportunities for publishers interested in using their knowledge of their audience to drive improved ad revenue. We’ll be exploring some of these possibilities, and walking through a series of best practices, in a forthcoming blog series. Stay tuned!

Approaches to accuracy for Mechanical Turk

September 30th, 2011 by Yali

This is the third blog post in our series on using Amazon’s Mechanical Turk to build scalable business processes. Please see also our introductory post and our second post, getting started with Mechanical Turk.

Amazon’s Mechanical Turk provides a very convenient platform for getting large numbers of workers to perform manual steps as part of large scale business processes, such as cleaning data sets for use in machine-learning algorithms, or moderating content.

However, it is not enough for Mechanical Turk to provide results fast. The results themselves need to be reliable and hence it is critical that companies using Mechanical Turk invest in a suitable strategy for accuracy.

The mirror in the Hubble Space Telescope, one of the most precisely polished ever made, was nonetheless ground about 2,200 nanometers too flat at its edge. The inaccuracy was catastrophic and cost several million dollars to fix

Amazon provides two primary tools for helping users validate the accuracy of results. We’ll look at these both briefly, before outlining a third technique which, used in combination with the first two, can be used to deliver a very rigorous approach to accuracy. These three strategies for accuracy are as follows:

Read the rest of this entry »