Installing Google-Analytics-export-to-CSV

January 31st, 2012 by Yali

Google-Analytics-export-to-CSV is a straightforward, command-line tool for getting data out of Google Analytics (via the API) and into a CSV file, so you can open it in your favourite analytics program.

For an introduction, see here. For instructions on how to use the tool to run queries and extract data, see here. The program can be downloaded here. It is packaged as a ZIP file. It only needs to be unzipped before it can be used at the command line.

The program requires Java to run. If you do not have a Java runtime environment installed, you will need to install one. The following is a step-by-step guide to installing Google-Analytics-export-to-CSV (including Java, if necessary) and running your first query:

Read the rest of this entry »

Using Google-Analytics-export-to-CSV: a step-by-step guide

January 31st, 2012 by Yali

Google-Analytics-export-to-CSV is a free (open source) command-line tool that makes it easy to pull large volumes of data out of Google Analytics and process it in your favourite analytics tool, including Tableau, R or (even) Excel. For an introduction, see here. To download it, click here.

Extracting data from Google Analytics is a simple three-step process:

  1. Develop your query
  2. Run the query
  3. Import the results into an analytics tool

The three steps are described in more detail below.

Read the rest of this entry »

Introducing Google-Analytics-export-to-CSV: a fast, simple way to get your Google Analytics data into your favourite analytics programme

January 31st, 2012 by Yali

Download Google-Analytics-export-to-CSV, a free (open source), quick and simple tool for pulling data out of Google Analytics via the API.

Google Analytics contains a wealth of interesting data. Often, however, it makes sense to take the data out of GA and analyse it in a separate tool, e.g. Tableau, R or Excel. There are a number of reasons why this is sometimes desirable:

  1. A number of analyses are hard or clunky to do in Google Analytics via the web UI. (Indeed, a number are impossible.)
  2. Whilst it is generally impossible to join data in Google Analytics with other data sources (e.g. CRM systems), it is often desirable to compare graphs alongside others generated from different data sources. This is much easier if both sets of data are available in the same analytics tool.

Because the tool uses Google’s Data Export API, it can extract much larger volumes of more detailed data than is possible using the web UI: up to 7 dimensions and 10 metrics with each pull. Further, if the query you run returns more than 10,000 lines of data (the most the API returns in a single call), the tool automatically makes extra calls to fetch the additional data and adds it to your CSV, so the file contains all the data you require.
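The paging behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's own source (the tool itself runs on Java): `fetch_page`, `make_fake_fetch` and the sample rows are hypothetical names and data invented for the example, and the 1-based start index is an assumption about how the API counts rows.

```python
import csv
import io

PAGE_SIZE = 10_000  # per-call row limit quoted in the post


def fetch_all_rows(fetch_page, page_size=PAGE_SIZE):
    """Call fetch_page(start_index, max_results) until a short page signals the end."""
    rows = []
    start_index = 1  # assumption: the API numbers rows from 1
    while True:
        page = fetch_page(start_index, page_size)
        rows.extend(page)
        if len(page) < page_size:  # a short (or empty) page means no more data
            break
        start_index += page_size
    return rows


def write_csv(header, rows, out):
    """Write a header line followed by every row to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


def make_fake_fetch(total):
    """Hypothetical stand-in for the real API call: serves `total` made-up rows."""
    def fake_fetch_page(start_index, max_results):
        first = start_index - 1
        last = min(first + max_results, total)
        return [[f"/page/{i}", str(i % 7)] for i in range(first, last)]
    return fake_fetch_page
```

Calling `fetch_all_rows(make_fake_fetch(25_000))` makes three calls and returns all 25,000 rows, which `write_csv` can then write to a single file. Stopping on a short page keeps the loop simple; a real client could instead use a total row count, if the API reports one.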

Google-Analytics-export-to-CSV is a command-line tool we developed internally at Keplar to make it easy for us to grab data out of our own (or our clients’) Google Analytics accounts, so that we can perform more powerful analyses, faster. We are now making it available to everyone on the internet, for free, as an open source project.

In the next couple of days we plan to make the source code available on GitHub. In the meantime, if you are a data analyst hungry to get your Google Analytics data out into your favourite analytics tool, you can download it here.

———–

Update: the source code is now available on GitHub here: https://github.com/datascience/google-analytics-export-to-csv

A compiled version of Google-Analytics-export-to-CSV is also available via GitHub here.

Why big data matters to companies in retail and media. (A straightforward guide for business folk)

January 30th, 2012 by Yali

A downloadable version of this blog post (in PDF format) is available here

Introduction

The term “big data” is very much in vogue at the moment. In this white paper, we explore what big data means and what opportunities it presents to companies in the retail and media sectors, and outline what companies need to do to take advantage of it.

The purpose of this white paper is to provide an overview of the opportunities, challenges and success factors around big data. There is a lot to explore in each of the areas that we introduce: we plan to do this in subsequent white papers and blog posts, all of which will be made available on the Keplar website.

OK - so we should up our marketing spend on this customer segment?

A little history

The idea that important decisions in companies should be data-driven is much older than big data. Toyota’s pioneering use of data to drive efficiency in its manufacturing process helped it to steal a march on its American competitors back in the 1970s. Fast forward to Nineties Britain, and Tesco’s use of customer data collected via its Clubcard scheme (run by Dunnhumby) helped it to establish itself as the largest supermarket in the UK and one of the top ten retailers globally.

Read the rest of this entry »

2011 in retrospect: agile data analytics with Scala

January 4th, 2012 by Alex

Looking back, 2011 was the year in which the team here at Keplar ‘got our hands dirty’ and started writing code to answer some of our clients’ thornier business questions. In the sectors we focus on (online/offline retail, online advertising, digital products), clients often have access to very large data sets, and need help manipulating and analysing this ‘big data’ to understand their business performance, make strategic decisions and build better products and services. This new-found appetite for agile and ‘bottom-up’ analysis and decision-making contrasts strongly with the more ‘top-down’ approach (of business models, focus groups and desk research) traditionally favoured by management consultants.

One of the Keplar bookshelves (the Scala books are out on loan)

Read the rest of this entry »

How publishers can develop and use audience data to drive ad revenue

October 3rd, 2011 by Yali

OpenX has just published the presentation we gave last week, on why and how publishers should develop audience data. The presentation is shown below:

This is a summary of the “Data=dollars” white paper we wrote for OpenX, published a few weeks ago.

There are a number of exciting opportunities for publishers interested in using their knowledge of their audience to drive improved ad revenue. We’ll be exploring some of these possibilities, and walking through a series of best practices, in a forthcoming blog series. Stay tuned!

Approaches to accuracy for Mechanical Turk

September 30th, 2011 by Yali

This is the third blog post in our series on using Amazon’s Mechanical Turk to build scalable business processes. Please see also our introductory post and our second post, getting started with Mechanical Turk.

Amazon’s Mechanical Turk provides a very convenient platform for getting large numbers of workers to perform manual steps as part of large-scale business processes, such as cleaning data sets for use in machine-learning algorithms, or moderating content.

However, it is not enough for Mechanical Turk to provide results quickly. The results themselves need to be reliable, and hence it is critical that companies using Mechanical Turk invest in a suitable strategy for accuracy.

The primary mirror of the Hubble Space Telescope, among the most precisely figured ever made, was ground to the wrong shape: about 2.2 micrometres too flat at its edge. The error blurred the telescope’s images and took a dedicated Space Shuttle servicing mission to correct.

Amazon provides two primary tools for helping users validate the accuracy of results. We’ll look at both briefly, before outlining a third technique which, used in combination with the first two, can deliver a very rigorous approach to accuracy. These three strategies for accuracy are as follows:

Read the rest of this entry »

Smarter catalogue management through automation: a primer for online retailers

September 27th, 2011 by Alex

This post is the first in a Keplar series for online retailers, showing you how to automate your way to a more profitable and responsive e-commerce business. Get in touch to discuss how to apply these techniques to your company.

At Keplar we have just completed a “soup-to-nuts” project launching a new image-heavy e-commerce site in the lifestyle space. The retailer has launched with 100 SKUs (each with 17 product images) and plans to grow its catalogue aggressively to 1,000+ SKUs over the next 6 months. With these sorts of numbers, catalogue management, especially around product imagery, starts to be a real headache and potentially a significant time and cost sink for the business: even something as simple as updating the watermark on each image becomes a major undertaking.

The headaches of manual processes

Off-the-shelf technology solutions to streamline these processes already exist. They are typically referred to as Master Data Management systems, and the leader in the field is probably Hybris, with its Hybris PCM (product content management) system. But the Hybris technology stack is designed for major retailers with very large catalogues and/or complex product lifecycles, and it is priced accordingly. There is no real equivalent for smaller retailers who want a better (i.e. less manual) approach to catalogue management than that provided by their e-commerce package.

Read the rest of this entry »

Facebook’s Timeline: a masterclass in product vision

September 23rd, 2011 by Yali

The web is full of chatter as the world digests Facebook’s announcement of Timeline and OpenGraph at yesterday’s F8 developer conference.

Chris Cox's presentation at F8 is essential viewing for anyone in product management

The purpose of this post is not to summarise the developments or speculate about the implications: there are plenty of pundits doing that already. Instead, we explore the importance of product vision to successful product development, and use Facebook’s Timeline as an exemplar of best practice.

Read the rest of this entry »

Getting started with Mechanical Turk

September 20th, 2011 by Yali

Amazon has done an excellent job of making Mechanical Turk very easy to use. It also provides great documentation to help users get started. The purpose of this post, then, is to provide a high-level overview of how to:

  1. Think conceptually about using Mechanical Turk
  2. Use the web UI Amazon provides to do the actual implementation

Define your HIT(s)

At the heart of every Mechanical Turk engagement is what Amazon calls a “Human Intelligence Task” or “HIT”. Each HIT is an independent unit of work.

As we mentioned in our last blog post, we have been using Mechanical Turk to check the language of short content items. We already have an inkling of what language each content item is in. However, we are only 70-80% sure that we are correct, so we use Mechanical Turk to get real people to check whether each guess is correct.

In our case, then, each “HIT” consists simply of a worker checking the content and either confirming that the content is in the language we thought it was, or not. Notice this HIT has several important characteristics:

  1. It is very simple to explain. “Look at the sentence below. Is it in French? If so, click ‘yes’. Otherwise, click ‘no’.”
  2. It is independent. We have millions of sentences to check. However, the checking of each individual sentence is a completely independent task: there is no requirement that the person checking sentence A needs also to check sentence B. Hence it is possible that many hundreds or thousands of workers can work on the tasks in parallel, to ensure they are done quickly.
  3. It is repeatable. We can ask a number of different workers to perform the same task, and they should all give the same answer.  (This becomes important for ensuring the accuracy of results, because it means that we can verify the accuracy of individual tasks and individual workers.)
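To make the point about repeatability concrete, here is a minimal Python sketch of one way to use repeated answers: take the majority answer for each task, then score each worker by how often they agree with it. It is an illustration under stated assumptions rather than a description of our production process. The `(worker_id, task_id, answer)` tuples, the function names and the simple-majority rule are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def consensus_by_task(responses):
    """Map each task to its strict-majority answer, or None if there is no majority.

    `responses` is an iterable of (worker_id, task_id, answer) tuples.
    """
    answers_by_task = defaultdict(list)
    for _worker_id, task_id, answer in responses:
        answers_by_task[task_id].append(answer)

    consensus = {}
    for task_id, answers in answers_by_task.items():
        top_answer, votes = Counter(answers).most_common(1)[0]
        # Require more than half of the votes; ties yield no consensus.
        consensus[task_id] = top_answer if votes * 2 > len(answers) else None
    return consensus


def worker_agreement(responses, consensus):
    """Return, per worker, the share of their answers that match the consensus.

    Tasks without a consensus are skipped, so they neither help nor hurt a worker.
    """
    agreed = Counter()
    scored = Counter()
    for worker_id, task_id, answer in responses:
        expected = consensus.get(task_id)
        if expected is None:
            continue
        scored[worker_id] += 1
        if answer == expected:
            agreed[worker_id] += 1
    return {worker: agreed[worker] / scored[worker] for worker in scored}
```

Tasks with no majority can be sent out to further workers, and workers whose agreement rate is persistently low can have their answers reviewed or discounted.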

Read the rest of this entry »