Faking cohort analysis with Google Analytics

June 21st, 2012 by Yali

The cohort analysis blog post series

In the previous posts in this series on cohort analyses, we looked at what cohort analysis is, explored the wide variety of cohort analyses that are possible and walked through the steps necessary to perform them using SnowPlow. In this post, we look at how to perform cohort analyses in Google Analytics. As will quickly become apparent, Google Analytics is not well suited to performing cohort analyses. Hence, although this blog post will be useful to people who have to use Google Analytics to perform their cohort analyses, it should be more helpful to analysts identifying the advantages of using SnowPlow alongside (or instead of) Google Analytics, and how those advantages derive from the fundamentally different approach SnowPlow takes to web analytics.

The wrong tool for the job

What is wrong with web analytics in 2012? (And how SnowPlow starts to fix it)

June 11th, 2012 by Yali

The introducing SnowPlow blog post series

We developed SnowPlow out of frustration with the limitations of web analytics solutions today. We believe that there are many things that are wrong with today’s web analytics packages. In the last few weeks, demoing SnowPlow to different companies and discussing where web analytics falls short for them, we’ve seen that we’re not alone in believing that the web analytics world is in need of a shake-up. However, that view is by no means universal, so we thought we’d take the debate online, and explain our position here.

Web analytics solutions no longer support all the applications for web analytics data

Different approaches to measuring customer lifetime value with SnowPlow

June 9th, 2012 by Yali

The cohort analysis blog post series

As part of our cohort analysis series, we have emphasized that there is a wide variety of possible cohort analyses, depending on the business question to be answered. To recap briefly, we can vary a cohort analysis by the metric we use to compare cohorts, and by how we define those cohorts. We have already written a post about comparing user engagement between cohorts, and about why this is especially valuable to social networks, community sites and publishers. In this post, we look at comparing customer value, including customer lifetime value (CLV), between cohorts. We explain why this is important to all companies whose business models ultimately depend on monetizing users – including retailers, media companies and financial services companies. Lastly, we look at how to measure these values in SnowPlow, so that an appropriate cohort analysis can then be performed, as described in our previous blog post.
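
Comparing customer value between cohorts usually means tracking how much revenue each acquisition cohort has generated per customer as the months since acquisition go by. The Python sketch below illustrates that calculation under stated assumptions: the transaction tuples, the monthly cohort granularity and the function names are all hypothetical, and a real analysis would run against SnowPlow data rather than an in-memory list. For brevity it also treats only customers who appear in the transaction list as cohort members, so customers who never bought are left out of the denominator.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (user_id, acquisition_date, order_date, revenue).
transactions = [
    ("u1", date(2012, 1, 5), date(2012, 1, 5), 20.0),
    ("u1", date(2012, 1, 5), date(2012, 3, 2), 15.0),
    ("u2", date(2012, 1, 20), date(2012, 2, 1), 40.0),
    ("u3", date(2012, 2, 3), date(2012, 2, 3), 10.0),
]

def months_between(start, end):
    """Number of calendar months from start's month to end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def cumulative_value_by_cohort(transactions, horizon):
    """Average cumulative revenue per customer for each acquisition-month
    cohort, over the first `horizon` months after acquisition."""
    revenue = defaultdict(lambda: [0.0] * horizon)  # cohort -> revenue per month offset
    members = defaultdict(set)                       # cohort -> user_ids seen in that cohort
    for user, acquired, ordered, amount in transactions:
        cohort = (acquired.year, acquired.month)
        members[cohort].add(user)
        offset = months_between(acquired, ordered)
        if 0 <= offset < horizon:
            revenue[cohort][offset] += amount
    result = {}
    for cohort, users in members.items():
        running, curve = 0.0, []
        for amount in revenue[cohort]:
            running += amount
            curve.append(running / len(users))  # cumulative value per customer
        result[cohort] = curve
    return result
```

Calling `cumulative_value_by_cohort(transactions, 3)` returns one cumulative-value curve per acquisition month, and comparing those curves side by side is the cohort comparison itself.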

Weighing your customers' value

(Re-)introducing SnowPlow: a new approach to web analytics

May 30th, 2012 by Yali

The introducing SnowPlow blog post series

In the last few weeks we have talked to many different people and organisations about SnowPlow. One thing that has only become obvious through those conversations is that the approach we’ve taken in developing SnowPlow surprises many of the people we’ve spoken to. That’s at least partly because the journey we’ve been on, in thinking about how web analytics can be done better, is not one that everyone else has necessarily been on.

The slide deck below is our attempt to describe that journey. It sums up, as briefly as we can manage, what is wrong with web analytics today, and how we believe SnowPlow addresses those fundamental problems in a fresh way. It goes on to outline some of the areas in which we hope to develop SnowPlow.

I plan to write a blog post discussing some of these issues in more detail in the next few days. But in the meantime, check out the deck: we’d love your feedback.

Update: since writing this post we’ve launched a dedicated SnowPlow website. Check it out for more information on SnowPlow.

Different approaches to measuring user engagement with SnowPlow

May 16th, 2012 by Yali

The cohort analysis blog post series

User engagement is one of the most interesting and important, yet most challenging, areas of data analysis. In this post, we will look at different metrics which can be employed to understand user behaviour as part of our overall series on cohort analyses. In addition, we will show how each suggested user engagement metric can be calculated in a straightforward way using SnowPlow.

Not an engaged user

Historically, it was online publishers who were most concerned with the level of their users’ engagement. But the advent of social networks like Facebook and Twitter, community sites (like Mixcloud and Mumsnet) and socially-aware applications (like Spotify and Steam) means that there is now a much larger number of businesses whose success depends very directly on how good they are at getting users to engage frequently (many times per day or month) and deeply (long sessions, many page views). That may be because engagement directly drives revenue (e.g. for an ad-funded business), or because building up a critical mass of users is key to building a viable online marketplace or social network.
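
The two dimensions of engagement mentioned above, frequency and depth, can be turned into simple per-user metrics. The sketch below shows one illustrative way to do so, assuming hypothetical page-view records of the form (user_id, session_id, event_date): frequency is counted as distinct active days and depth as mean page views per session. These definitions are assumptions chosen for illustration, not the specific metrics the full post recommends.

```python
from collections import defaultdict
from datetime import date

# Hypothetical page-view events: (user_id, session_id, event_date).
page_views = [
    ("u1", "s1", date(2012, 5, 1)),
    ("u1", "s1", date(2012, 5, 1)),
    ("u1", "s2", date(2012, 5, 3)),
    ("u2", "s3", date(2012, 5, 2)),
]

def engagement_metrics(page_views):
    """Per-user frequency (distinct active days) and depth
    (mean page views per session)."""
    active_days = defaultdict(set)                           # user -> days with activity
    views_per_session = defaultdict(lambda: defaultdict(int))  # user -> session -> views
    for user, session, day in page_views:
        active_days[user].add(day)
        views_per_session[user][session] += 1
    metrics = {}
    for user, sessions in views_per_session.items():
        depth = sum(sessions.values()) / len(sessions)
        metrics[user] = {"active_days": len(active_days[user]),
                         "views_per_session": depth}
    return metrics
```

Averaging these per-user numbers within each cohort then gives an engagement comparison between cohorts.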

On the wide variety of different cohort analyses

May 16th, 2012 by Yali

The cohort analysis blog post series

In the last two blog posts in this series we looked at two different examples of cohort analysis and how to perform them using SnowPlow. The first example was taken from a Twitter case study, while the second example was taken from Eric Ries’s book The Lean Startup.

A wide variety of cheeses

In this post, we cast the net wider, to bring out the breadth of cohort analyses that you can potentially perform. As should become clear as you read this post, we don’t see “cohort analysis” as a single report that looks the same for every company: rather it is a whole category of analyses which can be brought to bear on a number of different business questions. As a result, the specific cohort analyses you conduct will depend on the exact nature of your business and the specific questions you need to answer. But for any one business, there will be a number of cohort analyses relevant to a range of different business questions. So, if for now you’re only using one type of cohort analysis, we suggest you look creatively at how to employ this powerful technique to answer other business questions you face.

Below we outline the range of possible cohort analyses, first describing the different metrics that can be used to compare the behaviour of customers in different cohorts, and then some different ways of defining cohorts. In subsequent posts, we will dive into some of these variations in more detail, and explain how to perform those analyses using SnowPlow.
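
Because a cohort analysis comes down to a choice of cohort definition plus a choice of metric, it can help to think of it as a function of both. The Python sketch below, using hypothetical per-user summary records and field names, averages an arbitrary metric within cohorts defined by an arbitrary rule, so the same data can be cut by signup month or by acquisition channel.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

def cohort_analysis(
    users: List[dict],
    define_cohort: Callable[[dict], Hashable],
    metric: Callable[[dict], float],
) -> Dict[Hashable, float]:
    """Group users with `define_cohort` and average `metric` within each cohort."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user in users:
        cohort = define_cohort(user)
        totals[cohort] += metric(user)
        counts[cohort] += 1
    return {cohort: totals[cohort] / counts[cohort] for cohort in totals}

# Hypothetical per-user summaries.
users = [
    {"signup_month": "2012-04", "channel": "search", "visits": 4, "revenue": 30.0},
    {"signup_month": "2012-04", "channel": "email", "visits": 2, "revenue": 0.0},
    {"signup_month": "2012-05", "channel": "search", "visits": 6, "revenue": 10.0},
]

# Same data, two different analyses: visits by signup month, revenue by channel.
by_month = cohort_analysis(users, lambda u: u["signup_month"], lambda u: u["visits"])
by_channel = cohort_analysis(users, lambda u: u["channel"], lambda u: u["revenue"])
```

Swapping in a different `define_cohort` or `metric` produces a different analysis from the same data, which is why it makes sense to treat cohort analysis as a category rather than a single report.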

Performing the cohort analysis described in Eric Ries’s Lean Startup using SnowPlow and Hive

May 15th, 2012 by Yali

The cohort analysis blog post series

This blog post is the third in our series on cohort analyses using SnowPlow. In the first post, we provided an overview of cohort analyses: why they are so powerful and what analytic steps are necessary to perform one. In the second post, we looked at why SnowPlow is such a good platform for performing cohort analyses using web analytics data, and worked through the specific example of the Twitter cohort analysis that gets so much attention in startup circles.

In this post, we will follow up with a look at another famous example of a cohort analysis: this time from Eric Ries’s excellent book The Lean Startup. We will show how a company running SnowPlow can easily perform the type of analysis Eric performed when he was CTO at IMVU, to assess the progress they were making towards achieving a product-market fit.
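
Broadly, the cohort chart in The Lean Startup is a product funnel broken down by cohort: for each group of new customers, the percentage that went on to reach each step. A minimal Python sketch of that calculation follows. The funnel steps, the weekly cohort labels and the user data are hypothetical placeholders, not IMVU's actual definitions or figures.

```python
from collections import defaultdict

# Hypothetical funnel steps; the real steps depend on the product.
FUNNEL = ["registered", "logged_in", "chatted", "paid"]

# Hypothetical data: user_id -> (cohort week, set of funnel steps reached).
users = {
    "u1": ("2012-W18", {"registered", "logged_in", "chatted", "paid"}),
    "u2": ("2012-W18", {"registered", "logged_in"}),
    "u3": ("2012-W19", {"registered", "logged_in", "chatted"}),
    "u4": ("2012-W19", {"registered"}),
}

def funnel_by_cohort(users, funnel):
    """Percentage of each cohort's users that reached each funnel step."""
    size = defaultdict(int)                           # cohort -> number of users
    reached = defaultdict(lambda: defaultdict(int))   # cohort -> step -> users reaching it
    for cohort, steps in users.values():
        size[cohort] += 1
        for step in steps:
            reached[cohort][step] += 1
    return {
        cohort: [100.0 * reached[cohort][step] / size[cohort] for step in funnel]
        for cohort in sorted(size)
    }

# Print one row per cohort, one column per funnel step.
for cohort, pcts in funnel_by_cohort(users, FUNNEL).items():
    print(cohort, " ".join(f"{p:5.1f}%" for p in pcts))
```

Reading down a column shows whether later cohorts convert better at that step, which is the signal of progress towards product-market fit that the chart is meant to reveal.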

A version of the data Eric Ries presents in his book is shown below:

Online merchant using PrestaShop? Announcing the SnowPlow Early Access Programme (EAP)

May 14th, 2012 by Alex

Are you an online retailer using PrestaShop? Are you interested in getting early access to killer new analytics tools to help super-charge your business? Keplar’s SnowPlow team would like to hear from you.

SnowPlow Security holds the line

At Keplar we are now hard at work building sophisticated web analytics, such as cohort analyses, using our open-source SnowPlow technology stack (now available on GitHub). So far, all of these analyses are being built on top of the “eventstream” data collected from a client’s website using the SnowPlow JavaScript tag installed across all pages.

Alongside these “eventstream”-based analyses, we are designing another set of analyses – equally powerful – based on the transactional data which lives inside your ecommerce platform; we have chosen to focus on PrestaShop first, because we have already developed and open-sourced a tool for fetching transactional data out of PrestaShop…

Open-sourcing symfony2-paypal-ipn, a Symfony bundle for PayPal IPN

May 13th, 2012 by Alex

Today we are pleased to announce the open-sourcing on GitHub of our new PayPal e-commerce library for Symfony 2. This is a direct port of our CodeIgniter PayPal IPN library which we open-sourced on this blog some 14 months ago.

At Keplar we remain committed to using open-source projects where possible to keep costs down for our clients and to avoid “reinventing the wheel”. Where high-quality open-source projects exist which meet our clients’ needs, we use them by default; there are too many of these to name them all, but recent projects would have been impossible without Hive (Hadoop ecosystem), Spray (Scala/Akka), DictShield (Python), WAI (Haskell) and of course Redis.

Where open-source projects do not exist that meet our requirements, we are increasingly looking to develop those tools in-house and open source them where possible (i.e. where they are not part of a client deliverable). Our biggest initiative so far in this area is the SnowPlow web analytics platform, which since its soft-launch in February is already being used externally by one ad network to track ad impression data and internally by our team to perform some sophisticated analytics, such as website cohort analyses.

Other projects we have open-sourced since our CodeIgniter PayPal module include a Scala client for the Amazon Product Advertising API, a command-line tool for exporting Google Analytics data to CSV flatfiles, and a Scala client for the PrestaShop e-commerce API – the latter another release under our “Orderly” initiative for better e-commerce workflow automation and data analysis.

Onto our new Symfony2 library for PayPal IPN…

Performing cohort analysis on web analytics data using SnowPlow

May 8th, 2012 by Yali

The cohort analysis blog post series

In the previous blog post in this series, “Cohort analysis for digital businesses: an overview”, we described what cohort analysis is and why it is so powerful. In this post, we will look at how to perform cohort analysis on web analytics data in SnowPlow. We will start with an overview of the general methodology and approach for cohort analysis using SnowPlow, and then launch into a specific example analysis: the Twitter engagement example that made cohort analysis so famous in startup circles.

Methodology for performing cohort analyses in SnowPlow

SnowPlow has been designed to:

  1. Make it easy to perform specific cohort analyses
  2. Give users maximum flexibility to perform a wide range of cohort analyses, by making it easy to define cohorts in multiple ways and leverage multiple different metrics to measure and compare between the different cohorts

To understand what makes SnowPlow so suitable for cohort analyses, we need to consider the way data is structured in SnowPlow. This is represented in the diagram below:

SnowPlow records all data in a single events table in Hive. Whenever one of your customers does anything on your website, be it clicking on a link, filling in a web form, playing a video, adding a product to their basket, performing a search or rolling over an ad (to give just some examples), a line of data is generated in the events table.
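
Because every event lands as a row in one table, a simple retention-style cohort analysis reduces to two passes over that table: find each user's first month of activity (their cohort), then check in which later months they were active. In SnowPlow this would be a Hive query against the events table; the Python sketch below only illustrates the same logic over in-memory rows. The row layout (user_id, event_date, event_type) is a hypothetical simplification, not SnowPlow's actual events table schema, and "active" is assumed to mean "generated any event that month".

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows from a single events table: one row per event.
events = [
    ("u1", date(2012, 3, 4), "page_view"),
    ("u1", date(2012, 4, 9), "add_to_basket"),
    ("u2", date(2012, 3, 20), "page_view"),
    ("u3", date(2012, 4, 2), "search"),
    ("u3", date(2012, 5, 1), "play_video"),
]

def month_index(d):
    """Map a date to a single integer per calendar month."""
    return d.year * 12 + d.month - 1

def retention_matrix(events):
    """For each cohort (month of a user's first event), the share of its users
    active in the cohort month and in each following month."""
    first = {}                 # user -> month index of first event
    active = defaultdict(set)  # user -> month indices with any event
    for user, day, _event_type in events:
        m = month_index(day)
        first[user] = min(first.get(user, m), m)
        active[user].add(m)
    last_month = max(m for months in active.values() for m in months)
    cohorts = defaultdict(list)
    for user, m in first.items():
        cohorts[m].append(user)
    matrix = {}
    for cohort, members in sorted(cohorts.items()):
        row = []
        for m in range(cohort, last_month + 1):
            retained = sum(1 for u in members if m in active[u])
            row.append(retained / len(members))
        matrix[(cohort // 12, cohort % 12 + 1)] = row  # key: (year, month)
    return matrix
```

Changing how `first` is computed (for example, the month of first purchase instead of first event) or what counts as "active" gives a different cohort analysis from the same single table.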
