The cohort analysis blog post series
- Cohort analyses for digital businesses: an overview
- Performing cohort analysis on web analytics data using SnowPlow
- Performing the cohort analysis described by Eric Ries in the Lean Startup
- On the wide variety of cohort analyses
- Approaches to measuring user engagement as part of cohort analysis
- Approaches to measuring customer value as part of cohort analysis
- Faking cohort analysis with Google Analytics
In the previous posts in this series on cohort analyses, we looked at what cohort analysis is, explored the wide variety of cohort analyses that are possible and walked through the steps necessary to perform them using SnowPlow. In this post, we look at how to perform cohort analyses in Google Analytics. As will quickly become apparent, Google Analytics is not well suited to performing cohort analyses. Hence, although this blog post will be useful to people who have to use Google Analytics to perform their cohort analyses, it should be even more helpful to analysts who want to identify the advantages of using SnowPlow alongside (or instead of) Google Analytics, and to understand how those advantages derive from the fundamentally different approach SnowPlow takes to web analytics.

The wrong tool for the job
When performing a cohort analysis, we need our web analytics to enable us to:
- Report the metric we want to compare between the cohorts
- Define our cohorts, and report the metric value for each of them
- Compare the metric value for each of our cohorts alongside each other
1. Reporting the metric we want to compare between cohorts
Google Analytics provides analysts with a decent set of metrics to report on, including all the usual suspects:
- Visits
- Unique visitors
- Page views
- Pages per visit
- Bounce rate
- Conversion rates
That’s not a bad set of metrics, but it’s not nearly as wide a set as that provided by SnowPlow. As we’ve shown in previous posts, SnowPlow offers analysts a lot of flexibility to define metrics that measure important things like user engagement and customer lifetime value. Both of these are very hard to approximate using the more limited set of metrics Google Analytics provides. (For example, the best we can do on “engagement” is to look at the number of pages per visit, or the fraction of users that reach a particular point in a pre-defined funnel. We certainly cannot develop our own models of engagement that assign different values to different user actions and sum them to get a score for each of our users, as we can with SnowPlow.)
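To make the contrast concrete, here is a minimal Python sketch of the kind of custom engagement model described above: each type of user action is given a weight, and a user’s score is the sum of the weights of everything they did. The action names and weights are hypothetical and purely illustrative; in practice you would choose them to suit your own business and compute the score over your own event-level data.

```python
from collections import defaultdict

# Hypothetical weights: how much each kind of user action counts towards
# engagement. These values are illustrative assumptions, not a recommended model.
ACTION_WEIGHTS = {
    "page_view": 1,
    "video_play": 3,
    "add_to_basket": 5,
    "purchase": 20,
}


def engagement_scores(events):
    """Sum weighted actions per user.

    `events` is an iterable of (user_id, action) pairs. Actions that are not
    in ACTION_WEIGHTS contribute zero, but the user still appears in the result.
    """
    scores = defaultdict(int)
    for user_id, action in events:
        scores[user_id] += ACTION_WEIGHTS.get(action, 0)
    return dict(scores)


events = [
    ("alice", "page_view"),
    ("alice", "page_view"),
    ("alice", "add_to_basket"),
    ("bob", "page_view"),
    ("bob", "purchase"),
    ("carol", "newsletter_signup"),
]
print(engagement_scores(events))  # {'alice': 7, 'bob': 21, 'carol': 0}
```

The point is not the particular weights but that the model needs every action a user has taken, across all of their visits, which is exactly the data a visit-centric reporting tool does not expose.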
2. Defining our cohorts, and reporting the metric for each cohort
There is a wide range of different cohorts we might want to define in different circumstances:
- Which marketing channel a customer was acquired on (useful, for example, if we want to compare the return on marketing investment between different channels, with a view to optimising ad spend going forwards)
- Which month a person signed up to a service (useful if we want to measure improvements in how effectively we are converting customers over time, as in the Lean Startup example)
- Customer profile data e.g. gender, age (useful if we believe that customer behavior varies as a function of demography, and we want to examine how successfully we are serving different customer segments)
Google Analytics provides analysts with the ability to report on a range of subsets of the total user base, including:
- Where the user is based (geography)
- Whether this is a new visitor, or someone who has visited the site before
- What type of browser the user is running
- What traffic source drove the user to the site on this occasion
Google recently upgraded Analytics to let users define “advanced segments” of visits that meet multiple criteria, giving analysts more flexibility to define their own cohorts.

That functionality is great, but it has one critical limitation from an analysis perspective: we can only segment users based on variables related to this particular visit (i.e. this session). We cannot segment users based on their past behaviour, i.e. what happened to them on previous visits. To take two examples:
- We cannot segment users based on when they started using our service, because that requires looking up the date of their first visit, which relates to a previous user session.
- We cannot segment users based on whether they have seen a particular ad on their customer journey, because Google will only enable us to query which ads they have seen on this particular visit. So if a customer was originally acquired through a particular PPC campaign, but visited the site most recently by coming to it directly, we cannot use that when defining our cohort.
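Both examples come down to the same thing: a cohort is a property of the user’s whole history, while a visit-scoped segment only sees the latest session. The Python sketch below, built on a small hypothetical visit log, assigns each user a cohort (sign-up month and acquisition channel) from their first visit, and contrasts that with what a filter on the most recent visit would see. The field layout and source names are assumptions for illustration only.

```python
from datetime import date

# Hypothetical visit log: one (user_id, visit_date, traffic_source) row per visit.
visits = [
    ("u1", date(2012, 3, 14), "ppc_campaign_a"),
    ("u1", date(2012, 5, 2), "direct"),
    ("u2", date(2012, 4, 20), "organic"),
    ("u2", date(2012, 4, 28), "ppc_campaign_a"),
    ("u3", date(2012, 5, 1), "direct"),
]


def user_cohorts(visits):
    """Cohort from the user's FIRST visit: {user_id: (signup_month, channel)}."""
    first = {}
    for user_id, day, source in sorted(visits, key=lambda v: v[1]):
        # Visits are in date order, so the first one kept per user is the earliest.
        first.setdefault(user_id, (day.strftime("%Y-%m"), source))
    return first


def latest_visit_segments(visits):
    """What a visit-scoped segment sees: only each user's MOST RECENT visit."""
    latest = {}
    for user_id, day, source in sorted(visits, key=lambda v: v[1]):
        latest[user_id] = (day.strftime("%Y-%m"), source)  # later visits overwrite
    return latest


print(user_cohorts(visits)["u1"])           # ('2012-03', 'ppc_campaign_a')
print(latest_visit_segments(visits)["u1"])  # ('2012-05', 'direct')
```

User `u1` was acquired through the PPC campaign in March, but a segment built on the latest visit would file them under “direct” in May, which is precisely the misclassification described in the two examples.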
The above problems both stem from the same design feature in Google Analytics: namely that “visits” are the primary units of analysis, with Google only providing us with limited tools to analyse an individual’s behaviour across multiple visits. That is a big drawback when you’re doing cohort analysis, because however a cohort is defined, the customer is always the primary unit of analysis. You may define your customer segment based on the customer’s behaviour on a particular visit, but being limited to the most recent visit makes the vast majority of cohort analyses very tricky in Google Analytics.
There is a way to work around the above limitation, however. If we have a particular variable that we want to use in a cohort analysis, we need to assign it to the user in our own web platform (e.g. CMS or ecommerce package), and then pass that information to Google Analytics every time the user visits our site, by setting a custom variable in the JavaScript. As it sounds, this is not trivial. It is the basis for the solutions to performing cohort analysis using Google Analytics proposed by Dan Hill and Matt Clarke.
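The logic of this workaround can be sketched in a few lines of Python. It is a simplified model, not an integration with Google Analytics: `user_store`, `on_signup` and `tracked_visit` are hypothetical names standing in for your own platform’s user record and for the data your tracking code sends on each page load. The key step is that the cohort is decided once, stored against the user, and then copied onto every subsequent visit, so that a visit-scoped segment can filter on it.

```python
# Hypothetical stand-in for the user table in our own CMS / ecommerce package.
user_store = {}  # user_id -> {"cohort": "<signup_month>|<acquisition_channel>"}


def on_signup(user_id, signup_month, channel):
    """Decide the user's cohort once, when they are acquired, and persist it."""
    user_store[user_id] = {"cohort": f"{signup_month}|{channel}"}


def tracked_visit(user_id, traffic_source):
    """Build the visit-level data sent to the analytics tool on EVERY visit.

    The stored cohort is attached to the visit (in Google Analytics this would
    be done with a custom variable set in the JavaScript tracking code).
    """
    cohort = user_store.get(user_id, {}).get("cohort", "unknown")
    return {"user_id": user_id, "source": traffic_source, "custom_var_cohort": cohort}


on_signup("u1", "2012-03", "ppc_campaign_a")
visit = tracked_visit("u1", "direct")
print(visit["custom_var_cohort"])  # 2012-03|ppc_campaign_a
```

Even though this visit arrived directly, it carries the user’s original cohort, so segmenting on the custom variable recovers the user-level grouping. The cost is that the cohort logic lives in your own platform and has to be decided before the data is collected.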
3. Comparing the metric value for each of our cohorts alongside each other
In an ideal world, Google Analytics would present us with a plot of our metric against the different cohorts.

Unfortunately, Google won’t give us our data in quite that format. Performing the cohort analysis is therefore a multi-step process:
- Go to the advanced segment interface, and either select a segment that corresponds to your first cohort (if the segment already exists), or create a new custom segment if not.
- Once you have selected the appropriate segment, Google will only report results for this particular cohort. Now navigate to the metric you want to measure for this cohort, using the options in the left-hand menu. Chances are that you will want to look at Overview for the standard metrics (visits, page views etc.), Behaviour -> Engagement to measure visit duration, or Conversions to measure the propensity of each cohort to convert.
- Record the value for this cohort. (This is probably most easily done by downloading the relevant report in CSV format.)
- Repeat the above steps for each different cohort.
- Collate the results in Microsoft Excel (or equivalent).
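The collation step can also be scripted rather than done by hand in a spreadsheet. The Python sketch below merges one export per cohort into a single metric-by-cohort table. It assumes a deliberately simplified, hypothetical CSV layout (`Metric,Value` rows); real Google Analytics exports include extra header rows and columns, so the parsing would need adapting, and in practice you would read the downloaded files from disk rather than from inline strings.

```python
import csv
import io

# Hypothetical, simplified exports: one CSV per cohort, keyed by cohort name.
exports = {
    "2012-03": "Metric,Value\nVisits,120\nPages per visit,3.4\n",
    "2012-04": "Metric,Value\nVisits,95\nPages per visit,2.9\n",
}


def collate(exports):
    """Merge per-cohort exports into one table: {metric: {cohort: value}}."""
    table = {}
    for cohort, text in exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            table.setdefault(row["Metric"], {})[cohort] = float(row["Value"])
    return table


for metric, by_cohort in collate(exports).items():
    print(metric, by_cohort)
# Visits {'2012-03': 120.0, '2012-04': 95.0}
# Pages per visit {'2012-03': 3.4, '2012-04': 2.9}
```

This removes the copy-and-paste step, but each cohort still has to be defined and exported separately in the web interface first.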
Google Analytics: not the best tool for cohort analysis
As we’ve seen above, it is possible to perform cohort analyses with Google Analytics. However, it is not easy. In particular, two limitations stand out:
- It is not possible to define cohorts in Google Analytics based on a user’s history across visits, only on data that is associated with the user’s most recent visit (unless you pass that history in yourself as a custom variable)
- Results from different cohorts have to be manually collated: Google will not present them alongside each other (although this pain point can be automated using Google’s API, once the analyst has defined the segments in the web UI)
In contrast, SnowPlow gives analysts the ability to create segments based on any user data, and returns the complete result set (for all relevant cohorts) in a single table.
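To illustrate what “a single table” means in practice, here is a sketch of the kind of query that becomes possible when every event is stored against a user identifier. It uses Python’s built-in `sqlite3` purely as a stand-in query engine; the `events` table and its columns are hypothetical and are not SnowPlow’s actual schema. The query first derives each user’s cohort (month of first visit) from their full history, then reports a metric (page views per user) for every cohort at once.

```python
import sqlite3

# Hypothetical event-level table: one row per page view, tagged with the user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, visit_date TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "2012-03-14", "/home"),
        ("u1", "2012-05-02", "/pricing"),
        ("u1", "2012-05-02", "/signup"),
        ("u2", "2012-04-20", "/home"),
        ("u3", "2012-04-25", "/home"),
        ("u3", "2012-04-25", "/blog"),
    ],
)

QUERY = """
WITH cohorts AS (
    -- Cohort = month of each user's first visit, taken from their whole history.
    SELECT user_id, substr(MIN(visit_date), 1, 7) AS cohort
    FROM events
    GROUP BY user_id
)
SELECT c.cohort,
       COUNT(DISTINCT e.user_id) AS users,
       COUNT(*) * 1.0 / COUNT(DISTINCT e.user_id) AS page_views_per_user
FROM events e
JOIN cohorts c ON c.user_id = e.user_id
GROUP BY c.cohort
ORDER BY c.cohort
"""

for row in conn.execute(QUERY):
    print(row)
# ('2012-03', 1, 3.0)
# ('2012-04', 2, 1.5)
```

Changing the cohort definition (for example, to the traffic source of the first visit) only means changing the `cohorts` subquery; every cohort’s result still comes back side by side in one result set.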
Interested in finding out more about how SnowPlow can empower analysis at your company? Then find out more from the SnowPlow website, SnowPlow wiki, or get in touch.
