What is wrong with web analytics in 2012? (And how SnowPlow starts to fix it)

June 11th, 2012 by Yali

The introducing SnowPlow blog post series

We developed SnowPlow out of frustration with the limitations of web analytics solutions today. We believe that there are many things that are wrong with today’s web analytics packages. In the last few weeks, demoing SnowPlow to different companies and discussing where web analytics falls short for them, we’ve seen that we’re not alone in believing that the web analytics world is in need of a shake-up. However, that view is by no means universal, so we thought we’d take the debate online, and explain our position here.

Web analytics solutions no longer support all the applications for web analytics data

What’s wrong with web analytics in 2012?

Too narrowly focused

Web analytics tools are narrowly focused on:

  1. Marketing-related metrics (sources of traffic, visit numbers over time)
  2. Ecommerce and online retail sites (clickthroughs, conversions, tracking a flow through a pre-defined funnel)
  3. Conventional online publishing sites (page views by web page)

Web analytics tools have not been developed with today’s wide range of digital businesses in mind. That range includes an enormous number of B2B SaaS services, online games companies (including social and massively multiplayer games), financial services companies, media broadcasters and social networks, to name just a few categories. These companies are incredibly poorly served by today’s web analytics solutions.

It is not just whole industries which are poorly served by today’s web analytics options: the current focus of web analytics packages on marketing activities means that even in the target industries (i.e. online retailers and conventional online publishers), it is hard for anyone outside of Marketing to use the tools – for example:

  1. Product managers find it hard to effectively use web analytics tools to understand how users engage with their website, how new versions of the website improve engagement and how that improved engagement drives improved commercial outcomes. Worse, it is all but impossible to use current tools to spot where the product is currently not working (e.g. where users are getting “stuck”, and so where product development efforts should focus)
  2. Developers find current web analytics packages far too crude to perform the type of event-level inspection and monitoring to understand how their applications are standing up to real-world use
  3. Customer relationship managers typically do not use customer web analytics data to improve their knowledge of individual customers and develop customer segmentations

Inflexible

One of the things that surprises us most is that web analytics packages are, as a whole, highly inflexible. It is not easy to use them as a basis to build custom reports – reports that are not simply “recuts” on a subset of page views, visits and unique visitors. For example, try building a report to compare the lifetime value of users acquired from one marketing channel with another, or looking at how user engagement varies by when the user signed up to a service.

This is surprising because, on the whole, companies are getting much better at building tools that developers and data scientists can take and extend to suit their own needs. Compare the growth in the number of libraries available for analysts using R or Mahout, as examples of analytics products that have been developed by a wide and disparate community of analysts. What kind of comparable developments have we seen in web analytics? Google’s Analytics API has been available for ages, but all it has been used for is to export cuts of data and come up with sexy visualisations of dubious analytic value. That is not the fault of the community around the product: Google’s Analytics API has simply not been built to enable the kind of analytic innovation we’re looking for, because data is only exposed in an aggregated form, limiting the scope for data scientists and other programmers to reimagine how it could be used in very different types of analysis.

Too high-level

As mentioned above, web analytics packages typically only provide aggregated data. They do not make data available about individual users, making it hard to use web analytics data to diagnose what is going wrong or right with a particular customer journey, let alone to use that data to improve and personalise that journey in the future.

Too low-level

It should not be possible for web analytics tools to be both too high-level and too low-level, but today’s vendors have managed just that. Opening up a web analytics package, analysts are presented with a sea of metrics: hundreds of carefully collated numbers, very few of which actually matter for business success or decision making. By drowning us in “vanity metrics”, the current crop of web analytics products makes it hard for us to see the wood for the trees.

Siloed

It is several decades since Ralph Kimball argued that companies should be drawing data from multiple sources together in a single repository so that analysts could use that data to derive insights. Those compelling arguments for warehousing data have been accepted by the whole analytics community. Sadly web analytics tools have not been built to facilitate this. Web analytics data still typically sits outside of company data warehouses and BI systems in its own impregnable silo. Because of this, web analytics data is rarely incorporated as part of a single customer view – which is shocking given how much modern companies engage with their customers via the web.

What are the consequences for businesses?

The consequences for businesses are simple to articulate: business questions that it should be possible to answer using web analytics data cannot be answered. We divide those questions into two categories, customer analytics and product analytics:

Customer analytics

How effectively a company understands their customers, and uses that understanding to drive customer loyalty and value, is a key determinant of success in a wide range of industries. For just one example, look at how Tesco’s use of Clubcard data via Dunnhumby helped propel it from the second largest supermarket in the UK to a global behemoth not far behind Walmart. Web analytics should provide a wealth of detailed customer data based on the way those customers engage with our online products and services. Sadly, web analytics’ relentless focus on visits and page views has totally obscured the customer. The following customer-level questions are all very hard to answer using web analytics data, yet these are basic questions that every consumer-facing business should be asking and answering:

  1. Who are our most valuable customers?
  2. How can I spot those valuable customers in advance? (What are the key predictors of value?)
  3. What are the “sliding doors” moments that move a customer from a less valuable to a more valuable segment, or vice-versa?
  4. How should we break down our customer-base by behaviour? How do segments vary, by value?
  5. How well do we serve each customer segment? Are some (potentially valuable) segments less well served than others? What parts of the product should we focus on developing to improve the service level for those segments?
  6. What are the best opportunities for growing the value of my customer base?

Product analytics

Any company with an online presence should also be interested in using web analytics data to help make robust product development decisions. With the current crop of tools, however, this is very difficult: current tools enable A/B and multivariate testing but little else. In particular, it is hard to use today’s web analytics tools to answer:

  1. How successful has each product iteration been at driving user engagement?
  2. Are there parts of our customers’ journeys that are better served by the product than others? What parts of the journeys need improving?
  3. Where should we focus our product development efforts?

How does SnowPlow enable businesses to answer the above questions?

For us, the key to empowering data scientists and analysts to answer the above questions is to start by giving them access to the underlying raw (aka “atomic”) data, in a straightforward format that makes it easy to query.

The strength of this approach is that it gives analysts maximum flexibility to take the data, query it directly, and use the most appropriate analytics tools to perform the analyses that suit the companies, data sets and business questions they work with. For example, measuring customer lifetime value for a bank is going to look very different from measuring it for a telco, which in turn will look different from measuring it for a retailer.
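
As a rough illustration of what querying atomic data directly can look like, here is a minimal Python sketch that compares average lifetime value by acquisition channel. The record layout (`user_id`, `channel`, `event`, `revenue`) and the function name are assumptions made up for this example, not SnowPlow's actual event format, and "lifetime value" is simplified here to total revenue per user.

```python
from collections import defaultdict

# Hypothetical atomic event records, one per event.
# This layout is an assumption for illustration, not SnowPlow's schema.
events = [
    {"user_id": "u1", "channel": "search", "event": "signup", "revenue": 0.0},
    {"user_id": "u1", "channel": "search", "event": "purchase", "revenue": 30.0},
    {"user_id": "u2", "channel": "email", "event": "signup", "revenue": 0.0},
    {"user_id": "u2", "channel": "email", "event": "purchase", "revenue": 10.0},
    {"user_id": "u3", "channel": "search", "event": "signup", "revenue": 0.0},
    {"user_id": "u3", "channel": "search", "event": "purchase", "revenue": 50.0},
    {"user_id": "u2", "channel": "email", "event": "purchase", "revenue": 15.0},
]


def lifetime_value_by_channel(events):
    """Return mean total revenue per user for each acquisition channel."""
    revenue_per_user = defaultdict(float)
    channel_of_user = {}
    for e in events:
        revenue_per_user[e["user_id"]] += e["revenue"]
        # The first event seen for a user defines their acquisition channel.
        channel_of_user.setdefault(e["user_id"], e["channel"])

    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, revenue in revenue_per_user.items():
        channel = channel_of_user[user]
        totals[channel] += revenue
        counts[channel] += 1
    return {channel: totals[channel] / counts[channel] for channel in totals}


ltv = lifetime_value_by_channel(events)
print(ltv)
```

A bank, a telco and a retailer would each swap in their own definition of value here, which is exactly the kind of change that is only possible when the event-level data is available.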

Rather than spend time developing our own reporting functionality to meet all the potential analyses that you might want to perform with web analytics data, we have developed a platform that gives analysts the atomic, customer-level data and lets them use whichever tools they believe will most effectively work on the data to meet their needs.

The focus of our approach, then, has been on making it easy for companies to collect all of the possible data they can from their web analytics system, and store it in the cloud in an infrastructure that scales effortlessly with enormous data volumes.

Analysts can then use the wide variety of general analytics tools that are available. Unlike web analytics packages, these have been developing very rapidly. They include:

  1. Statistical and modelling tools e.g. R
  2. Slice-and-dice OLAP technology e.g. Microstrategy, Tableau
  3. Behavioural database technologies e.g. Skylab
  4. Machine-learning and data mining tools e.g. Weka, Mahout

Making the underlying data available is not enough

Making the underlying data available to analysts is a big step in the right direction, but it is not enough to fix web analytics. Fortunately, SnowPlow has a number of other key features:

Powerful, scalable, flexible analytics with Apache Hive

Hive was developed at Facebook to enable analysts there to perform very involved analyses of how Facebook’s users engage with the product. Built on top of Hadoop, it is incredibly scalable. It is also very flexible, with a framework that lets analysts develop their own functions to use as part of their queries. Best of all, it is easy to use for any analyst with passing knowledge of SQL.
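
To give a flavour of the SQL-style querying this makes possible, the sketch below runs a simple aggregate over event-level rows. It uses Python's built-in `sqlite3` module purely as a stand-in for Hive so that the example is self-contained; the `events` table and its columns are hypothetical and do not reflect SnowPlow's actual table definition.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption for
# illustration only; table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, page TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "/home", "2012-06-01"),
        ("u1", "/pricing", "2012-06-01"),
        ("u2", "/home", "2012-06-01"),
        ("u1", "/home", "2012-06-02"),
        ("u3", "/home", "2012-06-02"),
    ],
)

# Unique visitors and page views per day, computed from raw events
# rather than read from a pre-aggregated report.
query = """
    SELECT dt, COUNT(DISTINCT user_id) AS uniques, COUNT(*) AS page_views
    FROM events
    GROUP BY dt
    ORDER BY dt
"""
daily = conn.execute(query).fetchall()
conn.close()

for dt, uniques, page_views in daily:
    print(dt, uniques, page_views)
```

The point is that the analyst writes the aggregation, so the same rows can just as easily be grouped by user, by signup cohort or by any other dimension present in the data.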

The ability to upload data from other sources into SnowPlow

It is straightforward to add additional data sources into SnowPlow and perform analytics against the combined data set. CRM data, data from social networks or even analyst-generated data (e.g. the values of different SnowPlow “events” based on models developed from SnowPlow data using other analytics systems like R) can be uploaded into SnowPlow and joined with SnowPlow’s own data. This makes SnowPlow an ideal place to assemble a “single customer view” which incorporates the customer’s all-important web behaviours, helping you to answer business questions that require analysis across a range of customer-data sources.
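
The idea of joining web events with another customer data source can be sketched as follows. This is a minimal Python illustration under stated assumptions: the CRM export, its `segment` field, the event records and the `events_by_segment` helper are all hypothetical, and the join key is assumed to be a shared user id.

```python
from collections import Counter

# Hypothetical CRM export keyed by user id (not a real SnowPlow or CRM format).
crm = {
    "u1": {"segment": "gold"},
    "u2": {"segment": "silver"},
}

# Hypothetical web events carrying the same user id.
events = [
    {"user_id": "u1", "event": "page_view"},
    {"user_id": "u1", "event": "add_to_basket"},
    {"user_id": "u2", "event": "page_view"},
    {"user_id": "u3", "event": "page_view"},  # no matching CRM record
]


def events_by_segment(events, crm):
    """Count web events per CRM segment, joining on user_id."""
    counts = Counter()
    for e in events:
        record = crm.get(e["user_id"])
        # Users without a CRM record are kept under "unknown", not dropped.
        segment = record["segment"] if record else "unknown"
        counts[segment] += 1
    return dict(counts)


by_segment = events_by_segment(events, crm)
print(by_segment)
```

Keeping unmatched users in an explicit "unknown" bucket is a deliberate choice in this sketch: it shows how much web behaviour is not yet covered by the single customer view.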

More to come

There’s a lot more work to do to take web analytics to where it should be in 2012. The areas we are actively working on to develop SnowPlow functionality include:

  1. Developing connectors to pipe SnowPlow data into analytics databases, to enable faster train-of-thought analysis and lower-cost repeated querying
  2. Designing connectors to pipe SnowPlow data into behavioural databases, and develop a compelling analytic toolset around behaviour and eventstream analysis
  3. Building additional clients to generate SnowPlow event data from mobile apps, Flash games, desktop apps and so on
  4. Developing tools to enable less technically-savvy analysts to get more value out of the SnowPlow data
  5. Designing a range of tools to enable companies to use web analytics data from SnowPlow in real-time operational systems e.g. product recommendation, content personalisation

Interested in learning more about SnowPlow?

Visit the SnowPlow GitHub repository and check out the code and wiki. Or, if you prefer, you can send us an email.

Interested in joining us on this journey?

We see a revolution coming in web analytics. We are only just beginning to scratch the surface of what is possible with customer event-data. SnowPlow is an open source project, and we encourage anyone with an interest in using web analytics data in novel and interesting ways to work with us to develop the SnowPlow platform and the analysis methodology and toolset around it. Start off by checking out the SnowPlow repo on GitHub and the new SnowPlow Analytics website.

Note – the arguments presented here are much the same as those we gave in our presentation Re-introducing SnowPlow: a New approach to Web Analytics. Here, we describe them in long-form.
