What is SnowPlow?
SnowPlow does three things:
- Identifies users, and tracks the way they engage with one or more websites
- Stores the associated data in a scalable “clickstream” data warehouse
- Makes it possible to leverage a big data toolset (e.g. Hadoop, Pig, Hive) to analyse that data
That sounds like a web analytics tool. There are many of them. (And many of them are free, including Google Analytics.) Why build a new one?
We built SnowPlow for our own web analytics purposes. We recognise that we are unusual web analysts, for a number of reasons, but for us, there are a lot of frustrations with the different solutions that are currently available:
1. We want access to atomic, customer-level and event-level data
Google Analytics only ever provides a rolled up, aggregate view on your web analytics data. That is fine a lot of the time, but there are a number of situations where you want to drill down to individual user level data, for example:
- Linking web analytics data with other data sources, e.g. CRM systems, social media platforms, finance / transaction systems, email marketing systems. When we perform analyses for our clients, we often want to build up a complete picture of a particular user based on data stored on them in multiple systems. This means that we need a consistent way of identifying them across different systems and joining the data – which is only possible if we have access to individual visitor-level web analytics data
- Segmenting and targeting users. Tools like Google Analytics make it possible to see how site visitors segment by behaviour, but they don’t then give us the opportunity to take action based on that segmentation, for example retargeting users who have abandoned shopping carts via display ads or support emails. Because SnowPlow gives us access to individual customer identifiers, those same identifiers can be used as a basis for delivering personalised offers and services
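To make the first point concrete, here is a minimal sketch of joining event-level web data to a CRM export on a shared user identifier. The field names (`user_id`, `page`, `email`, `plan`) and the sample records are hypothetical and are not SnowPlow's actual schema; at real volumes the same join would more likely be written in Hive or Pig.

```python
from collections import defaultdict

# Hypothetical event-level records: one row per page view, keyed by a user identifier.
events = [
    {"user_id": "u1", "page": "/pricing"},
    {"user_id": "u2", "page": "/home"},
    {"user_id": "u1", "page": "/checkout"},
]

# Hypothetical CRM export keyed by the same identifier.
crm = {
    "u1": {"email": "ann@example.com", "plan": "pro"},
    "u2": {"email": "bob@example.com", "plan": "free"},
}


def build_profiles(events, crm):
    """Attach each user's page views (in arrival order) to their CRM record."""
    pages = defaultdict(list)
    for event in events:
        pages[event["user_id"]].append(event["page"])
    # Users who are missing from the CRM export still get a profile,
    # containing only their page views.
    return {
        user_id: {**crm.get(user_id, {}), "pages": visited}
        for user_id, visited in pages.items()
    }


profiles = build_profiles(events, crm)
```

The join only works because every event row carries the individual identifier, which is exactly what an aggregate-only report cannot give you.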
2. We want to be able to track user behaviours over multiple visits (i.e. a complete view of each customer over his or her lifetime)
99% of the analysis we do is focused on better understanding our clients’ customers and using that understanding to enable our clients to deliver a superior service and grow customer lifetime value. That means that looking at behaviour in a single visit, or breaking down behaviour by visits, is often the wrong foundation for building an understanding: we want to break our data down by customers, and see how customer engagement evolves over time across multiple website visits. Although Google Analytics does now provide some nice customer journey visualisation tools, it does not let us track customers across more than one visit.
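As an illustration of a customer-centric rather than visit-centric question, the sketch below works out in which visit each customer first made a purchase. The tuple layout `(user_id, visit_number, event_name)` and the event names are assumptions made for this example, not SnowPlow's data model.

```python
# Hypothetical event stream: (user_id, visit_number, event_name).
events = [
    ("u1", 1, "page_view"),
    ("u1", 1, "add_to_basket"),
    ("u1", 2, "page_view"),
    ("u1", 2, "purchase"),
    ("u2", 1, "page_view"),
]


def first_visit_with(events, target):
    """Map each user to the earliest visit number in which `target` occurred."""
    first = {}
    for user_id, visit_number, name in events:
        if name == target:
            # Keep the smallest visit number seen so far for this user.
            first[user_id] = min(visit_number, first.get(user_id, visit_number))
    return first


first_purchase_visit = first_visit_with(events, "purchase")
```

Here `u1` added an item to the basket in one visit and bought in the next, a pattern that is invisible if the data is only ever broken down by visit.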
Some of the benefits of SnowPlow from a data perspective:
In what situations does SnowPlow help me, where a traditional web analytics program like Google Analytics falls short?
There are a number of tasks that are hard to perform using traditional web analytics tools, and easier with SnowPlow. For example:
- Customer journey analysis: identifying the value of different events in a customer’s lifetime
- Affinity analysis: analysing the likelihood that if a customer is interested in service A, they will also be interested in service B, and understanding to what extent that is a function of customer segment vs website architecture
- Behavioural segmentation: breaking down your customer-base by their on- and off-site behaviour, and using that to inform your marketing and product development approach
- Product analytics: identifying where your website is and is not performing well, and quantifying the value of improvements over time
- Linking web analytics to social media data: seeing which of your site visitors are the most vocal on social media, and optimising the way in which you engage with them
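To give a flavour of the affinity analysis mentioned above, here is a minimal sketch that computes "lift": how much more often two services attract the same customer than would be expected if interest in each were independent. The customer-to-services mapping is made-up sample data, and lift is just one common affinity measure, chosen here for illustration.

```python
# Hypothetical data: the set of services each customer has shown interest in.
interests = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A"},
    "u4": {"C"},
}


def lift(interests, a, b):
    """Return P(a and b) / (P(a) * P(b)); values above 1 suggest affinity."""
    total = len(interests)
    with_a = sum(1 for services in interests.values() if a in services)
    with_b = sum(1 for services in interests.values() if b in services)
    with_both = sum(
        1 for services in interests.values() if a in services and b in services
    )
    # Lift is undefined if nobody is interested in one of the services;
    # return 0.0 in that case (this also covers an empty data set).
    if with_a == 0 or with_b == 0:
        return 0.0
    return (with_both / total) / ((with_a / total) * (with_b / total))


affinity_ab = lift(interests, "A", "B")
```

Separating the effect of customer segment from that of website architecture would mean computing the same measure within each segment, which again requires customer-level data.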
So should I throw out my Google Analytics installation and start using SnowPlow?
No! Google Analytics et al do an excellent job of making a whole host of web analysis very easy and straightforward, especially for people who are not well versed in more advanced statistical techniques and mathematical tools.
SnowPlow serves a different need: it makes much more involved analysis – which is either very hard or impossible with existing tools – possible. SnowPlow is complementary to Google Analytics, Omniture, Piwik or whatever other tool you currently use. In other words: use Google Analytics for the easy stuff, and SnowPlow for the tough stuff.
Two years after Eric T Peterson’s seminal post The Coming Bifurcation in Web Analytics Tools, we finally have a web analytics tool which is powerful enough for all the “hairy” analyses that a hardcore clickstream data-scientist would want to perform.
You’re claiming that SnowPlow is the world’s Most Powerful web analytics platform. That’s pretty bold. Can you substantiate it?
We should be clear not only about what SnowPlow does, but also what it does not do. At the time of writing, SnowPlow does not provide any user interface tools to make performing analyses easy. In fact, to perform analyses, you need to use Big Data technologies like MapReduce, Pig and Hive.
If you are familiar with those technologies, SnowPlow will let you do anything you want with your data. That is why we call it the Most Powerful. Whatever the volume of data you collect, however complicated your query is, by employing these heavy-duty, massively parallel processing tools, SnowPlow will enable you to perform that analysis. But it may not be easy – and it may not be pretty! SnowPlow is a web analytics platform for data scientists and other hardcore data geeks.
I already use Hadoop to process my web server logs. What is the benefit of using SnowPlow for me?
A lot of interesting insight can often be mined from web server (Apache, Nginx, IIS et al.) logs. However, there is a lot of important data that web analytics systems capture (via JavaScript tags) that is simply not available in web logs. By using JavaScript tagging to generate the raw data, SnowPlow gains capabilities that most web logs cannot offer, including:
- Accurately identifying individual users (via cookies)
- Enabling users to configure custom events and variables specific to particular user journeys, e.g. adding / removing items from a shopping basket
In a world where consumers are getting increasingly nervous about data collection, should you be distributing tools to enable more granular data collection?
Many consumers are rightly concerned about their online data.
By its very architecture, SnowPlow gives the companies that use it complete control of all of the data collected in their own clickstream data warehouse. We always recommend that companies exercise that control to protect their users’ data: we recommend not sharing that data with third parties without permission, and deleting user data from their systems when requested. Because companies that employ SnowPlow have complete control over their own data, and complete flexibility to do with it what they want, they are much better placed from a consumer privacy perspective than, for example, companies that depend on Google Analytics, who have limited control over what Google does and does not do with their data.
Will SnowPlow be open sourced? How will you charge for it?
It has only been possible for us to build SnowPlow thanks to the hard work of the open source community in developing great tools including Piwik, Hadoop and Hive. It only seems right, therefore, that we contribute SnowPlow back to the open source community. We intend to make it available, freely, from our GitHub SnowPlow page, in due course (May update: we have started uploading code) – but necessarily there is a lot of packaging, documentation and test suite work to be done prior to this.
Our intention is not to charge for the SnowPlow software, but rather for Keplar to offer consultancy and services that help companies set up SnowPlow and use it to drive business value.
What does SnowPlow look like? Technically?
Where can I get SnowPlow?
Until we put SnowPlow on GitHub, you’ll need to contact us to get hold of SnowPlow.
Where can I learn more?
Please download the SnowPlow brochure, and get in touch with Keplar if you want more details.
————————————————–
Update!
Since publishing this post we have made SnowPlow available on GitHub, including technical documentation, and have started to document and explain SnowPlow on the Keplar website.
