<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Keplar LLP blog &#187; Search &amp; aggregation</title>
	<atom:link href="http://www.keplarllp.com/blog/category/search-aggregation/feed" rel="self" type="application/rss+xml" />
	<link>http://www.keplarllp.com/blog</link>
	<description>Blogging from the team at Keplar LLP</description>
	<lastBuildDate>Wed, 01 Feb 2012 12:54:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Towards a curated web: why vertical search is a potential Google killer</title>
		<link>http://www.keplarllp.com/blog/2010/05/towards-a-curated-web-why-vertical-search-is-a-potential-google-killer</link>
		<comments>http://www.keplarllp.com/blog/2010/05/towards-a-curated-web-why-vertical-search-is-a-potential-google-killer#comments</comments>
		<pubDate>Mon, 24 May 2010 14:11:08 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Search & aggregation]]></category>
		<category><![CDATA[affiliate marketing]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[intelligent agent]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[vertical search]]></category>

		<guid isPermaLink="false">http://www.keplarllp.com/blog/?p=604</guid>
		<description><![CDATA[In the last post in this series we looked at how vertical search sites run by affiliate marketers are changing the shape of the purchase funnel for goods and services online. We compared this &#8220;funnel 2.0&#8221; to the old approach, in which surfers would try to zero in on the best available product and retailer [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" title="Small boat outmanoeuvring a supertanker" src="/blog/wp-content/uploads/2010/04/super-tanker-little-boat.jpg" alt="" /></p>
<p>In the <a href="/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel" target="_blank">last post</a> in this series we looked at how vertical search sites run by affiliate marketers are changing the shape of the purchase funnel for goods and services online. We compared this &#8220;funnel 2.0&#8221; to the old approach, in which surfers would try to zero in on the best available product and retailer through multiple general searches. At the end of the article we introduced the idea that this new funnel threatens the AdWords model which powers Google&#8217;s search business.</p>
<p>In this post we want to explore further Google&#8217;s uneasy relationship with vertical search, and in particular the extent to which vertical search poses a significant threat to the business model for general search. At Keplar we believe that there&#8217;s an opportunity emerging for a new entrant to massively disrupt search as it exists today, and I discuss this in the article as well.</p>
<p>Google&#8217;s attitude to vertical search is a complex one. On the one hand, we continue to hear <a href="http://www.tnooz.com/2010/05/18/news/priceline-no-fear-and-loathing-about-google-ita-software/" target="_blank">rumours</a> that Google will spend a billion dollars buying travel search engine ITA Software, which powers Kayak.com, Orbitz and others. On the other hand, Google regularly bans affiliates from advertising their vertical search sites using Google AdWords: complaints about these mass bannings can be found <a href="http://www.adhustler.com/google-hates-affiliates/" target="_blank">all</a> <a href="http://www.seroundtable.com/archives/021178.html" target="_blank">across</a> <a href="http://www.dansoha.com/2009/11/googles-bande-listing-of-advertisers-revisited/" target="_blank">the</a> <a href="http://www.webmasterworld.com/google_adwords/4020049.htm" target="_blank">Web</a>.</p>
<p>What&#8217;s behind this difficult relationship with vertical search? To understand the reasons, we first need to understand the commercial dynamics of general search, and understand Google&#8217;s hardening commitments to that business model.</p>
<p><span id="more-604"></span></p>
<p><strong>The commercial dynamics of general search</strong></p>
<p>General search is a key navigation and discovery tool for the Web, and it has become increasingly dominant over time: growing numbers of users are abandoning browser bookmarks and the address bar altogether in favour of the power of the &#8220;Google command line&#8221;. But that utility comes at a massive cost: general search is dependent on Google (and of course Bing et al) crawling the entire known Web, storing and indexing all of that data and then returning it in response to any relevant search query.</p>
<p>These search results are costly to provide, and moreover most of this traffic is not monetisable. Whether I&#8217;m searching for help with a programming question, typing &#8220;bbc.co.uk&#8221; into the Google search box as a URL short-cut, or simply checking today&#8217;s weather &#8211; none of these will generate any revenue for Google.</p>
<p>The searches which are monetisable are the ones related to products and services which the user may want to buy online &#8211; for example, a t-shirt, or holiday flights, or a new car. The trouble though is that general search is a blunt tool for helping people make decisions about what to buy, especially compared to a vertical search engine which exists only to help people make decisions about a certain product category. So it&#8217;s no coincidence that if you search for <a href="http://www.google.co.uk/search?q=car+insurance">&#8220;car insurance&#8221;</a> on Google UK, the vast majority of the search results (both organic and paid-for) are for vertical search sites. Those sites are great at converting searchers into purchasers, they are well rewarded by merchants as a result, and thus are able to outbid (and out-link) competitors to come top of Google&#8217;s search results. We explored this &#8220;capability gap&#8221; between generalised and vertical search further in <a href="/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel" target="_blank">our last post</a>.</p>
<p>So to recap: general search is expensive to offer, the majority of it is unmonetisable, and the piece that is monetisable (products and services) is more effectively monetised by vertical search engines run by domain experts. If this is the situation, what kind of threat does this pose for Google, and what opportunities does it provide for other players?</p>
<p><strong>Google&#8217;s commitments to general search are hardening</strong></p>
<p>Don Sull, a professor of management at London Business School, has developed a powerful theory to explain why large corporations so often get stuck in a rut and find themselves outmanoeuvred by competitors and new entrants. Don has identified what he calls &#8220;active inertia&#8221;, in which a company&#8217;s hardening commitments clash with turbulent markets and bring about its decline in three acts. His theory is laid out in an <a href="http://www.ft.com/cms/s/0/56c9b7e6-4c56-11de-a6c5-00144feabdc0.html?catid=41&#038;SID=29854c38b296aa2103f92bbf87a2e532" target="_blank">FT article</a> which is well worth reading, but in summary the three acts are as follows:</p>
<ol>
<li><b>Act 1: Managers commit</b> &#8211; the company&#8217;s management makes long-term commitments, such as public declarations around areas of focus, or investing in specialised resources</li>
<li><b>Act 2: Commitments harden</b> &#8211; these long-term commitments engender an attitude of &#8220;if it ain&#8217;t broke, don&#8217;t fix it&#8221;. The commitments establish a trajectory for the company, and all functions of the company start optimising themselves around this trajectory, to speed up execution against the long-term commitments</li>
<li><b>Act 3: Active inertia</b> &#8211; shifts in the external business environment render the company&#8217;s existing commitments outmoded or obsolete, and this changed environment stops rewarding progress along the established trajectory. The company typically responds by accelerating the activities that succeeded in the past, but this merely digs them deeper into the rut</li>
</ol>
<p>It is the Keplar view that this &#8220;active inertia&#8221; theory applies equally well to Google&#8217;s general search business, putting Google into the category of technology companies which maintain significant &#8220;legacy&#8221; platforms, including Microsoft (Windows, Office), eBay and, increasingly, Apple (iPhone/AppStore). Make no mistake &#8211; these legacy platforms confer significant advantages on the incumbents (including significant revenue generating potential and partner lock-in), but they also significantly restrict the flexibility to innovate in the face of changing business environments and disruptive innovations.</p>
<p><strong>Google&#8217;s active inertia in search</strong></p>
<p>So, Google has won the war of general search &#8211; but as we have suggested above, along with this victory comes a set of &#8220;hardening commitments&#8221; (to use Don Sull&#8217;s language) around supporting a hard-to-monetise utility search platform, even while affiliate marketers reshape the purchase funnel and capture significant value for themselves.</p>
<p>To date, Google&#8217;s response has been to build or buy search capabilities in key verticals (such as real estate, video and travel), but this approach appears to us to have two key weaknesses:</p>
<ol>
<li><b>Ignoring the long tail of verticals</b> &#8211; there is a significant long tail of verticals out there, from cheese to vintage motorbikes to music festivals. However deep its pockets, Google cannot hope to provide vertical search functionality for all of these</li>
<li><b>Preventing the survival of the fittest</b> &#8211; Google&#8217;s AdWords model works because it creates a competitive environment in which the best advertisers win out (because they can afford to out-bid their rivals). Conversely, by maintaining its own monoculture for vertical search &#8211; rather than letting multiple affiliates compete for each vertical &#8211; Google may ultimately deliver a poorer search experience for each niche. At the very least, it could create mistrust that Google will treat vertical search competitors fairly, as reflected in the <a href="http://www.nytimes.com/2010/05/23/technology/23goog.html" target="_blank">Foundem case</a></li>
</ol>
<p>In our view, Google&#8217;s current approach to vertical search reflects the &#8220;active inertia&#8221; problem identified by Don Sull: the changing search environment is no longer rewarding Google&#8217;s progress along its established trajectory, and Google&#8217;s build-and-buy response is merely exacerbating the problem.</p>
<p><strong>A disruptive innovation: the aggregation of vertical search</strong></p>
<p>If Google is suffering from &#8220;active inertia&#8221;, then what are its competitors doing? One potential source of disruptive competition could come in the form of the aggregation of vertical search. While the costs required to provide general web search are massive, the costs around aggregating a few thousand vertical search sites are comparatively tiny, especially if vertical search sites can be incentivised to conform to a certain standard (e.g. a semantic markup for vertical search results).</p>
<p>It is also not hard to envisage a competitive mechanism (e.g. real-time bidding) within an aggregator site which allows multiple vertical search sites to compete to provide a set of search results, ultimately driving up the quality of the aggregated search service in every niche. Such a platform would be hugely disruptive for Google&#8217;s existing search business.</p>
<p>Given the low barriers to entry for an aggregator service such as this, we could see one being launched by a &#8220;tier two&#8221; general search provider such as Microsoft or Yahoo!, or even by a much smaller player with a strong pedigree in helping users find things online, such as <a href="http://uk.ask.com/" target="_blank">Ask Jeeves</a>.</p>
<p>If an aggregator like this is launched &#8211; and we can see no reason why one shouldn&#8217;t be &#8211; then its owner will have all of the benefits of accessing online shoppers at the time of strongest intention, but without all the costs associated with crawling, storing and indexing the whole Web. Think of it as a &#8220;curated web&#8221;: one where all the hassle of product discovery, selection and purchase is outsourced to enthusiasts who really &#8220;get&#8221; the niche that you are into.</p>
<p><strong><i>Interested in improving your existing vertical search site, or developing a new one? <a href="http://www.keplarllp.com/contact">Get in touch</a> to find out how Keplar can help with your vertical search strategy, product design and technical implementation.</i></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.keplarllp.com/blog/2010/05/towards-a-curated-web-why-vertical-search-is-a-potential-google-killer/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Towards a curated web: affiliate marketing and the new shape of the customer purchase funnel</title>
		<link>http://www.keplarllp.com/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel</link>
		<comments>http://www.keplarllp.com/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel#comments</comments>
		<pubDate>Wed, 19 May 2010 17:04:27 +0000</pubDate>
		<dc:creator>Yali</dc:creator>
				<category><![CDATA[Search & aggregation]]></category>
		<category><![CDATA[affiliate]]></category>
		<category><![CDATA[E-commerce]]></category>
		<category><![CDATA[ecommerce]]></category>
		<category><![CDATA[funnel]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[marketing outsourcing]]></category>
		<category><![CDATA[price comparison]]></category>
		<category><![CDATA[vertical search]]></category>

		<guid isPermaLink="false">http://www.keplarllp.com/blog/?p=648</guid>
		<description><![CDATA[Anyone watching TV these days can&#8217;t help but notice the proliferation of advertisements for price comparison sites. What is less widely understood is what is driving this sudden growth, and how this is changing the customer purchase funnel (in other words, the way we select products and services to buy online). In our previous blog post [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><img class="aligncenter" src="/blog/wp-content/uploads/2010/05/termite-mound-and-elephant.jpg" alt="Giant termite mound and elephant" title="Giant termite mound and elephant" /></p>
<p>Anyone watching TV these days can&#8217;t help but notice the proliferation of advertisements for price comparison sites. What is less widely understood is what is driving this sudden growth, and how this is changing the customer purchase funnel (in other words, the way we select products and services to buy online).</p>
<p>In our <a href="/blog/2010/04/towards-a-curated-web-a-quick-primer-on-vertical-search" target="_self">previous blog post</a> in the Curated Web series, we looked at the various types of vertical search site, how they make money and where they get their search results from. We touched briefly on affiliate marketers, mentioning the affiliate networks which they belong to and the affiliate fees which they earn. In this post we dive much more deeply into affiliate marketing, to understand why retailers are spending increasing amounts on affiliate marketing, what impact these rapidly developing vertical search sites are having on buyer behaviour online, and what these changes mean for major Web companies, especially Google. But first, we will explain a little more about what affiliate marketing actually is.</p>
<p><span id="more-648"></span></p>
<p><strong>A quick primer on affiliate marketing</strong></p>
<p>Affiliate marketing is the Web&#8217;s second oldest profession. In a nutshell, retailers offer &#8220;affiliates&#8221; a fee (either a flat fee or a percentage) based on the number of customers that an affiliate sends them and the amount that those customers spend. Anyone who can somehow find prospective customers and direct them to a retailer&#8217;s site can be an affiliate, and there are legions of affiliates doing exactly this &#8211; building assorted price comparison sites, product comparison sites, review sites, voucher sites and directories. For the retailer, affiliate marketing offers them a way of outsourcing some of their marketing effort. For clever affiliates who are good at finding ways to attract and direct prospective customers, it offers a potentially lucrative revenue stream.</p>
<p>One way in which affiliates can make money is by setting up vertical search sites for particular products or services. A good vertical search site guides the individual into making a buying choice, by:</p>
<ol>
<li>Identifying the key factors that matter to that individual,</li>
<li>presenting him or her with the options available,</li>
<li>offering expert advice on the pros and cons of different options,</li>
<li>offering user opinions (via social media) from other people who&#8217;ve made that decision,</li>
<li>and then finally providing a straightforward interface so that the individual can make a buying decision and then go quickly on to make the purchase.</li>
</ol>
<p>Two examples of sites that do this particularly well are <a href="http://www.moneysupermarket.com/credit-cards/" target="_blank">Money Supermarket</a>, especially for financial products (e.g. credit cards, mortgages) and <a href="http://www.simplifydigital.co.uk/digital-tv/" target="_blank">Simplify Digital</a>, for people looking to choose a TV / broadband deal. These are effective affiliate sites because they add value for prospective customers (by helping them make informed buying decisions) whilst adding value to retailers by providing them &#8220;pre-qualified&#8221; customers who&#8217;ve expressed an interest in their specific product / service. Popularity with prospective customers means that these sites attract a good volume of traffic, lowering the marketing cost they have to spend to attract users and increasing the profit they make (by effectively &#8220;selling on&#8221; the individual to a retailer for a higher fee).</p>
<p><strong>How affiliate marketing is drastically altering the customer purchase funnel online</strong></p>
<p>Affiliate sites, not least those with rich vertical search functionality, are having a significant impact on the way in which people make buying decisions online. The nature of this change is best summed up by a pair of &#8220;before and after&#8221; diagrams, looking at how the purchase funnel for buying (for example) home broadband has been altered by the advent of affiliate sites. The first diagram shows the cyclical set of general search querying and site browsing which consumers depended on before the advent of vertical search sites:</p>
<p style="text-align: center;"><img class="aligncenter" title="Buying process before vertical search" src="/blog/wp-content/uploads/2010/05/purchase-funnel-before-vertical-search.png" alt="Buying process before vertical search"  /></p>
<p>&nbsp;</p>
<p>In the &#8220;after&#8221; diagram below, we show how a vertical search site takes over from generalised search to provide the end-to-end purchase funnel for customers:</p>
<p style="text-align: center;"><img class="aligncenter" title="Buying process after vertical search" src="/blog/wp-content/uploads/2010/05/purchase-funnel-after-vertical-search.png" alt="Buying process after vertical search" /></p>
<p>From the diagrams above, two things are clear:</p>
<ol>
<li>A good vertical search site simplifies life for the online shopper</li>
<li>A good vertical search site diminishes the amount of time that shopper spends doing generalised search, and indeed the importance of those search results</li>
</ol>
<p><strong>The future growth outlook for vertical search sites is extremely positive</strong></p>
<p>The affiliate market continues to grow &#8211; retailers are allocating bigger portions of their marketing budgets to affiliate marketing, as it represents a very low risk way of growing sales (because they only have to pay when they make sales). At the same time, more and more affiliates are being attracted into the space by the growing number of people who are researching buying decisions online and / or going on to make those purchases online.</p>
<p>Vertical search sites are especially common in product categories where the value of a prospective customer is high &#8211; for example, mortgages, pay-TV, mobile phones; in these cases, retailers can afford to offer high acquisition fees to affiliates, which in turn attracts more affiliates to the space. But as competition heats up among affiliates for the high-value sales, we can expect to see more affiliates launching vertical search sites to cover the &#8220;long tail&#8221; of niche or low-value product categories, especially as the costs of setting up vertical search sites diminish.</p>
<p><strong>So what next for general search?</strong></p>
<p>To date, generalised search as offered by Google and others has been an extremely valuable service, because it offers advertisers (retailers) a way to reach prospective consumers looking for their specific products. Google&#8217;s AdWords service, where advertisers bid to reach users searching for particular terms, has long been that company&#8217;s cash cow.</p>
<p>With the growth in vertical search, however, advertisers are being offered the opportunity to buy prospective customers in places other than on Google search results. Furthermore, they may be able to better qualify those prospective customers, because a good vertical search site &#8220;walks&#8221; the customer through the buying decision funnel, and only passes them onto the retailer once they&#8217;ve made a decision, providing a potentially higher value source of website visitors than AdWords. On top of this, some of these vertical search sites are now disintermediating Google, appealing directly to shoppers via TV advertisements and other above-the-line promotions.</p>
<p>So whilst the growth in vertical search may be good for both consumers and retailers, it could well spell trouble for Google &#8211; and we will explore this further in the <a href="/blog/2010/05/towards-a-curated-web-why-vertical-search-is-a-potential-google-killer" target="_self">next post</a> in this series.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keplarllp.com/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Towards a curated web: a quick primer on vertical search</title>
		<link>http://www.keplarllp.com/blog/2010/04/towards-a-curated-web-a-quick-primer-on-vertical-search</link>
		<comments>http://www.keplarllp.com/blog/2010/04/towards-a-curated-web-a-quick-primer-on-vertical-search#comments</comments>
		<pubDate>Thu, 01 Apr 2010 09:02:02 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Search & aggregation]]></category>
		<category><![CDATA[affiliate]]></category>
		<category><![CDATA[affiliate marketing]]></category>
		<category><![CDATA[cpa]]></category>
		<category><![CDATA[curation]]></category>
		<category><![CDATA[price comparison]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[vertical search]]></category>

		<guid isPermaLink="false">http://www.keplarllp.com/blog/?p=490</guid>
		<description><![CDATA[As the Google competition issues rumble on, various commentators have been grappling with the concept of general versus vertical search. Here at Keplar we have some thoughts on the rise of vertical search and its implications for Google, but we are aware that many people out there don&#8217;t yet know what vertical search is, or [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" title="Vertical search for stamps" src="/blog/wp-content/uploads/2010/03/stamp-search.jpg" alt="" /></p>
<p>As the <a href="http://www.ft.com/cms/s/2/46018520-20da-11df-b920-00144feab49a.html" target="_blank">Google competition issues</a> rumble on, various commentators have been grappling with the concept of general versus vertical search. Here at Keplar we have some thoughts on the rise of vertical search and its implications for Google, but we are aware that many people out there don&#8217;t yet know what vertical search is, or why it&#8217;s important. In this blog post we aim to nail down exactly what vertical search is, so that we can comment more widely on the emerging trends in future posts.</p>
<p>To begin with a definition: unlike a general Web search engine like Google or Bing, a vertical search engine focuses on a specific segment of content. Users visit these vertical search sites to conduct a specialised search for a specific genre of content or category of product. The most commonly known vertical search engines are the price comparison sites, like Kayak or Moneysupermarket.com, where consumers can enter their specific requirements and find a holiday or an insurance deal or similar.</p>
<p><span id="more-490"></span></p>
<p>But vertical search is not just used to drive e-commerce sales. There are vertical search engines for images (Google Images), jobs (<a href="http://jobs.trovit.co.uk/" target="_blank">Trovit Jobs</a>) and dates (<a href="http://www.single-view.co.uk/" target="_blank">Single View</a>) &#8211; to paraphrase a certain ad, there&#8217;s a vertical search site for pretty much everything.</p>
<p>Many vertical search sites are extremely cash generative. These revenues come largely through affiliate fees &#8211; if a user clicks through from Confused.com to Virgin Money and buys travel insurance, then Confused.com makes a referral fee. However there are others which are run at a loss (like Google Images) or as a labour of love; there are also many which don&#8217;t make money from the search results but monetize their regular audience through advertising &#8211; e.g. <a href="http://www.scubadviser.com/" target="_blank">Scubadviser</a>.</p>
<p>So vertical search engines are monetizable, but where do they get the actual search results from? There are four different approaches to collecting this data:</p>
<ol>
<li> <strong>Pull: scraping</strong> &#8211; where a vertical search site builds its search index by &#8216;scraping&#8217; relevant web sites. Scraping is the process of automatically collecting Web information and turning it from unstructured, human-readable data into structured data &#8211; perfect for feeding into a vertical search engine. (We wrote a <a href="/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy" target="_blank">previous blog post</a> on this.)</li>
<li><strong>Pull: affiliate networks</strong> &#8211; where a site imports its search index wholesale from one or more affiliate networks such as TradeDoubler or Commission Junction. These networks hold large databases of products from the merchants they represent &#8211; for example the Lastminute.com account within Commission Junction currently has 91 holidays in its database. A vertical search site for holidays could add all of these products to its search index.</li>
<li><strong>Push: feeds</strong> &#8211; where a site builds its search index from merchant-submitted data feeds, typically in XML format. Good examples of this are Kelkoo and Google Product Search.</li>
<li><strong>Push-pull: manual</strong> &#8211; where a vertical search site manually builds up a list of links to relevant sources, through a mixture of techniques including general web searches, community contributions and face-to-face meetings.</li>
</ol>
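<p>As a rough illustration of the &#8220;push: feeds&#8221; approach, here is a minimal Groovy sketch of turning a merchant-submitted XML feed into records for a search index. The feed layout, the element names (<code>products</code>, <code>product</code>, <code>name</code>, <code>price</code>, <code>url</code>) and the merchant.example URLs are all assumptions made up for this example; real feeds, such as those accepted by Kelkoo or Google Product Search, define their own formats.</p>
<pre class="brush: groovy; title: ; notranslate">
// Hypothetical merchant feed - element names and URLs are illustrative assumptions only.
def feedXml = '''&lt;products&gt;
  &lt;product&gt;&lt;name&gt;Weekend in Rome&lt;/name&gt;&lt;price&gt;199.00&lt;/price&gt;&lt;url&gt;http://merchant.example/rome&lt;/url&gt;&lt;/product&gt;
  &lt;product&gt;&lt;name&gt;Week in Crete&lt;/name&gt;&lt;price&gt;349.50&lt;/price&gt;&lt;url&gt;http://merchant.example/crete&lt;/url&gt;&lt;/product&gt;
&lt;/products&gt;'''

// Parse the feed with Groovy's built-in XmlSlurper.
def feed = new XmlSlurper().parseText(feedXml)

// Flatten each product element into a simple map - one record per product.
def index = feed.product.collect { p -&gt;
	[name: p.name.text(), price: p.price.text().toBigDecimal(), url: p.url.text()]
}

// A trivial &quot;search&quot;: holidays under a price ceiling, cheapest first.
index.findAll { it.price &lt; 300 }.sort { it.price }.each {
	println &quot;${it.name}: ${it.price} (${it.url})&quot;
}
</pre>
<p>A real vertical search site would of course validate each feed and store the records in a proper search index rather than an in-memory list, but the shape of the work is the same: the merchant pushes structured data, and the site only has to normalise it.</p>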
<p>If we now understand what a vertical search site is, then how do these fit alongside general web search from Google or Bing? Are they complementary, or competitive? We&#8217;ll be discussing these questions and others in our <a href="/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel" target="_self">next post</a> in the Curated Web series&#8230;</p>
<p><strong><i>This is the first post in our Curated Web series of posts. The <a href="/blog/2010/05/towards-a-curated-web-affiliate-marketing-and-the-new-shape-of-the-customer-purchase-funnel" target="_self">second post</a> looks at how affiliate marketers have re-shaped the purchase funnel online, while the <a href="/blog/2010/05/towards-a-curated-web-why-vertical-search-is-a-potential-google-killer" target="_self">third and final post</a> in the series looks at how these developments threaten Google.</i></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.keplarllp.com/blog/2010/04/towards-a-curated-web-a-quick-primer-on-vertical-search/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Better competitive intelligence through scraping with Groovy</title>
		<link>http://www.keplarllp.com/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy</link>
		<comments>http://www.keplarllp.com/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy#comments</comments>
		<pubDate>Wed, 20 Jan 2010 08:37:20 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Scaling business processes]]></category>
		<category><![CDATA[Search & aggregation]]></category>
		<category><![CDATA[automate]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[groovy]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[scrape]]></category>
		<category><![CDATA[spider]]></category>

		<guid isPermaLink="false">http://www.keplarllp.com/blog/?p=198</guid>
		<description><![CDATA[For the first of our series of technical posts I&#8217;m going to look at the poorly understood topic of web scraping. To start with a definition: web scraping is the process of automatically collecting Web information and turning it from unstructured, human-readable data into structured data that can be stored and analysed in a database [...]]]></description>
			<content:encoded><![CDATA[<p>For the first of our series of technical posts I&#8217;m going to look at the poorly understood topic of <a href="http://en.wikipedia.org/wiki/Web_scraping" target="_blank">web scraping</a>. To start with a definition: web scraping is the process of automatically collecting Web information and turning it from unstructured, human-readable data into structured data that can be stored and analysed in a database or spreadsheet. The most famous scraper of all is Google, who regularly scrape and index a huge proportion of the Web to feed into their search engine.</p>
<p>How is web scraping useful for a business that isn’t Google-sized? Web scraping can be used to collect and structure competitor data, making it an incredibly powerful marketing intelligence tool. Consider online retail: using a web scraper it is possible for a retailer to automatically survey the range of products offered by competitor sites and the price each product is offered at. Because web scrapers can be automated, they can be programmed to run regularly – so companies can analyse how sensitive their own sales volumes are not just to their own prices, but also to the prices their competitors charge for the same items. It is even possible to use the data from web scrapers to dynamically price items in an online shop so that they are always competitive. If you&#8217;re an online retailer, you are quite possibly already being regularly scraped by a competitor.</p>
<p>In this post, we provide an example of a simple web scraper built using <a href="http://groovy.codehaus.org/" target="_blank">Groovy</a>. We chose Groovy because it&#8217;s a powerful, agile scripting language which is great at navigating/analysing HTML. The target of our scraper is a <a href="http://labs.keplarllp.com/shop/" target="_blank">simple test &#8220;shop&#8221;</a> which we have set up in Keplar Labs. You are welcome to run this scraper against our test shop &#8211; please note that scraping other sites may be against their terms and conditions or even in some cases an offence. <strong>Please seek legal advice before running any scraper on someone else&#8217;s website.</strong></p>
<p>Without further ado, here is the Groovy code:</p>
<p><span id="more-198"></span></p>
<pre class="brush: groovy; title: ; notranslate">
/* Intelbot written by Alex Dean on 15 Jan 2010 with dependencies:
 *  - NekoHTML parser (latest stable), http://nekohtml.sourceforge.net/index.html
 *  - Xerces (2.0.0 or higher), http://www.apache.org/dist/xerces/j/
 */

// Define the pages which contain links to products - our &quot;seeds&quot; in crawl parlance.
def seeds = [&quot;http://labs.keplarllp.com/shop&quot;]

// Load the NekoHTML parser with Xerces - this lets us parse the HTML.
def slurper = new XmlSlurper(new org.cyberneko.html.parsers.SAXParser())

// Now let's loop through each seed URL in turn.
seeds.each() {

	println &quot;Accessing seed URL ${it}&quot;
	def seedURL = new URL(it)

	seedURL.withReader { seedReader -&gt;
		def seedHTML = slurper.parse(seedReader)

		// Show the title of the seed page we're parsing.
		println &quot;Seed page title is ${seedHTML.depthFirst().grep{ it.name() == 'TITLE'}}&quot;

		// Now loop through and find all the product links on this page.
		// For our purposes, a product link is any A tag inside a box div on the page.
		def productLinks = seedHTML.depthFirst().grep{ it.name() == 'DIV' &amp;&amp; it.@class == 'box' }.collect { it.A.'@href'.toString() }
		productLinks.each {

			println &quot;  Accessing product URL ${it}&quot;
			def productURL = new URI(seedURL.toString()).resolve(it).toURL()

			productURL.withReader { productReader -&gt;
				def productHTML = slurper.parse(productReader)

				// Now display the product name.
				println &quot;    Name is ${productHTML.depthFirst().grep{ it.name() == 'H1'}}&quot;

				// Now display the product description.
				println &quot;    Description is ${productHTML.depthFirst().grep{ it.name() == 'P' &amp;&amp; it.@class == 'ProductDescription'}}&quot;

				// Now display the product price.
				println &quot;    Price is ${productHTML.depthFirst().grep{ it.name() == 'P' &amp;&amp; it.@class == 'ProductPrice'}}&quot;
			}
		}
	}
}
</pre>
<p>The code should be fairly self-documenting: essentially there is a loop to process each &#8220;seed page&#8221; (we have only one such page), and then an inner loop to process each product page found on a given seed page.</p>
<p>When you run the script in the Groovy Console, you should see output like this:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
Accessing seed URL http://labs.keplarllp.com/shop
Seed page title is [Keplar Labs]
  Accessing product URL http://labs.keplarllp.com/shop/product1.html
    Name is [Flight to Dubai]
    Description is [Economy class flight to Dubai with Emirates]
    Price is [£699.00]
  Accessing product URL http://labs.keplarllp.com/shop/product2.html
    Name is [Train to Paris]
    Description is [First class train ticket to Paris on Eurostar]
    Price is [£225.50]
  Accessing product URL http://labs.keplarllp.com/shop/product3.html
    Name is [Flight to New York]
    Description is [Economy class flight to New York on United]
    Price is [£375.00]
</pre>
<p>As you can see, the scraper has successfully identified three products linked from the seed page, then accessed each of these product pages and retrieved the key information about each product (its name, description and current price). A more sophisticated scraper would simply build on this functionality, for example adding error handling and potentially adding some caching.</p>
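<p>As a rough illustration of those two additions, the sketch below wraps page fetching in a small class that caches each page in memory and records any URL that fails to load instead of aborting the crawl. CachingFetcher, its fields and the usage line are hypothetical names introduced for this example, not part of the Intelbot script above, and a production scraper would probably also want cache expiry, retries and polite rate limiting.</p>
<pre class="brush: groovy; title: ; notranslate">
// Hypothetical helper, not part of the original Intelbot script.
// Wraps any page-fetching closure with an in-memory cache and basic error handling.
class CachingFetcher {
	Closure fetch        // takes a URL string and returns the page text
	Map cache = [:]      // pages fetched so far, keyed by URL string
	List failures = []   // URLs that could not be fetched

	String get(String url) {
		// Serve the page from the cache if we have already fetched it.
		if (cache.containsKey(url)) {
			return cache[url]
		}
		try {
			String body = fetch.call(url)
			cache[url] = body
			return body
		} catch (IOException e) {
			// Record the failure and carry on; failed pages are not cached.
			failures.add(url)
			return null
		}
	}
}

// Usage (assumption): fetch pages over HTTP using Groovy's URL.getText.
def fetcher = new CachingFetcher(fetch: { new URL(it).getText('UTF-8') })
</pre>
<p>With something like this in place, the withReader calls in the main script could be swapped for fetcher.get(url) followed by slurper.parseText on the result, skipping any page for which get returns null. Failed pages are deliberately left out of the cache so that a later run can retry them.</p>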
<p>Web scraping is a powerful tool that has already transformed the web by enabling services like Google search, and we believe it has the power to transform regular businesses too &#8211; once they learn to employ it. It&#8217;s also cheap to run, which makes it cost-effective. So do try out the scraper above on your own computer &#8211; and let me know how you get on in the comments.</p>
<p><strong><i>Interested in having a web scraper designed and built for your company? <a href="http://www.keplarllp.com/contact">Send us an email</a> to find out how Keplar can help.</i></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.keplarllp.com/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Estate agents (not local newspapers) should worry about Google&#8217;s entry into the UK property market</title>
		<link>http://www.keplarllp.com/blog/2009/12/estate-agents-not-local-newspapers-should-worry-about-googles-entry-into-the-uk-property-market</link>
		<comments>http://www.keplarllp.com/blog/2009/12/estate-agents-not-local-newspapers-should-worry-about-googles-entry-into-the-uk-property-market#comments</comments>
		<pubDate>Fri, 04 Dec 2009 10:09:31 +0000</pubDate>
		<dc:creator>Yali</dc:creator>
				<category><![CDATA[Search & aggregation]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[property]]></category>

		<guid isPermaLink="false">http://www.keplarllp.com/blog/?p=136</guid>
		<description><![CDATA[Yesterday&#8217;s Financial Times reported that Google intends to launch an online property portal in the UK, where estate agents will be able to advertise their properties for free. &#8220;Experts say this could pose a serious threat to existing property websites and local newspapers&#8221;, according to the FT. What this says about the future of search [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" title="Foxtons" src="http://www.keplarllp.com/blog/wp-content/uploads/2009/12/estate-agent.jpg" alt="" width="280" height="429" /></p>
<p>Yesterday&#8217;s <a href="http://www.ft.com/cms/s/0/6839fb26-df8b-11de-98ca-00144feab49a.html?nclick_check=1" target="_blank">Financial Times</a> reported that Google intends to launch an online property portal in the UK, where estate agents will be able to advertise their properties for free. <em>&#8220;Experts say this could pose a serious threat to existing property websites and local newspapers&#8221;, </em>according to the FT.</p>
<p><strong>What this says about the future of search</strong></p>
<p>The interesting thing about the announcement is not what it says about the future of selling and renting property in the UK, but what it says about the importance of vertical search relative to the generalised search market Google already dominates. By providing a specialised solution for people searching for property, Google is acknowledging that a vertical-specific search approach has advantages over a generalised product, and that ultimately there may be much more money in specific verticals than in general search. (Just think about the affiliate fees a property seller might be willing to pay to sell a half-million pound property.) We will explore the growing value in vertical search in more detail in a future blog post.</p>
<p><strong>Long term, Google&#8217;s move is much more of a threat to estate agents than local newspapers and property sites</strong></p>
<p>As anyone who&#8217;s recently tried to buy or rent a property in the UK&#8217;s supply-constrained market will testify, the UK&#8217;s &#8220;big three&#8221; property aggregation sites (<a title="Rightmove" href="http://www.rightmove.co.uk/" target="_blank">Rightmove</a>, <a title="FindaProperty" href="http://www.findaproperty.com/" target="_blank">FindaProperty</a> and <a title="Globrix" href="http://www.globrix.com/" target="_blank">Globrix</a>) already have good-enough functionality &#8211; in fact they already do all of the things that have excited commentators about Google&#8217;s offering, for example showing available properties on a map.</p>
<p>The problem for anyone relying on these sites is not one of functionality, but one of recency: the property market moves quickly, especially in these supply-constrained times, and the stock available on these aggregation sites is rarely up to date. Remember that estate agents all have bricks-and-mortar outlets and a steady stream of new clients through the door &#8211; and they are also very effective at client retention. The upshot is that they have little incentive to keep their online listings up to date with the latest properties. As a result, the properties that are advertised online &#8211; and still available &#8211; tend to be the runt of the litter. You could say that estate agents use sites like Rightmove in the way that online publishers use remnant ad networks: for the crap inventory which they can&#8217;t sell directly.</p>
<p><strong>Towards a self-serve property market</strong></p>
<p>Now, there is no reason to think that estate agents will be any faster uploading properties to Google&#8217;s site than to Rightmove. But there is reason to think Google will do a better job of directing large numbers of interested web surfers to the site &#8211; no one is better at directing web traffic than Google.</p>
<p>The other thing which Google excel at is opening up their platforms to smaller players, offering easy-to-use, competitively priced self-serve interfaces. And if I want to sell my property, wouldn&#8217;t I prefer to list it directly on Google and pay 0.02% commission, rather than pay an estate agent 2% so that they can advertise it on Google for free?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.keplarllp.com/blog/2009/12/estate-agents-not-local-newspapers-should-worry-about-googles-entry-into-the-uk-property-market/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

