Today Siddharth Shah, PhD, Senior Business Analyst at Efficient Frontier, will speak on the topic of "Managing Risk in SEM" at SMX Advanced in Seattle. This post introduces the concept of managing risk in SEM that he will present, and shows how you can use histograms to assess and manage that risk.
First, what does risk mean in relation to your PPC campaign? You have hundreds, thousands, tens of thousands, or in the case of some Efficient Frontier clients, millions of keywords on multiple search engines, with multiple ad copies, landing campaigns, and possibly targeting for various regions. Every day you expect a certain number of clicks to drive traffic and conversions to your site. There can be high levels of variability in impression volume, user behavior, and competitive activity, but the success of your business depends on this traffic. Gaining some level of predictability in what can be a highly volatile marketplace is therefore imperative.
A histogram is a statistical visualization tool that shows the proportion of cases that fall into certain ranges. Unlike a bar chart, it is the area of each bar that represents the value, rather than just its height on the vertical scale. To make a histogram in Microsoft Excel, you need to enable the Analysis ToolPak add-in. You then have to define the bins, or ranges, that your data should be grouped into. For example, the histogram below shows that for this particular head keyword, there were 27 days out of 80 on which the keyword received between 60 and 69 conversions. The bins used here were multiples of 10, but as you will see in the Wikipedia entry, the bins do not have to be of equal width.
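If you would rather not use Excel, the binning step can be sketched in a few lines of Python. This is only an illustration: the function name bin_counts and the daily conversion figures are made up for the example and are not the data behind the chart. It assumes whole-number conversion counts and equal-width bins.

```python
from collections import Counter

def bin_counts(values, bin_width=10):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical daily conversion counts for one keyword (illustration only).
daily_conversions = [42, 55, 61, 63, 68, 69, 70, 74, 58, 66]
bin_width = 10

# Print a simple text histogram, one row per bin.
for start, days in bin_counts(daily_conversions, bin_width).items():
    print(f"{start}-{start + bin_width - 1}: {'#' * days} ({days} days)")
```

Each row of the output corresponds to one bar of the histogram, with the bar length equal to the number of days in that range.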
So how can a histogram help you manage risk? Siddharth created histograms of many keywords by number of days that the keywords received certain amounts of revenue (or conversions), and found that head term histograms, like the one above, followed a certain pattern that resembles the normal distribution. In contrast, tail term keywords followed no visible pattern of revenue generation, suggesting that head and tail keywords should be treated differently in your portfolios.
Why is the normal distribution important? The normal distribution, or Gaussian distribution, or bell curve, as some call it, has many important properties, the most important of which in this case is its predictability. The blue line on top of the histogram above represents the normal distribution for the same values in the histogram. The central limit theorem, a fundamental theorem of probability, states that the sum (or average) of a large number of independent random variables is approximately normally distributed. A head keyword's daily total is the result of many individual searches and clicks, so it tends to behave this way. Thus we can predict, with a reasonable amount of certainty, the range of revenue or number of conversions we should expect from a head keyword.
This is where the mean and the standard deviation (SD) come in, as shown on the chart. The mean, or average, number of conversions from this keyword over the 80 days analyzed was 60.55. But if you expect 61 conversions every day from this keyword, you will be sorely disappointed, as the histogram shows that on only 34% of days (27/80) were there between 60 and 69 conversions. Even if you expect 60 or more conversions, your goal will only be satisfied 79% of the time (63/80), which leaves 17 days on which you will be wondering what went wrong.
In a normal distribution, about 68% of values fall within 1 standard deviation of the mean, and about 95% fall within 2 standard deviations. The standard deviation (a convenient formula in Excel: STDEV) for this group of data is 13.21. Thus there is a 68% chance that on any given day this keyword will bring in between 47 and 74 conversions (60.55 +/- 13.21). There is a 95% chance that the keyword will bring in between 34 and 87 conversions (60.55 +/- 2*13.21).
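The arithmetic above can be reproduced with a short Python sketch. The helper names summarize and expected_range are illustrative, not part of any tool mentioned in this post. The statistics.stdev function computes the sample standard deviation, which is the same quantity Excel's STDEV returns.

```python
import statistics

def summarize(values):
    """Return (mean, sample standard deviation) for a list of daily values."""
    return statistics.mean(values), statistics.stdev(values)

def expected_range(mean, sd, k):
    """Return (low, high) for mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd

# Mean and SD quoted in the post for the 80 days analyzed.
mean, sd = 60.55, 13.21

low68, high68 = expected_range(mean, sd, 1)
low95, high95 = expected_range(mean, sd, 2)
print(f"About 68% of days: {low68:.0f} to {high68:.0f} conversions")
print(f"About 95% of days: {low95:.0f} to {high95:.0f} conversions")
```

With your own data, you would pass the list of daily conversions to summarize first and feed the result into expected_range.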
You can use this method to define a range of expected performance for your head keywords. If you are 95% confident that the keyword will bring in a specified range of conversions, you only have to worry about the days on which performance fell outside of that range. Since you don't have to worry about the days where performance exceeded expectations (unless you want to investigate whether there is a positive change in the marketplace), you now only need to worry about the days where performance did not meet expectations. In this case, there was only one day on which the keyword received fewer than 34 conversions, which is far less taxing than worrying about performance not meeting expectations on, say, 17 days if you were holding this keyword to a goal of 60 conversions per day.
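As a rough sketch of this monitoring rule, the Python below flags only the days that fall under the lower bound of the 95% range. The function name underperforming_days and the sample days are hypothetical; only the mean (60.55) and SD (13.21) come from this post.

```python
def underperforming_days(daily, mean, sd, k=2):
    """Return the (label, value) pairs that fall below mean - k * sd."""
    lower = mean - k * sd
    return [(label, value) for label, value in daily if value < lower]

# Hypothetical daily conversions, labeled by day number (illustration only).
daily = [("day 1", 62), ("day 2", 31), ("day 3", 58), ("day 4", 90)]

flagged = underperforming_days(daily, mean=60.55, sd=13.21)
for label, value in flagged:
    print(f"{label}: {value} conversions is below the expected range")
```

Here only day 2 is flagged, because 31 is below the lower bound of about 34. Day 3, at 58 conversions, would have missed a fixed goal of 60 per day but sits comfortably inside the expected range.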
This kind of management of the head keywords allows you to approach the tail with a different mindset. Siddharth did another analysis and found that in one month there were 16,105 keywords in a campaign that generated revenue, while in the following month there were 17,238 revenue-generating keywords in the same campaign. Closer analysis showed that there were only 2,937 keywords, or 15% of total, that generated revenue in both months. Tail terms should be closely monitored and actively bid on, as their behavior is very dynamic and sparse data does not allow the same level of predictability found with head terms. This unpredictable behavior also means that tail terms should not be held to a daily revenue target if you want to extract the most value out of the tail.
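A month-over-month comparison of this kind can be sketched with Python sets. The keyword lists below are invented for illustration, and the function name overlap_summary is hypothetical. The post does not say which total the overlap percentage is measured against, so the choice here to divide by the keywords that earned revenue in either month is an assumption.

```python
def overlap_summary(month_a, month_b):
    """Compare two sets of revenue-generating keywords."""
    both = month_a & month_b      # earned revenue in both months
    either = month_a | month_b    # earned revenue in at least one month
    share = len(both) / len(either) if either else 0.0
    return {"both": len(both), "either": len(either), "share_of_either": share}

# Hypothetical keyword sets (illustration only).
month_1 = {"red shoes", "blue shoes", "running shoes size 9", "trail shoes"}
month_2 = {"red shoes", "trail shoes", "kids sandals", "wide boots", "sale heels"}

summary = overlap_summary(month_1, month_2)
print(f"{summary['both']} of {summary['either']} keywords "
      f"({summary['share_of_either']:.0%}) earned revenue in both months")
```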
Efficient Frontier's algorithmic optimization technology reduces risk by proactively monitoring and acting upon this dynamic keyword behavior in both the head and the tail, in a way that rules-based systems cannot. By modeling every keyword every day at every position for clicks, revenue and CPC, risk is reduced as the bids continually reflect activity in the dynamically changing marketplace.

Where are keyword assists accounted for in this type of analysis? It is proven that generic keywords drive consumers down the purchase funnel and often result in head term clicks. Where do you account for value throughout the click path?
Posted by: John Bonham | June 18, 2008 at 01:46 PM
John,
Keyword assists were not included in this analysis. However, they could be accounted for using transition probabilities and Markov chain analysis. The histogram method for risk assessment would not change.
-LeeAnn
Posted by: LeeAnn Prescott | June 18, 2008 at 03:08 PM