A couple of months ago I was at an American Marketing Association event on analytics. One of the sponsors was demonstrating the powerful capabilities of their prediction software in a creative way: they set up a fortune-telling booth. I gave them my name and zip code, and the fortune-teller accurately predicted, amongst other things, that I had just bought or was about to buy a new car.
It’s stunning the extent to which our behavior as consumers is utterly predictable, and many marketing companies and retailers are becoming much more efficient in grouping us not only by who we are and what we like, but by what we’re about to do. This is how they do it.
By now, everybody’s heard the term “data-mining” and its kissin’ cousin “predictive analytics” being thrown around. But they still have a black-box quality to them. I give you my customer transaction data, you run it through your “predictive analytics”, say the magic words, and the computer spits out exactly how to manipulate each customer into spending twice the money they were originally going to spend. Right? Well, not quite. So it might not be a bad idea, without getting too bogged down in the math, to demystify this process a bit and pull back the curtain on mathematical mind-reading.
Let’s start with simple business dashboards. Most people are familiar with the concept because they’ve been around for a while. Business Intelligence software like Tableau, QlikView, Hyperion and Cubeware does nothing but extract data from a company’s transactional systems and pull it together into neat little displays of trend lines and bar charts. It’s very easy to see the value in knowing (in real time) exactly how your company’s sales are trending, whether its operations are optimized, where any bottlenecks might be, etc. This is the level of inquiry at which most operations run: nothing more sophisticated than summing and counting up units, and displaying those as simple charts.
But a surface-level look at data only leads to surface-level understanding. Thomas Redman, in his blog article for the Harvard Business Review, tells us that we need to dig deeper and think more analytically. For example, a simple plot can tell you if two things are correlated, but it takes judgment to determine which thing actually caused the other. Or there could be a missing third variable causing both.
Here’s a more specific example: you’re managing a marketing campaign on Google, and you’re doing A/B testing with ads. The first ad gets you 50 clicks in a month. You try the second ad the next month, and you get 55 clicks. A simple trend line would seem to indicate that the second ad is slightly better. But were those five extra clicks caused by the quality of the ad, or did they happen randomly?
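Statisticians have a standard way to answer that question: a significance test. Here's a minimal sketch of the "were those five extra clicks random?" calculation, using a two-proportion z-test. Note that the impression counts are assumptions for illustration (the example above gives only the click counts); I've assumed 10,000 impressions per ad.

```python
import math

def two_proportion_z(clicks_a, clicks_b, n_a, n_b):
    """Two-sided z-test: is ad B's click rate really different from ad A's?"""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# 50 clicks vs. 55 clicks, assuming 10,000 impressions of each ad
z, p = two_proportion_z(50, 55, 10_000, 10_000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these numbers the p-value comes out far above the usual 0.05 cutoff, meaning the five extra clicks are well within ordinary random variation — which is exactly why a trend line alone can't settle the question.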
This is where drawing conclusions about data gets more interesting. We move from simple aggregation of data into data-mining. Data-mining means using statistical techniques to find insights that couldn’t otherwise be seen with simple sums or counts. It moves from the realm of a straightforward engineering project with simple, correct answers into the grey area of data scientists, complex mixing of factors, and stronger or weaker conclusions.
Data-mining, at its core, is about grouping things more effectively using statistical modeling. It is useful across all functions of a company, but there is specific interest in it from marketers. Among other applications, data-mining can be used to group or segment customers so that they can be more effectively targeted. Here is a quote from The Modeling Agency on what data mining actually means:
…the process of developing mathematical models that identify groups of individuals who display [a] behavior of interest at differing rates. This group identification allows us to discriminate in the allocation of our resources more effectively. In short, we are looking for a better way to break our relationships into groups so that we can allocate more resources to the groups that benefit us, and minimize the resources to those groups that have a negative impact [on] our specific performance metrics.
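The idea in that quote can be made concrete with a small sketch: given records of past campaign contacts, compute each group's response rate and rank the groups, so that resources flow to the segments that respond at the highest rates. The segment names and records below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical campaign history: (customer segment, did they respond?)
records = [
    ("urban_young", True), ("urban_young", True), ("urban_young", False),
    ("suburban_family", True), ("suburban_family", False),
    ("suburban_family", False), ("suburban_family", False),
    ("rural_retired", False), ("rural_retired", False), ("rural_retired", True),
]

# Tally responses and contacts per segment
totals = defaultdict(lambda: [0, 0])  # segment -> [responses, contacts]
for segment, responded in records:
    totals[segment][0] += responded
    totals[segment][1] += 1

# Rank segments by response rate, best first
rates = {seg: resp / n for seg, (resp, n) in totals.items()}
for seg in sorted(rates, key=rates.get, reverse=True):
    print(f"{seg}: {rates[seg]:.0%}")
```

Real data-mining replaces this simple tally with statistical models, but the goal is the same: find the groups whose "behavior of interest" occurs at differing rates, and allocate accordingly.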
Despite the fact that the means get more complex and nebulous at this point, the fact remains that our consumer behavior is still heavily predictable. Loyalty Builders is a marketing-services company specializing in remarketing and loyalty programs that makes heavy use of analytics. In a newspaper interview, CEO Mark Klein talks about using customer transaction data to predict which customers are going to make near-term repeat purchases, and which other customers are going to defect.
Klein makes use of predictive analytics, a subset of statistical data-mining techniques that speak to future events instead of past events. For example, what if I notice that your purchases from my company are becoming less frequent? My data shows me that customers whose purchases slow down the way yours have are likely to defect. So when you call in, I’m going to make sure the person talking to you knows that this is a risk with you, and acts to prevent that defection.
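One simple way to operationalize that slowing-purchases signal is sketched below: compare how long it's been since a customer's last purchase to their historical average gap between purchases, and flag the account when the silence stretches well past normal. The 1.5× threshold and the purchase dates are assumptions for illustration, not Loyalty Builders' actual method.

```python
from datetime import date

def defection_risk(purchase_dates, today, slowdown_factor=1.5):
    """Flag a customer whose time since last purchase is much longer
    than their historical average gap between purchases."""
    dates = sorted(purchase_dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    avg_gap = sum(gaps) / len(gaps)
    days_since_last = (today - dates[-1]).days
    return days_since_last > slowdown_factor * avg_gap

# A customer who bought roughly monthly, then went quiet
history = [date(2013, 1, 5), date(2013, 2, 4), date(2013, 3, 6), date(2013, 4, 5)]
print(defection_risk(history, today=date(2013, 7, 1)))  # → True
```

A real model would score many such signals at once, but even this one-variable version captures the core move: turning past transaction data into a forward-looking risk flag.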
Let me give you a quick idea of how this is actually done. One technique used very frequently in this field is called scoring. Let’s say that I own a bike company and I want to predict whether you will buy a bike. I have a bunch of data from past customers, and that includes variables like marital status, family status, commute distance, geographic region, etc. I want to find out which of those factors will help me predict your purchase, and how strong each individual factor is relative to the others.
I use a statistical technique called logistic regression, and assign a point value to each variable choice based on the strength of its predictive power. A commute distance under two miles might be worth 75 points, for example, but a commute distance of over ten miles might only be worth 10 points. Then I add up your total score, so a married male from the Pacific Northwest with no kids and a one-mile commute might have a score of 380 out of 400 points, indicating that he is a very likely purchaser. But a single female from the Midwest with three kids and a ten-mile commute would only add up to a score of 150 out of 400 points, indicating that a purchase is less likely. I then concentrate my resources on converting the married male into a customer. This technique allows me to group people by a very complex mix of factors, rather than relying on traditional demographic demarcations.
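The scoring step itself is simple enough to sketch. In practice the point values are derived from fitted logistic-regression coefficients; the numbers below are purely illustrative, chosen so the two example customers land on the totals described above.

```python
# Illustrative scorecard: points per attribute value. Real values would
# come from logistic-regression coefficients fit on past customer data.
POINTS = {
    "marital_status": {"married": 110, "single": 45},
    "kids": {"none": 110, "three": 35},
    "region": {"pacific_northwest": 85, "midwest": 60},
    "commute": {"under_2_miles": 75, "over_10_miles": 10},
}

def score(customer):
    """Sum the points for each of the customer's attribute values."""
    return sum(POINTS[attr][value] for attr, value in customer.items())

likely_buyer = {"marital_status": "married", "kids": "none",
                "region": "pacific_northwest", "commute": "under_2_miles"}
unlikely_buyer = {"marital_status": "single", "kids": "three",
                  "region": "midwest", "commute": "over_10_miles"}

print(score(likely_buyer))    # → 380
print(score(unlikely_buyer))  # → 150
```

The interesting part is that the high score comes from the combination of attributes, not any single demographic box — which is exactly what makes this finer-grained than traditional segmentation.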
The article from The Modeling Agency goes on to warn us about the potential pitfalls of using techniques like these. Because they are so powerful and yet not well understood by business stakeholders, they’re spoken about with a magical quality. It’s tempting to get some data-mining software, pop numbers into it like a calculator, and then build assumptions on top of the output without really even understanding it.
It’s important to remember that these techniques do represent a grey area in between right answers and wrong answers; getting closer to accuracy depends on the quality and amount of data as well as the strategic thinking of the data scientists who set up these analytics. They are only effective in the context of a strategically aligned analytics project with clear questions and goals. Models have to be optimized and re-optimized, and might not even be effective until more or different data is collected.
We have only seen the tip of the iceberg when it comes to the predictive power and financial upside of predictive analytics. So whether you’re a business owner, marketer, department head, or other stakeholder, it’s important to get past the intimidating, black-box mystique surrounding analytics-based behavior prediction and start becoming educated. This knowledge represents the future of doing business, and our future market leaders will all have mastered these techniques.