Statistics, Damn Statistics, and Lies: The Elusive Nature of Analytics

May 30, 2013

The hottest buzzword of the decade is “analytics” – the ability to gain heretofore unattainable intelligence and insight by mining piles of data. As with the Internet in the ’90s, what was once the sole domain of geekdom is becoming mainstream. Whereas once we employed a lone statistician somewhere in a broom closet to “do the numbers” for us, now we have our own dashboards at our disposal to tell us our business status up to the second. You can see the seductiveness of this promise, can’t you? Imagine the control you could exert with perfect intelligence. There was a story in Charles Duhigg’s The Power of Habit about a coupon-customizing system so insightful that it inadvertently revealed to the family of a 17-year-old girl that she was pregnant before she had a chance to break the news herself. Imagine what you could do with that insight…

The increased quantification of our lives and our businesses is going to bring us many benefits. It brings us intelligence, insight, better forecasts, and a proper understanding of the effects of randomness. Services will be better able to customize their offerings to suit our needs. But analytics have a dangerous mystery about them, and we are adopting them faster than we are learning to understand their implications.

Analytics and statistics, as well as being the bearers of social insight, are also the instruments of institutional dysfunction. They are the primary refuge of those looking to justify their own incumbency and promotions. How often have you heard one political party tout its effect on, say, levels of unemployment? And then how often do you hear the other party say, “Well, those figures don’t take into account those workers who have stopped looking for jobs”? And then how often do you hear the first party say, “Well, these figures have been the basis of our economic appraisals for decades”? Not only can people not agree on the interpretation of a certain statistic, but they can’t even agree on which statistics and assumptions are legitimate.
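To make the unemployment argument concrete, here is a minimal sketch of how both parties can be “right” about the same economy. The function name and every count in it are hypothetical, invented purely for illustration, and the definitions are simplified: the only thing that changes between the two figures is whether workers who have stopped looking are counted.

```python
def unemployment_rate(unemployed, employed, discouraged=0):
    """Jobless share of the labor force, as a percentage.

    `discouraged` is the number of people who want work but have stopped
    looking. The headline figure leaves them out entirely; the broader
    figure adds them to both the jobless count and the labor force.
    """
    jobless = unemployed + discouraged
    labor_force = employed + unemployed + discouraged
    return 100.0 * jobless / labor_force


# Hypothetical counts in thousands, not real data.
employed, unemployed, discouraged = 9_200, 800, 300

headline = unemployment_rate(unemployed, employed)
broader = unemployment_rate(unemployed, employed, discouraged)
print(f"headline: {headline:.1f}%  broader: {broader:.1f}%")
```

With these made-up numbers the headline rate is 8.0% and the broader rate is about 10.7%. Neither figure is a lie; they simply answer different questions, which is exactly why the argument over which one is “legitimate” never ends.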

Much has already been made of the ethical use of statistics. The HBO drama The Wire coined the outstanding phrase “juking the stats” – tweaking the underlying observations so that the statistics tell the story you want to tell. Statistics-playing strategies occupy a moral gray area: for example, teachers who teach test questions to their students so that their school won’t lose funding under the No Child Left Behind Act. Any time you distribute resources using quantitative measures, people have every incentive to figure out how to work the system.

What I’m talking about, however, comes before any ethical questions are asked. Analytics have a deceptive certainty to them, a way of appealing to our own cognitive biases while at the same time seeming concrete. People tend to believe that statistics denote certainty. They generally have no idea how much subjectivity they introduce the moment they interpret a chart or graph. “Of course this is what this means…it’s obvious!” Increasing the amount of analytics does not provide more certainty; as the volume of analyzed data increases, so too do the opportunities for cognitive bias. This is especially true if statistical understanding does not increase at the same rate as one’s daily involvement with data analysis. We’ll increasingly be called upon to give interpretations that seem certain to us, but that we’re ultimately untrained to give.

Here are a few principles of human reasoning (in no particular order) which may cause a problem: