Most people woke up on Wednesday morning to some combination of shock, joy, bemusement, and/or mild terror as Donald Trump unexpectedly won the presidency.
I say “unexpectedly” because pundits and pollsters had been projecting a Clinton win for nearly six months.
– Source: FiveThirtyEight
And it’s not just that they had been projecting a win. They had been using elaborate, sophisticated statistical models to make their predictions.
This has easily been the most technologically sophisticated, data-driven election in the history of the world.
The most data-driven election in history
The 2016 election has been the most data-driven election in history.
Quietly and behind the scenes, data science has reshaped political campaigning and strategy. Going back to 2008 and 2012, the Obama campaign harnessed big data and analytics to understand voters, sharpen messaging, drive fundraising, and ultimately win votes.
But 2016 has taken things to a whole new level. In this election cycle, data science was no longer relegated to the back end.
Instead, data science has been front and center in the reporting of the election. Media outlets like FiveThirtyEight and The New York Times built stunning interactive dashboards to detail the state of the campaign. (Setting aside the fact that they got it wrong, if you want excellent examples of data visualization in action, look no further than the data-driven election coverage at FiveThirtyEight and the NYT.)
Moreover, they also built elaborate predictive models based on vast amounts of data.
But even though they used mountains of data and sophisticated statistical techniques, nearly all of the “big names” in data journalism got it wrong. Nearly all of the major forecasts called for a Clinton win.
This was the most data-driven election in history, and, setting aside a few contrarian predictions, it’s clear at this point that data failed.
This is already triggering debate among data scientists and data journalists. Moreover, the use of data in this election will likely be the subject of scrutiny and discussion for years to come.
With that in mind, I want to offer a few very high-level observations (obviously, with the advantage of 20/20 hindsight).
Beware false precision
On reflection, now that the election has been decided, I’m immediately reminded of a famous Warren Buffett quote:
We believe that if you can pinpoint it, you’re kidding yourself. Therefore, we think that when we make a decision, there ought to be a margin of safety.
Buffett was talking about investment decisions, but in much more general terms, it’s really a statement about judgement.
What Buffett is warning against is false precision.
In the post-election bewilderment, some commenters are suggesting that many of the model-builders “oversold” the precision of these models.
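To make the idea of oversold precision concrete, here is a minimal sketch in Python. It is not how any of the forecasters actually built their models. It simply assumes the true margin is normally distributed around a polling lead, and the 3-point lead, the error sizes, and the function name win_probability are all hypothetical.

```python
import math

def win_probability(lead, error_sd):
    """Chance the true margin is above zero, ASSUMING it is
    normally distributed around `lead` with spread `error_sd`
    (both in percentage points). Illustrative only."""
    return 0.5 * (1.0 + math.erf(lead / (error_sd * math.sqrt(2.0))))

# Hypothetical 3-point polling lead under different assumed error sizes.
for error_sd in (2.0, 4.0, 6.0):
    p = win_probability(3.0, error_sd)
    print(f"assumed error of {error_sd:.0f} points -> win probability {p:.0%}")
```

With the same 3-point lead, the stated win probability falls from roughly 93% to roughly 69% as the assumed error grows from 2 points to 6. The headline number is only as precise as the error assumption behind it.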
Garbage in, garbage out
Speaking roughly, a model is only as good as the data on which it’s built.
If you build a model on data that contains systematic errors (or is otherwise “bad data”) then the predictions of that model will be inaccurate.
This is sometimes known in computing circles as “garbage in, garbage out.”
While a quick, one-off blog post is no place to dissect the statistical methods of polling gurus, it seems obvious that we need to call the 2016 polling data into question. In fact, this election result is already leading to some introspection on the part of polling professionals. Essentially, they’re asking, “How did we get this wrong?”
The inaccuracy of the polling data itself is obviously important, because many of the models in question (specifically, the super-model at FiveThirtyEight) relied on the data generated by pollsters.
If the data in these polls contained systematic errors, then the models themselves will make erroneous predictions.
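A small simulation can illustrate why systematic error is so damaging. The sketch below is a toy example, not a reconstruction of any real polling model: every number in it is made up, and the function simulate_poll_average is hypothetical. Each simulated poll equals the true margin, plus a bias shared by every poll, plus its own random noise.

```python
import random
import statistics

def simulate_poll_average(true_margin, shared_bias, n_polls, noise_sd, seed=0):
    """Average of n_polls simulated poll margins (percentage points).

    Each poll = true margin + a bias common to ALL polls
    + independent random noise. All inputs are hypothetical.
    """
    rng = random.Random(seed)
    polls = [
        true_margin + shared_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_polls)
    ]
    return statistics.mean(polls)

# Made-up scenario: the candidate actually trails by 1 point,
# but every poll overstates their margin by 3 points.
true_margin = -1.0
for n_polls in (10, 100, 10_000):
    avg = simulate_poll_average(true_margin, shared_bias=3.0,
                                n_polls=n_polls, noise_sd=3.0)
    print(f"{n_polls:>6} polls -> average margin {avg:+.2f} (truth {true_margin:+.2f})")
```

Adding more polls averages away the random noise, but the shared bias stays. In this toy setup the average settles near +2 points, the wrong side of zero, no matter how many polls are aggregated.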
Human intelligence and intuition still matter
As machine learning, data science, and technology progress, there’s an emerging sense that humans are becoming obsolete.
Experts are starting to warn that software and data-driven technology are driving structural changes in the economy that will, in many industries, devalue human labor and judgement. In some industries, technology might drive high unemployment.
Having said that, make no mistake: humans are not obsolete. Human intelligence and judgement still matter.
There were several people in this election cycle who called the election for Trump, and essentially nailed the reasons why. These people looked beyond polling data and understood the deep wants, needs, and anxieties that would shape people’s behavior. In other words, a bit of human empathy drove a clearer understanding of outcomes.
Others, like Nassim Taleb, strongly criticized election predictions on mathematical grounds.
In either case, human knowledge and judgement produced a clearer picture than the models alone.
Keep this in mind. Even as technology, analytics, and machine learning progress, we’re still likely to see a place for humans in overruling, criticizing, and tuning the technical methods being used.
All models are wrong, but some are useful
Data science is a powerful tool for understanding the world.
It is extremely valuable today, and will continue to grow in value over the next couple of decades.
But data science is not infallible.
All models are wrong, but some are useful.
– George Box
Data science can get it wrong. Data – and specifically, erroneous data – can fail to give us clear insight and cause us to misunderstand reality.
As the world becomes increasingly data-driven, we would be wise to remember this.