Argument: still need thinking even with big data

Justin Fox argues that the rise of big data doesn’t mean we can abandon thinking about data and relationships between variables:

Big data, it has been said, is making science obsolete. No longer do we need theories of genetics or linguistics or sociology, Wired editor Chris Anderson wrote in a manifesto four years ago: “With enough data, the numbers speak for themselves.”…

There are echoes here of a centuries-old debate, unleashed in the 1600s by protoscientist Sir Francis Bacon, over whether deduction from first principles or induction from observed reality is the best way to get at truth. In the 1930s, philosopher Karl Popper proposed a synthesis, in which the only scientific approach was to formulate hypotheses (using deduction, induction, or both) that were falsifiable. That is, they generated predictions that — if they failed to pan out — disproved the hypothesis.

Actual scientific practice is more complicated than that. But the element of hypothesis/prediction remains important, not just to science but to the pursuit of knowledge in general. We humans are quite capable of coming up with stories to explain just about anything after the fact. It’s only by trying to come up with our stories beforehand, then testing them, that we can reliably learn the lessons of our experiences — and our data. In the big-data era, those hypotheses can often be bare-bones and fleeting, but they’re still always there, whether we acknowledge them or not.

“The numbers have no way of speaking for themselves,” political forecaster Nate Silver writes, in response to Chris Anderson, near the beginning of his wonderful new doorstopper of a book, The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t. “We speak for them.”

These days, finding and examining data is much easier than before, but it is still necessary to interpret what the numbers mean. Observing a relationship between variables doesn’t necessarily tell us something valuable. We also want to know why variables are related, and this is where hypotheses come in. Careful hypothesis testing lets us rule out spurious associations, check whether other variables are producing the observed relationship, and estimate the influence of one variable on another while controlling for other factors (the essence of regression). More complex models let us see how a variety of variables affect each other at the same time.
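To make the point about controlling for other factors concrete, here is a minimal sketch in Python using simulated data. Everything in it is an assumption made for illustration: the variable names, the sample size, and the setup in which a third variable z drives both x and y while x has no effect of its own on y. It is not drawn from Fox's or Silver's work.

```python
import numpy as np


def simulate_confounded(n=5000, seed=0):
    """Simulate x and y that are both driven by a shared variable z.

    By construction, x has no direct effect on y (illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)        # the lurking third variable
    x = z + rng.normal(size=n)    # x depends on z plus noise
    y = z + rng.normal(size=n)    # y depends on z plus noise, not on x
    return x, y, z


def ols_coefficients(y, *predictors):
    """Ordinary least squares: returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


x, y, z = simulate_confounded()

# Regress y on x alone: the slope picks up the association created by z.
raw_slope = ols_coefficients(y, x)[1]

# Regress y on x and z together: the slope on x is estimated holding z fixed.
controlled_slope = ols_coefficients(y, x, z)[1]

print(f"slope of y on x, ignoring z:        {raw_slope:.2f}")
print(f"slope of y on x, controlling for z: {controlled_slope:.2f}")
```

In this setup the first slope comes out clearly positive and the second close to zero. The numbers alone would suggest that x matters for y; it takes a hypothesis about z, and a model that includes it, to see that the association is spurious.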

At the opposite end of the scientific process from the hypotheses, using findings to create and implement policies also requires thinking. Once we have established that relationships likely exist, it takes even more work to respond to them in useful and effective ways.

Dealing with being wrong in science

A doctor who challenges the faulty research of his peers is profiled in the latest issue of The Atlantic. His conclusion is that expectations about science, specifically reactions to being wrong, need to change:

We could solve much of the wrongness problem, Ioannidis says, if the world simply stopped expecting scientists to be right. That’s because being wrong in science is fine, and even necessary—as long as scientists recognize that they blew it, report their mistake openly instead of disguising it as a success, and then move on to the next thing, until they come up with the very occasional genuine breakthrough. But as long as careers remain contingent on producing a stream of research that’s dressed up to seem more right than it is, scientists will keep delivering exactly that.

Negative findings, typically meaning that the data did not support the proposed hypothesis (the null hypothesis could not be rejected), tend to receive less attention. Yet they are still useful because they advance science by ruling out alternatives. Both positive and negative findings are needed to build science, in the natural and social sciences alike.

But this doctor also suggests that the incentive system for scientists needs to change. As long as breakthroughs and big findings are what gets rewarded, that is what scientists will look for and claim to find.