Wired’s five tips for “p-hacking” your way to a positive study result

As part of its “Cheat Code to Life,” Wired includes five tips for researchers to obtain positive results in their studies:

Many a budding scientist has found themself one awesome result from tenure and unable to achieve that all-important statistical significance. Don’t let such setbacks deter you from a life of discovery. In a recent paper, Joseph Simmons, Leif Nelson, and Uri Simonsohn describe “p-hacking”—common tricks that researchers use to fish for positive results. Just promise us you’ll be more responsible when you’re a full professor. —MATTHEW HUTSON

Create Options. Let’s say you want to prove that listening to dubstep boosts IQ (aka the Skrillex effect). The key is to avoid predefining what exactly the study measures—then bury the failed attempts. So use two different IQ tests; if only one shows a pattern, toss the other.

Expand the Pool. Test 20 dubstep subjects and 20 control subjects. If the findings reach significance, publish. If not, run 10 more subjects in each group and give the stats another whirl. Those extra data points might randomly support the hypothesis.

Get Inessential. Measure an extraneous variable like gender. If there’s no pattern in the group at large, look for one in just men or women.

Run Three Groups. Have some people listen for zero hours, some for one, and some for 10. Now test for differences between groups A and B, B and C, and A and C. If all comparisons show significance, great. If only one does, then forget about the existence of the p-value poopers.

Wait for the NSF Grant. Use all four of these fudges and, even if your theory is flat wrong, you’re more likely than not to confirm it—with the necessary 95 percent confidence.
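How much do tricks like these inflate the false-positive rate? The “use two measures and keep the one that works” gambit can be sketched in a quick simulation using only Python’s standard library. Everything here is a made-up illustration, not Simmons, Nelson, and Simonsohn’s actual code: both “IQ tests” are pure noise, so every significant result is a false positive, yet picking the better of two tests pushes the error rate well above the nominal 5%.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

ND = NormalDist()

def two_sample_p(a, b):
    """Two-sided z-test p-value for a difference in means (unit variance assumed)."""
    se = sqrt(1 / len(a) + 1 / len(b))
    z = abs(mean(a) - mean(b)) / se
    return 2 * (1 - ND.cdf(z))

random.seed(42)
n, trials = 20, 5000
hits = 0
for _ in range(trials):
    control = [random.gauss(0, 1) for _ in range(n)]
    # Two "IQ tests" for the dubstep group; both are pure noise here,
    # so any significant difference is a false positive.
    test_a = [random.gauss(0, 1) for _ in range(n)]
    test_b = [random.gauss(0, 1) for _ in range(n)]
    # The p-hacker's rule: report whichever measure "worked."
    if min(two_sample_p(control, test_a), two_sample_p(control, test_b)) < 0.05:
        hits += 1

print(f"False-positive rate with pick-the-best measure: {hits / trials:.3f}")
```

With two fully independent tests the rate would approach 1 − 0.95² ≈ 9.75%; sharing a control group correlates the two tests slightly, but the rate still lands well above 5%. Stack all four fudges and, as Wired notes, the odds tilt toward “confirming” a false theory.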

This might be summed up as “things that are done but would never be explicitly taught in a research methods course.” Several quick thoughts:

1. This is a reminder of how important 95% significance is in the world of science. My students often ask why the cut-point is 95% – why do we accept a 5% chance of error and not 10% (which people sometimes “get away with” in some studies) or 1% (wouldn’t we be more sure of our results?).
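The practical difference between those cut-points is how extreme a result has to be before we call it significant. A small sketch with Python’s standard library shows the two-sided critical z-values at each conventional level:

```python
from statistics import NormalDist

ND = NormalDist()

# Two-sided critical z-values: how many standard errors from zero a result
# must fall before it counts as "significant" at each confidence level.
for alpha in (0.10, 0.05, 0.01):
    z = ND.inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha:4.2f} (confidence {1 - alpha:.0%}): need |z| > {z:.2f}")
# alpha = 0.10 -> |z| > 1.64
# alpha = 0.05 -> |z| > 1.96
# alpha = 0.01 -> |z| > 2.58
```

Tightening from 10% to 1% roughly moves the bar from 1.64 to 2.58 standard errors: fewer false positives, but more real effects missed. The 5% convention is a compromise, not a law of nature.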

2. Even if significance is important and scientists hack their way to more positive results, they can still have humility about their findings. Reaching 95% significance still leaves a 5% chance of a false positive. Problems arise when findings are countered or disproven, but we should expect this to happen occasionally. Additionally, results can be statistically significant but have little substantive significance. Altogether, having a significant finding is not the end of the process for the scientist: it still needs to be interpreted and then tested again.

3. This is also tied to the pressure of needing to find positive results. In other words, publishing an academic study is more likely if you disprove the null hypothesis. At the same time, failing to reject the null hypothesis is still useful knowledge, and such studies should also be published. Think of the example of Edison’s quest to find the proper material for a lightbulb filament. The story is often told in such a way as to suggest that he went through a lot of work to finally find the right answer. But this is often how science works: you go through a lot of ideas and data before the right answer emerges.

Using a list of “sleep-deprived professions” to illustrate statistical and substantive significance

I ran into a list of “sleep-deprived jobs” yesterday and I think it is a useful tool for illustrating what significance means. The top five sleep-deprived jobs (starting with the least rested): home health aides (6 hours, 57 minutes), lawyers, police officers, physicians/paramedics, and economists. The top five jobs with the most sleep (starting with the most rested): forest/logging workers (7 hours, 22 minutes), hairstylists, sales representatives, bartenders, and construction workers. Here is where the data came from:

The lists are based on interviews with 27,157 adults as part of the annual National Health Interview Survey, conducted by a division of the Centers for Disease Control and Prevention. Sleepy’s says its rankings were based on two variables: 1) average hours of sleep that respondents said they got in a 24-hour period, and 2) respondents’ occupations, as they would be classified by the Department of Labor.

Let’s talk about significance. First, statistical significance. The lowest value is 6 hours and 57 minutes and the highest value is 7 hours, 22 minutes. We would need to know how the data is clustered: does it look like a normal distribution (meaning most jobs are clumped in the middle) or is it a broader distribution? With a standard deviation, we could figure out how far these highest and lowest values are from the mean and whether they are outside 95% of all the cases.
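To make this concrete, here is a sketch of that z-score check. The survey doesn’t report a mean or standard deviation across occupations, so both numbers below are assumptions invented for illustration; only the two endpoint values come from the list itself.

```python
from statistics import NormalDist

# Assumptions: suppose occupational average sleep is roughly normal with a
# mean of 430 minutes and a standard deviation of 8 minutes. Both numbers
# are made up for illustration -- the survey does not report them.
mean_min, sd_min = 430, 8
low = 6 * 60 + 57    # home health aides: 6h57m = 417 minutes
high = 7 * 60 + 22   # forest/logging workers: 7h22m = 442 minutes

for label, minutes in [("home health aides", low), ("forest/logging workers", high)]:
    z = (minutes - mean_min) / sd_min
    outside = abs(z) > 1.96  # beyond the middle 95% of a normal distribution?
    print(f"{label}: z = {z:+.2f}, outside the 95% band: {outside}")
```

Under these made-up parameters, neither extreme clears the 1.96 cutoff: even the “most deprived” job would sit within the ordinary spread. A smaller assumed standard deviation would push the endpoints outside the band, which is exactly why the list alone can’t settle the question.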

Perhaps more interesting in this case is the second aspect of significance: even if a case is significantly different from the other cases, is this a meaningful difference in the real world? Just looking at the ten occupations at the top and bottom of this list, the top and bottom are separated by only 25 minutes. Would less than half an hour of sleep really change the quality of life or health between home health aides and forest/logging workers? Of course, sleep might not be the only factor that matters here, but is this a meaningful difference? The Mayo Clinic recommends 7-9 hours a night for adults, the National Sleep Foundation also says 7-9 hours a night, and both agree that there are a lot of other factors involved. On the whole, then, it appears that the average employed American is on the low end of the recommended sleep range (a recurring theme in news stories over the years).

It appears that this list isn’t that helpful if everyone is relatively clustered together. But if we had a little more information, we could know more and determine whether there are (statistically and substantively) significant occupations.

Possible Fermilab “breakthrough” illustrates statistical significance

Scientists at Fermilab may be on the verge of a scientific breakthrough regarding “a new elementary particle or a new fundamental force of nature.” There is just one problem:

But scientists on the Fermilab team say there is about a 1 in 1,000 chance that the results are a statistical fluke — odds far too high for them to claim a discovery.

“That’s no more than what physicists tend to call an ‘observation’ or an ‘indication,’ ” said Caltech physicist Harvey Newman.

For the finding to be considered real, researchers have to reduce the chances of a statistical fluke to about 1 in a million.

One of the key concepts in a statistics or social research course is statistical significance: researchers want to be 95% confident (or more) that a result reflects the population or reality rather than just their particular sample or chance. The scientists at Fermilab want to be far more certain that their results reflect reality, so they aim to reduce the chance of a statistical fluke to 1 in a million.
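The two standards can be put on the same scale: physicists usually quote fluke odds as “sigmas,” the number of standard deviations in a one-sided normal tail. A rough translation, assuming that convention (the exact discovery threshold varies by field):

```python
from statistics import NormalDist

ND = NormalDist()

# Translate "chance the result is a fluke" into sigma levels
# (one-sided normal tail, the usual particle-physics convention).
thresholds = [
    ("social science convention (5%)", 0.05),
    ("Fermilab result so far (1 in 1,000)", 1 / 1_000),
    ("desired discovery level (1 in a million)", 1 / 1_000_000),
]
for label, p in thresholds:
    sigma = ND.inv_cdf(1 - p)
    print(f"{label}: about {sigma:.1f} sigma")
```

On this scale the social-science 5% standard is only about 1.6 sigma, the current Fermilab result sits near 3 sigma, and the 1-in-a-million target is close to 5 sigma, which is why the team calls this an “indication” rather than a discovery.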

Beyond working with the calculations, the scientists are also hoping to replicate their findings and rule out other explanations for what they are seeing:

Researchers hope that more data compiled at Fermilab will shed light on the matter, or that the Large Hadron Collider in Geneva will be able to replicate the findings. “We will know this summer when we double the data sets and see if it is still there,” said physicist Rob Roser of Fermilab, who is a spokesman for the project…

What the team must do now, Roser said, is “eliminate all the mundane explanations.” They have been working on that, he said, and decided it was time to go public and let others know what they had found so far.

And science rolls on.