See the bigger picture when reading media reports of new scientific findings

Think a new study touted by the media is too sensational? Take the long view of such reports, as sociologists and other researchers do:

One solution for reporters that hasn’t gotten a lot of attention yet, but should, is the value of talking to social scientists — historians of science and medicine, anthropologists, political scientists and sociologists of science — in the process of reporting about research. Experts in these disciplines who examine the practice of scientific and medical research from outside of it are in a great position to give reporters, and by extension their readers, insight into where new scientific knowledge came from, what sort of agenda might be motivating the people involved, the cultural meanings attached to particular scientific findings, what questions were being asked — and what questions weren’t asked, but should have been.

To see how, take a look at a story from earlier this week that nicely illustrates the value a social scientist can bring to how a science story is reported: Did you hear that hurricanes with feminine names are deadlier than ones with masculine names because people’s sexist bias causes them not to take female storms as seriously? As Ed Yong reported in National Geographic’s “Phenomena”, it’s probably not true. Yong talked to a social scientist who helped break down the reasons why, from weaknesses of the methods to the context of other factors already known to affect the deadliness of storms. Check out the reporting and ensuing discussion here…

“Almost every time one of these studies comes out, it’s promoted as evidence that ‘X single factor’ is a decisive culprit,” said Chloe Silverman, PhD, a sociologist and historian of science in Drexel’s Center for Science, Technology & Society, whose current project is focused on people’s approaches to understanding pollinator health. “But there’s plenty of evidence that a combination of factors contribute to honey bee health problems.”…

And journalists tend to follow particular narrative conventions, such as “the discovery just around the corner” or “the intractable mystery,” Silverman noted. “But social scientists who study science are in a better position than most to both identify those tendencies and offer more realistic descriptions of the pace and progress of scientific research.”

There is probably some irony in Drexel’s media relations office pushing this point of view, but it is a helpful corrective to the typical approach journalists take to the latest scientific findings. In sociology and other fields, it takes time to develop credible hypotheses, data, and theories. Researchers engage with other research to refine their ideas and build on work that has already been done. Reaching consensus may take years, or it may never completely happen.

I wonder how much social and natural scientists could do to better communicate the full scientific process. In a world that seems to be going faster, science still takes time.

Argument: businesses should use scientific method in studying big data

Sociologist Duncan Watts explains how businesses should go about analyzing big data:

A scientific mind-set takes as its inspiration the scientific method, which at its core is a recipe for learning about the world in a systematic, replicable way: start with some general question based on your experience; form a hypothesis that would resolve the puzzle and that also generates a testable prediction; gather data to test your prediction; and finally, evaluate your hypothesis relative to competing hypotheses.

The scientific method is largely responsible for the astonishing increase in our understanding of the natural world over the past few centuries. Yet it has been slow to enter the worlds of politics, business, policy, and marketing, where our prodigious intuition for human behavior can always generate explanations for why people do what they do or how to make them do something different. Because these explanations are so plausible, our natural tendency is to want to act on them without further ado. But if we have learned one thing from science, it is that the most plausible explanation is not necessarily correct. Adopting a scientific approach to decision making requires us to test our hypotheses with data.

While data is essential for scientific decision making, theory, intuition, and imagination remain important as well—to generate hypotheses in the first place, to devise creative tests of the hypotheses that we have, and to interpret the data that we collect. Data and theory, in other words, are the yin and yang of the scientific method—theory frames the right questions, while data answers the questions that have been asked. Emphasizing either at the expense of the other can lead to serious mistakes…

Even here, though, the scientific method is instructive, not for eliciting answers but rather for highlighting the limits of what can be known. We can’t help asking why Apple became so successful, or what caused the last financial crisis, or why “Gangnam Style” was the most viral video of all time. Nor can we stop ourselves from coming up with plausible answers. But in cases where we cannot test our hypothesis many times, the scientific method teaches us not to infer too much from any one outcome. Sometimes the only true answer is that we just do not know.

To summarize: the scientific method provides a way to ask questions and gather data that bears on those questions. It is not perfect: it doesn’t always produce an answer, or the answers people are looking for; it may only be as good as the questions asked; and it requires rigorous methodology. But it can help push forward the development of knowledge.
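The loop Watts describes (question, hypothesis, prediction, data, evaluation) can be sketched in a few lines of code. Below is a minimal, hypothetical Python illustration: the scenario, numbers, and function names are invented, and a permutation test stands in for whatever evaluation a real analysis would use.

```python
import random

def permutation_test(a, b, n_iter=10_000, seed=0):
    """p-value for the observed difference in means under the null
    hypothesis that the two samples come from the same distribution."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign observations to groups at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_iter

# Hypothesis: a redesigned page lowers task completion time.
# Hypothetical data: completion times (seconds) under each design.
control = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7, 12.2, 13.1]
variant = [11.0, 10.8, 11.5, 11.2, 10.9, 11.4, 11.1, 11.6]
p = permutation_test(control, variant)
print(f"p = {p:.4f}")  # small p: the data are hard to square with "no difference"
```

The point is the shape of the process, not this particular test: the prediction is stated before looking at the data, and the evaluation asks how often chance alone would produce a gap this large.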

While there are businesses and policymakers using such approaches, it strikes me that this argument for the scientific method is especially needed amid big data and gobs of information. In today’s world, getting information is not a problem. Individuals and companies can quickly find or measure lots of data. However, it still takes work, sound methodology, and careful interpretation to turn that data into knowledge.

Journalists: stop saying scientists “proved” something in studies

One comment after a story about a new study on innovation in American films over time reminds journalists that scientists do not “prove” things in studies.

The front page title is “Scientist Proves…”

I’m willing to bet the scientist said no such thing. Rather it was probably more along the lines of “the data gives an indication that…”

Terms in science have pretty specific meanings that differ from our day-to-day usage. “Prove” and “theory,” among others, are such terms. Indeed, science tends to avoid “prove” or “proof.” To quote another article: “Proof, then, is solely the realm of logic and mathematics (and whiskey).”

[end pedantry]

To go further, the language of proof tends to convey a particular meaning to the public: that the scientist has shown, without a doubt and in 100% of cases, that a causal relationship exists. This is not how science, natural or social, works. We tend to say outcomes are more or less likely. There can also be relationships that are not causal; correlation without causation is a common example. Similarly, a relationship can still hold even if it doesn’t apply to all or even most cases. When teaching statistics and research methods, I try to remind my students of this. Early on, I suggest we are not in the business of “proving” things but rather looking for relationships between things using methods, quantitative or qualitative, that still have some measure of error built in. If we can’t have 100% proof, that doesn’t mean science is dead; it just means that, done correctly, we can be more confident about our observations.
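The correlation-without-causation point is easy to demonstrate with simulated data. In this hypothetical Python sketch, a third variable drives two others that have no causal link to each other; the example and all numbers are invented.

```python
import random
from statistics import fmean, pstdev

def pearson_r(x, y):
    """Population Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

rng = random.Random(42)
# Hypothetical confounder: hot weather drives both ice cream sales
# and drowning incidents, which have no causal link to each other.
heat = [rng.gauss(0, 1) for _ in range(1_000)]
ice_cream = [h + rng.gauss(0, 0.5) for h in heat]
drownings = [h + rng.gauss(0, 0.5) for h in heat]
r = pearson_r(ice_cream, drownings)
print(f"correlation between ice cream and drownings: r = {r:.2f}")
```

The two series are strongly correlated, yet banning ice cream would do nothing about drownings; only the confounder links them.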

See an earlier post regarding how Internet commenters often fall into similar traps when responding to scientific studies.

 

Internet commenters can’t handle science because they argue by anecdote, think studies apply to 100% of cases

Popular Science announced this week they are not allowing comments on their stories because “comments can be bad for science”:

But even a fractious minority wields enough power to skew a reader’s perception of a story, recent research suggests. In one study led by University of Wisconsin-Madison professor Dominique Brossard, 1,183 Americans read a fake blog post on nanotechnology and revealed in survey questions how they felt about the subject (are they wary of the benefits or supportive?). Then, through a randomly assigned condition, they read either epithet- and insult-laden comments (“If you don’t see the benefits of using nanotechnology in these kinds of products, you’re an idiot”) or civil comments. The results, as Brossard and coauthor Dietram A. Scheufele wrote in a New York Times op-ed:

Uncivil comments not only polarized readers, but they often changed a participant’s interpretation of the news story itself.
In the civil group, those who initially did or did not support the technology — whom we identified with preliminary survey questions — continued to feel the same way after reading the comments. Those exposed to rude comments, however, ended up with a much more polarized understanding of the risks connected with the technology.
Simply including an ad hominem attack in a reader comment was enough to make study participants think the downside of the reported technology was greater than they’d previously thought.

Another, similarly designed study found that just firmly worded (but not uncivil) disagreements between commenters impacted readers’ perception of science…

A politically motivated, decades-long war on expertise has eroded the popular consensus on a wide variety of scientifically validated topics. Everything, from evolution to the origins of climate change, is mistakenly up for grabs again. Scientific certainty is just another thing for two people to “debate” on television. And because comments sections tend to be a grotesque reflection of the media culture surrounding them, the cynical work of undermining bedrock scientific doctrine is now being done beneath our own stories, within a website devoted to championing science.

In addition to rude comments and ad hominem attacks leading to changed perceptions about scientific findings, here are two common misunderstandings of how science works often found in online comments (these are also common misconceptions offline):

1. Internet conversations are ripe for argument by anecdote. This happens all the time: a study is described and then the comments fill with people saying that the study doesn’t apply to them or someone they know. A single counterexample usually says very little; scientific studies are designed to identify general patterns, not to hold in every individual case. Think of jokes made about global warming: one blizzard or one cold season doesn’t invalidate a general upward trend in temperatures.

2. Argument by anecdote is related to a misconception about scientific studies: the findings rarely apply to 100% of cases. Scientific findings are probabilistic, meaning there is some room for error (this does not mean science doesn’t tell us anything; it means the real world is hard to measure and analyze, and scientists try to limit error as much as possible). Thus, scientists tend to talk in terms of relationships being more or less likely. This tends to get lost in news stories that suggest 100% causal relationships.
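A toy simulation makes point 2 concrete. In this hypothetical Python sketch, a treatment helps on average, yet a large share of individuals still end up worse off, so individual counterexamples are fully compatible with a real average effect (all numbers are invented).

```python
import random
from statistics import fmean

rng = random.Random(1)
# Hypothetical intervention: raises an outcome by 1 unit on average,
# but individual variation (sd = 3) swamps the average effect.
effects = [1.0 + rng.gauss(0, 3) for _ in range(10_000)]
mean_effect = fmean(effects)
share_worse = sum(e < 0 for e in effects) / len(effects)
print(f"average effect: {mean_effect:+.2f}")
print(f"share of individuals worse off: {share_worse:.0%}")
```

Roughly a third of individuals here could truthfully say “it didn’t work for me,” and the finding would still be real.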

In other words, in order to have online conversations about science, you need readers who know the basics of scientific studies. I’m not sure my two points above are taught before college, but I know I cover these ideas in both Statistics and Research Methods courses.

Wired’s five tips for “p-hacking” your way to a positive study result

As part of its “Cheat Code to Life,” Wired includes five tips for researchers to obtain positive results in their studies:

Many a budding scientist has found themself one awesome result from tenure and unable to achieve that all-important statistical significance. Don’t let such setbacks deter you from a life of discovery. In a recent paper, Joseph Simmons, Leif Nelson, and Uri Simonsohn describe “p-hacking”—common tricks that researchers use to fish for positive results. Just promise us you’ll be more responsible when you’re a full professor. —MATTHEW HUTSON

Create Options. Let’s say you want to prove that listening to dubstep boosts IQ (aka the Skrillex effect). The key is to avoid predefining what exactly the study measures—then bury the failed attempts. So use two different IQ tests; if only one shows a pattern, toss the other.

Expand the Pool. Test 20 dubstep subjects and 20 control subjects. If the findings reach significance, publish. If not, run 10 more subjects in each group and give the stats another whirl. Those extra data points might randomly support the hypothesis.

Get Inessential. Measure an extraneous variable like gender. If there’s no pattern in the group at large, look for one in just men or women.

Run Three Groups. Have some people listen for zero hours, some for one, and some for 10. Now test for differences between groups A and B, B and C, and A and C. If all comparisons show significance, great. If only one does, then forget about the existence of the p-value poopers.

Wait for the NSF Grant. Use all four of these fudges and, even if your theory is flat wrong, you’re more likely than not to confirm it—with the necessary 95 percent confidence.

This might be summed up as “things that are done but would never be explicitly taught in a research methods course.” Several quick thoughts:

1. This is a reminder of how important 95% significance is in the world of science. My students often ask why the cut-point is 95%: why do we accept 5% error and not 10% (which people sometimes “get away with” in some studies) or 1% (wouldn’t we be more sure of our results?).

2. Even if significance is important and scientists hack their way to more positive results, they can still have humility about their findings. Reaching 95% significance still means there is a 5% chance of error, so we should expect some findings to be countered or disproven. Additionally, results can be statistically significant but have little substantive significance. Altogether, a significant finding is not the end of the process for the scientist: it still needs to be interpreted and then tested again.

3. This is also tied to the pressure to find positive results. In other words, publishing an academic study is more likely if you reject the null hypothesis. At the same time, failing to reject the null hypothesis is still useful knowledge, and such studies should also be published. Think of the example of Edison’s quest to find the proper material for a lightbulb filament. The story is often told as though the many failed attempts were simply obstacles on the way to the right answer. But this is how science works: you go through a lot of ideas and data before the right answer emerges, and the failures tell you something too.
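To see why these tricks inflate positive results, it helps to simulate them. This sketch implements only the “Expand the Pool” trick under a true null (dubstep has no effect on IQ); the sample sizes mirror Wired’s example, everything else is invented, and the p-value uses a normal approximation to the two-sample t-test, which is rough at these sample sizes but fine for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def p_value(a, b):
    """Two-sided p-value for a difference in means, via a normal
    approximation to the two-sample t-test (rough but illustrative)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = abs(fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(z))

def one_hacked_study(rng):
    """'Expand the Pool' under a true null: both groups are plain noise."""
    music = [rng.gauss(100, 15) for _ in range(20)]
    control = [rng.gauss(100, 15) for _ in range(20)]
    if p_value(music, control) < 0.05:
        return True
    # Not significant? Add 10 subjects per group and test again.
    music += [rng.gauss(100, 15) for _ in range(10)]
    control += [rng.gauss(100, 15) for _ in range(10)]
    return p_value(music, control) < 0.05

rng = random.Random(0)
trials = 2_000
rate = sum(one_hacked_study(rng) for _ in range(trials)) / trials
print(f"false-positive rate with one extra look: {rate:.1%}")  # above the nominal 5%
```

One extra peek at the data already pushes the error rate past the advertised 5%; stacking the other tricks on top pushes it higher still.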

When scientific papers are retracted, how does it impede the progress of science?

An article about a recent controversial paper published in Nature includes a summary of how many scientific papers have been retracted, often as the result of fraud, since 1975:

In the meantime, the paper has been cited 11 times by other published papers building on the findings.

It may be impossible for anyone from outside to know the extent of the problems in the Nature paper. But the incident comes amid a phenomenon that some call a “retraction epidemic.”

Last year, research published in the Proceedings of the National Academy of Sciences found that the percentage of scientific articles retracted because of fraud had increased tenfold since 1975.

The same analysis reviewed more than 2,000 retracted biomedical papers and found that 67 percent of the retractions were attributable to misconduct, mainly fraud or suspected fraud.

“You have a lot of people who want to do the right thing, but they get in a position where their job is on the line or their funding will get cut, and they need to get a paper published,” said Ferric C. Fang, one of the authors of the analysis and a medical professor at the University of Washington. “Then they have this tempting thought: If only the data points would line up …”

Fang said retractions may be rising because it is simply easier to cheat in an era of digital images, which can be easily manipulated. But he said the increase is caused at least in part by the growing competition for publication and for NIH grant money.

There are two consequences of this commonly discussed in the media. One is the cost to taxpayers, who fund much of this scientific and medical research through federal grants. The second is the credibility of science itself.

But I think there is a third issue that is perhaps even more important: what does this say about what we actually know about the world? In other words, how many subsequent papers are built on the fraudulent or retracted work? Science often works in a chain or pyramid; later work builds on earlier findings, particularly ones published in more prestigious journals. So when a paper is questioned, like the piece in Nature, it isn’t just about that one paper. It is also about the 11 papers that have already cited it.

So what does this mean for what we actually know? How much does a retracted piece set back science? Or do researchers hardly even notice? I suspect most retracted papers don’t slow things down much, but there is always the potential that a retraction could pull the rug out from under important findings.

h/t Instapundit

Bill Gates: we can make progress with goals, data, and a feedback loop

Bill Gates argues in the Wall Street Journal that significant progress can be made around the world if organizations and residents participate in a particular process:

In the past year, I have been struck by how important measurement is to improving the human condition. You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal—in a feedback loop similar to the one Mr. Rosen describes.

This may seem basic, but it is amazing how often it is not done and how hard it is to get right. Historically, foreign aid has been measured in terms of the total amount of money invested—and during the Cold War, by whether a country stayed on our side—but not by how well it performed in actually helping people. Closer to home, despite innovation in measuring teacher performance world-wide, more than 90% of educators in the U.S. still get zero feedback on how to improve.

An innovation—whether it’s a new vaccine or an improved seed—can’t have an impact unless it reaches the people who will benefit from it. We need innovations in measurement to find new, effective ways to deliver those tools and services to the clinics, family farms and classrooms that need them.

I’ve found many examples of how measurement is making a difference over the past year—from a school in Colorado to a health post in rural Ethiopia. Our foundation is supporting these efforts. But we and others need to do more. As budgets tighten for governments and foundations world-wide, we all need to take the lesson of the steam engine to heart and adapt it to solving the world’s biggest problems.

Gates doesn’t use this term, but this sounds like a practical application of the scientific method. Instead of responding to a social problem by going out and trying to “do something,” the process should be more rigorous: set clear goals, collect good data, interpret the data, and then adjust the process from the beginning. This is related to other points about this process:

1. It is one thing to be able to collect data (and this is often its own complicated process) but it is another to know what to do with it once you have it. Compared to the past, data is relatively easy to obtain today but using it well is another matter.

2. Another broad issue in this kind of feedback loop is developing the measurements and deciding what counts as “success.” Some of this is fairly easy; when Gates praises the UN Millennium Goals, reducing occurrences of disease or boosting incomes has face validity for getting at what matters. But measuring teachers’ performance or defining what makes a quality college is trickier. Gates calls this developing goals, but it could be a lengthy process in itself.

It is interesting that Gates mentions the need for such loops in colleges so that students “could know where they would get the most for their tuition money.” The Gates Foundation has put money into studying public schools and just a few weeks ago released some of their findings:

After a three-year, $45 million research project, the Bill and Melinda Gates Foundation believes it has some answers.

The most reliable way to evaluate teachers is to use a three-pronged approach built on student test scores, classroom observations by multiple reviewers and teacher evaluations from students themselves, the foundation found…

The findings released Tuesday involved an analysis of about 3,000 teachers and their students in Charlotte; Dallas; Denver; Memphis; New York; Pittsburgh; and Hillsborough County, Fla., which includes Tampa. Researchers were drawn from the Educational Testing Service and several universities, including Harvard, Stanford and the University of Virginia…

Researchers videotaped 3,000 participating teachers and experts analyzed their classroom performance. They also ranked the teachers using a statistical model known as value-added modeling, which calculates how much an educator has helped students learn based on their academic performance over time. And finally, the researchers surveyed the students, who turned out to be reliable judges of their teacher’s abilities, Kane said.

All this takes quite a few resources and time. For those interested in quick action, this is not the process to follow. Hopefully, however, the resources and time pay off with better solutions.