Using the GRIM method to find unlikely published results

Discovering which published studies may be incorrect or fraudulent takes some work, and here is a newer tool for the job: GRIM.

GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either (a) made a mistake or (b) is making things up.

GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of the ones that provided enough necessary data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are “honest errors or slightly sloppy reporting.”…

After spotting the Wansink post, Anaya took the numbers in the papers and — to coin a verb — GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM was an actual machine, rather than a humble piece of code, its alarms would have been blaring. “This lights up like a Christmas tree,” Brown said after highlighting on his computer screen the errors Anaya had identified…

Anaya, along with Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.”
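To make the granularity idea concrete, here is a minimal sketch of the kind of check GRIM performs – not Brown and Heathers’ actual code, and assuming the underlying responses are integers (such as single Likert items) and that the mean was reported rounded to two decimal places:

```python
# Illustrative GRIM-style check, not Brown and Heathers' own implementation.
# Assumes integer-valued responses (e.g., single Likert items) and a mean
# reported rounded to `decimals` places. The check only has bite when the
# sample size is smaller than 10**decimals.
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if some sum of n integer responses could round to the
    reported mean at the given precision."""
    target = round(reported_mean, decimals)
    # The true sum must be an integer near reported_mean * n; test the
    # integers on either side and re-derive the rounded mean from each.
    base = int(reported_mean * n)
    return any(round(s / n, decimals) == target for s in (base, base + 1))

# A mean of 3.48 from 19 integer responses is impossible: 66/19 rounds to
# 3.47 and 67/19 rounds to 3.53, so no valid sum reproduces 3.48.
print(grim_consistent(3.48, 19))  # False -> flag for a closer look
print(grim_consistent(3.47, 19))  # True
```

A failed check like this is not proof of fraud on its own – it simply flags a reported mean that cannot arise from the reported sample size, which is exactly the kind of inconsistency described above.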

I wonder how long it will be before journals employ such methods on submitted manuscripts. Imagine Turnitin for academic studies. And then, what would happen to authors if problems were found?

It also sounds like a program such as this could make mass analysis of published studies easy, helping to answer questions like how many findings are fraudulent.

Perhaps it is too obvious to ask whether GRIM itself has been vetted by outside parties…

Just how many scientific studies are fraudulent?

I’m not sure whether these figures on how many scientific studies involve misconduct are high or low:

Although deception in science is rare, it’s probably more common than many people think. Surveys show that roughly 2 percent of researchers admit to behavior that would constitute misconduct—the big three sins are fabrication of data, fraud, and plagiarism (other forms can include many other actions, including failure to get ethics approval for studies that involve humans). And that’s just those who admit to it—a recent analysis found evidence of problematic figures and images in nearly 4 percent of studies with those graphics, a figure that had quadrupled since 2000.

Here is part of the abstract from the first study cited above (the 2% figure):

A pooled weighted average of 1.97% (N = 7, 95% CI: 0.86–4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once – a serious form of misconduct by any standard – and up to 33.7% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12% (N = 12, 95% CI: 9.91–19.72) for falsification, and up to 72% for other questionable research practices. Meta-regression showed that self reports surveys, surveys using the words “falsification” or “fabrication”, and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others.

Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct.

I hope some of the efforts by researchers to address this – through a variety of means – are successful.

Take a look at the rest of the article as well: just as individual scholars face pressures that can push them toward fraud, big schools have a lot of money on the line with certain researchers and may not want to admit possible issues.

The ongoing politics of the 2020 Census

The decennial Census is not just a counting exercise; it is a political matter, as this commentary suggests.

According to recent documents from the Census Bureau and the Government Accountability Office, the bureau plans to substantially cut back on door-to-door surveying and, instead, use the internet, the Post Office and other means to determine who is living where.

The bureau thinks the 2020 survey will cost $5.2 billion less than the last one (an estimate the GAO questions), but the accuracy could be called into question. There will also likely be worries about fraud because many of the conclusions will be drawn through “imputations” — educated guesses.

In fact, fraud could affect the House of Representatives elections for years to come if someone isn’t watching.

During a recent hearing before the House Oversight Committee, which maintained control over the Census Bureau after the Obama-Emanuel caper, a key technology officer for the 2020 decennial admitted that a fraud prevention system won’t be fully in place until just a few months before the polling starts.

If the Census Bureau – often led by sociologists and other social scientists with expertise in collecting and analyzing data – can be labeled fraudulent because certain parties don’t like the results, what data source can be left alone?

Sampling and estimation alone do not have to be a problem. Just because the Census Bureau can’t reach everyone – and it has certainly tried at points – doesn’t mean that there is room for fraud. If done well, the estimates are based on representative samples – ones that generally match the proportions of the total population – and responsible people reporting on the data will always note that there is not 100% certainty in the figures.
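To illustrate that last point, here is a small sketch – with hypothetical numbers and the simplifying assumption of a simple random sample, which real Census operations are not – of how a sample-based estimate comes with a stated margin of error rather than a claim of certainty:

```python
# A minimal sketch of sample-based estimation with quantified uncertainty.
# The numbers are hypothetical and the formula assumes a simple random
# sample; actual Census imputation methods are far more involved.
import math

def proportion_estimate(successes: int, sample_size: int, z: float = 1.96):
    """Point estimate and approximate 95% confidence interval for a proportion."""
    p_hat = successes / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, (p_hat - margin, p_hat + margin)

# e.g., 620 of 1,000 sampled households fall into some category
p, (low, high) = proportion_estimate(620, 1000)
print(f"Estimate: {p:.1%} (95% CI: {low:.1%} to {high:.1%})")
# Estimate: 62.0% (95% CI: 59.0% to 65.0%)
```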

“Real Housewives” character lives in McMansion only by fraud

A “Real Housewives of New Jersey” character lived in a McMansion and enjoyed its accompanying lifestyle – but it was all a fraud:

On TV they live large — in a 10,000-square-foot McMansion full of garish baubles and expensive toys in an ode to the bad taste and excessive spending that has made “The Real Housewives of New Jersey” a Bravo hit.

It’s the lifestyle Joe and Teresa Giudice — who grew up together as working-class Italian-American kids — always hungered for but could never truly afford, sources said, even when they convinced themselves and everyone around them they could.

The Giudices’ shaky facade of massive personal wealth — increasingly fragile since a 2009 Chapter 7 bankruptcy filing — finally imploded in a spectacular way last week when they were hit with a 39-count criminal fraud indictment.

The federal charges range from allegations that the two conspired to forge W-2 forms, tax returns, pay stubs and other documents to trick banks into lending them money, to accusations of perjury and false statements in their bankruptcy proceedings.

This won’t do the reputation of McMansions any good. See the picture of the Giudices’ home about halfway through the news story: it has everything McMansion critics would hate, including a large wrought-iron fence and gate, an elaborate front door, a castle-like roof, and plenty of rooms. Yet critics would like the symbolism: the home may have been impressive on the outside and looked good on TV, but ultimately it was literally all a fraud.

So if and when they lose the home, who is going to buy it?

When scientific papers are retracted, how does it impede the progress of science?

An article about a recent controversial paper published in Nature includes a summary of how many scientific papers have been retracted or found to be the result of fraud since 1975:

In the meantime, the paper has been cited 11 times by other published papers building on the findings.

It may be impossible for anyone from outside to know the extent of the problems in the Nature paper. But the incident comes amid a phenomenon that some call a “retraction epidemic.”

Last year, research published in the Proceedings of the National Academy of Sciences found that the percentage of scientific articles retracted because of fraud had increased tenfold since 1975.

The same analysis reviewed more than 2,000 retracted biomedical papers and found that 67 percent of the retractions were attributable to misconduct, mainly fraud or suspected fraud.

“You have a lot of people who want to do the right thing, but they get in a position where their job is on the line or their funding will get cut, and they need to get a paper published,” said Ferric C. Fang, one of the authors of the analysis and a medical professor at the University of Washington. “Then they have this tempting thought: If only the data points would line up …”

Fang said retractions may be rising because it is simply easier to cheat in an era of digital images, which can be easily manipulated. But he said the increase is caused at least in part by the growing competition for publication and for NIH grant money.

Two consequences of this are commonly discussed in the media. One is the price for taxpayers, who fund much of the big-money scientific and medical research through federal grants. The second is the credibility of science itself.

But I think there is a third issue that is perhaps even more important: what does this say about what we actually know about the world? In other words, how many subsequent papers are built on the fraudulent or retracted work? Science often works as a chain or pyramid; later work builds on earlier findings, particularly ones published in more prestigious journals. So when a paper is questioned, like the piece in Nature, it isn’t just about that one paper. It is also about the 11 papers that have already cited it.

So what does this mean for what we actually know? How much does a retracted piece set back science? Or do researchers hardly even notice? I suspect many of these retracted papers don’t slow things down too much, but there is always the potential that a retraction could pull the rug out from under important findings.

h/t Instapundit

Trying to ensure more accountability in US News & World Report college ranking data

The US News & World Report college rankings are big business but also a big headache in data collection. The company is looking into ways to ensure more trustworthy data:

A new report from The Washington Post‘s Nick Anderson explores the increasingly common problem, in which universities submit inflated standardized test scores and class rankings for members of their incoming classes to U.S. News, which doesn’t independently verify the information. Tulane University, Bucknell University, Claremont McKenna College, Emory University, and George Washington University have all been implicated in the past year alone. And those are just the schools that got caught:

A survey of 576 college admissions officers conducted by Gallup last summer for the online news outlet Inside Higher Ed found that 91 percent believe other colleges had falsely reported standardized test scores and other admissions data. A few said their own college had done so.

For such a trusted report, the U.S. News rankings don’t have many safeguards ensuring that their data is accurate. Schools self-report these statistics on the honor system, essentially. U.S. News editor Brian Kelly told Inside Higher Ed’s Scott Jaschik, “The integrity of data is important to everybody … I find it incredible to contemplate that institutions based on ethical behavior would be doing this.” But plenty of institutions are doing this, as we noted back in November 2012 when GWU was unranked after being caught submitting juiced stats. 

At this point, U.S. News shouldn’t be surprised by acknowledgments like those from Tulane and Bucknell. It turns out that if you let schools misreport the numbers — especially in a field of fierce academic competition and increasing budgetary hardship — they’ll take you up on the offer. Kelly could’ve learned that by reading U.S. News‘ own blog, Morse Code. Written by data researcher Bob Morse, almost half of the recent posts have been about fraud. To keep schools more honest, the magazine is considering requiring university officials outside of enrollment offices to sign a statement vouching for submitted numbers. But still, no third party accountability would be in place, and many higher ed experts are already saying that the credibility of the U.S. News college rankings is shot.

Three quick thoughts:

1. With the amount of money involved in the entire process, this should not be a surprise. Colleges want to project the best image they can, so a weakly regulated system (and a suspect methodology and set of factors to start with) can lead to abuses.

2. If the USNWR rankings can’t be trusted, isn’t there someone who could provide a more honest system? This sounds like an opportunity for someone.

3. I wonder if there are parallels to PED use in baseball. To some degree, it doesn’t matter whether lots of schools actually are gaming the system; what matters is the perception among schools that everyone else is doing it. With that perception, it is easier to justify one’s own cheating because colleges feel they need to catch up or keep pace with each other.

Social psychologist on quest to find researchers who falsify data

The latest Atlantic magazine includes a short piece about a social psychologist who is out to catch other researchers who falsify data. Here is part of the story:

Simonsohn initially targeted not flagrant dishonesty, but loose methodology. In a paper called “False-Positive Psychology,” published in the prestigious journal Psychological Science, he and two colleagues—Leif Nelson, a professor at the University of California at Berkeley, and Wharton’s Joseph Simmons—showed that psychologists could all but guarantee an interesting research finding if they were creative enough with their statistics and procedures.

The three social psychologists set up a test experiment, then played by current academic methodologies and widely permissible statistical rules. By going on what amounted to a fishing expedition (that is, by recording many, many variables but reporting only the results that came out to their liking); by failing to establish in advance the number of human subjects in an experiment; and by analyzing the data as they went, so they could end the experiment when the results suited them, they produced a howler of a result, a truly absurd finding. They then ran a series of computer simulations using other experimental data to show that these methods could increase the odds of a false-positive result—a statistical fluke, basically—to nearly two-thirds.

Just as Simonsohn was thinking about how to follow up on the paper, he came across an article that seemed too good to be true. In it, Lawrence Sanna, a professor who’d recently moved from the University of North Carolina to the University of Michigan, claimed to have found that people with a physically high vantage point—a concert stage instead of an orchestra pit—feel and act more “pro-socially.” (He measured sociability partly by, of all things, someone’s willingness to force fellow research subjects to consume painfully spicy hot sauce.) The size of the effect Sanna reported was “out-of-this-world strong, gravity strong—just super-strong,” Simonsohn told me over Chinese food (heavy on the hot sauce) at a restaurant around the corner from his office. As he read the paper, something else struck him, too: the data didn’t seem to vary as widely as you’d expect real-world results to. Imagine a study that calculated male height: if the average man were 5-foot-10, you wouldn’t expect that in every group of male subjects, the average man would always be precisely 5-foot-10. Yet this was exactly the sort of unlikely pattern Simonsohn detected in Sanna’s data…

Simonsohn stressed that there’s a world of difference between data techniques that generate false positives, and fraud, but he said some academic psychologists have, until recently, been dangerously indifferent to both. Outright fraud is probably rare. Data manipulation is undoubtedly more common—and surely extends to other subjects dependent on statistical study, including biomedicine. Worse, sloppy statistics are “like steroids in baseball”: Throughout the affected fields, researchers who are too intellectually honest to use these tricks will publish less, and may perish. Meanwhile, the less fastidious flourish.
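To see how these “researcher degrees of freedom” can inflate false positives, here is a rough simulation in the spirit of what the article describes – my own illustration with arbitrary settings (peeking at the data every ten subjects per group and testing three outcome variables), not the authors’ actual simulation:

```python
# Rough simulation of flexible analysis under a true null effect: two groups
# drawn from identical populations, with repeated "peeks" at the data and
# several outcome variables, stopping at the first p < .05. Settings are
# arbitrary; this is an illustration, not the False-Positive Psychology
# authors' simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def flexible_study(n_start=20, n_max=60, step=10, n_outcomes=3, alpha=0.05):
    """Return True if a 'significant' result is found despite no real effect."""
    a = rng.normal(size=(n_max, n_outcomes))  # group A, all outcomes
    b = rng.normal(size=(n_max, n_outcomes))  # group B, identical population
    for n in range(n_start, n_max + 1, step):      # optional stopping
        for k in range(n_outcomes):                # outcome shopping
            if stats.ttest_ind(a[:n, k], b[:n, k]).pvalue < alpha:
                return True
    return False

runs = 2000
rate = sum(flexible_study() for _ in range(runs)) / runs
print(f"False-positive rate with flexible analysis: {rate:.0%}")
# Far above the nominal 5%, even though no true effect exists.
```

Add a few more flexible choices – covariates, dropped conditions, selective reporting – and it is not hard to see how the rate climbs toward the nearly two-thirds figure the authors report.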

The current research environment may simply give researchers incentives to cut corners and end up with false results. Publishing is incredibly important for an academic’s career, and there is little systematic oversight of a researcher’s data. I’ve written before about ways that data could be made more open, but it would take some work to put these ideas into practice.

What I wouldn’t want is for people to read a story like this and conclude that fields like social psychology have nothing to offer because who knows how many of the studies might be flawed. I also wonder about the vigilante edge to this story – a lone social psychologist battling his own field makes for a good journalistic piece, but this isn’t how science should work. Simonsohn should be joined by others who are also concerned about these potential issues. Of course, there may not be many incentives to pursue this work, as it might invite criticism from inside and outside the discipline.