Using a GRIM method to find unlikely published results

Discovering which published studies may be incorrect or fraudulent takes some work, and here is a newer tool for the job: GRIM.

GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either (a) made a mistake or (b) is making things up.

GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of the ones that provided enough necessary data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are “honest errors or slightly sloppy reporting.”…

After spotting the Wansink post, Anaya took the numbers in the papers and — to coin a verb — GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM was an actual machine, rather than a humble piece of code, its alarms would have been blaring. “This lights up like a Christmas tree,” Brown said after highlighting on his computer screen the errors Anaya had identified…

Anaya, along with Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.”
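The core check described in the excerpt above can be sketched in a few lines of Python. This is only a minimal illustration of the idea that integer-valued responses can produce only certain averages; it is not Brown and Heathers's actual code. The function name grim_consistent and the example numbers are hypothetical, and the sketch assumes every response is a whole number and that the reported mean was rounded to the number of decimals shown.

```python
import math


def grim_consistent(reported_mean: str, n: int, items: int = 1) -> bool:
    """Return True if some whole-number total could yield the reported mean.

    reported_mean is passed as a string so trailing zeros (e.g. "3.50")
    tell us how many decimals the authors reported.
    """
    decimals = len(reported_mean.split(".")[1]) if "." in reported_mean else 0
    target = float(reported_mean)
    grains = n * items  # number of integer responses behind the mean
    half_unit = 0.5 * 10 ** -decimals  # largest gap that rounding could hide
    approx_total = target * grains
    # Only the two whole-number totals nearest to approx_total can work.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / grains - target) <= half_unit + 1e-9:
            return True
    return False


# Hypothetical reported values, for illustration only.
print(grim_consistent("5.18", 28))   # 145 / 28 rounds to 5.18
print(grim_consistent("5.19", 28))   # no whole-number total gives 5.19
print(grim_consistent("5.19", 100))  # with n = 100, any 2-decimal mean is possible
```

Note that the check loses its power as the sample grows: once n times the number of items reaches 100, every two-decimal mean is attainable, which is presumably why only some papers provide enough data to check.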

I wonder how long it will be before journals employ such methods for submitted manuscripts. Imagine Turnitin for academic studies. Then, what would happen to authors if problems are found?

It also sounds like a program like this could make it easy to do mass analysis of published studies to help answer questions like how many findings are fraudulent.

Perhaps it is too easy to ask whether GRIM has been vetted by outside persons…

Just how many scientific studies are fraudulent?

I’m not sure whether these figures are high or low regarding how many scientific studies contain misconduct:

Although deception in science is rare, it’s probably more common than many people think. Surveys show that roughly 2 percent of researchers admit to behavior that would constitute misconduct—the big three sins are fabrication of data, fraud, and plagiarism (other forms can include many other actions, including failure to get ethics approval for studies that involve humans). And that’s just those who admit to it—a recent analysis found evidence of problematic figures and images in nearly 4 percent of studies with those graphics, a figure that had quadrupled since 2000.

Here is part of the abstract from the first study cited above (the 2% figure):

A pooled weighted average of 1.97% (N = 7, 95%CI: 0.86–4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once –a serious form of misconduct by any standard– and up to 33.7% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12% (N = 12, 95% CI: 9.91–19.72) for falsification, and up to 72% for other questionable research practices. Meta-regression showed that self reports surveys, surveys using the words “falsification” or “fabrication”, and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others.

Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct.

I hope some of the efforts by researchers to address this – through a variety of means – are successful.

Take a look at the rest of the article as well: just as individual scholars feel a lot of pressure to commit fraud, big schools have a lot of money on the line with certain researchers and may not want to admit possible issues.

The ongoing politics of the 2020 Census

The decennial Census is not just a counting exercise; it is also a political matter, as this commentary suggests.

According to recent documents from the Census Bureau and the Government Accountability Office, the bureau plans to substantially cut back on door-to-door surveying and, instead, use the internet, the Post Office and other means to determine who is living where.

The bureau thinks the 2020 survey will cost $5.2 billion less than the last one (an estimate the GAO questions), but the accuracy could be called into question. There will also likely be worries about fraud because many of the conclusions will be drawn through “imputations” — educated guesses.

In fact, fraud could affect the House of Representatives elections for years to come if someone isn’t watching.

During a recent hearing before the House Oversight Committee, which maintained control over the Census Bureau after the Obama-Emanuel caper, a key technology officer for the 2020 decennial admitted that a fraud prevention system won’t be fully in place until just a few months before the polling starts.

If the Census Bureau – often led by sociologists and other social scientists who have expertise in collecting and analyzing data – is called fraudulent because certain parties don’t like the results, what can be left alone?

Sampling and estimation alone do not have to be a problem. Just because the Census can’t reach everyone – and it has certainly tried at points – doesn’t mean that there is room for fraud. If done well, the estimates are based on representative samples – meaning they generally match the proportions of the total population – and responsible people reporting on this data will always note that there is not 100% certainty in the data.
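To make the point about uncertainty concrete, here is a rough sketch of how an estimate from a simple random sample comes with a margin of error. Everything in it is an assumption for illustration: the population of 100,000 households, the 30% share, the sample size, and the function name proportion_with_margin are all made up, and the Census Bureau's actual imputation and estimation methods are far more involved than simple random sampling.

```python
import math
import random


def proportion_with_margin(sample, z=1.96):
    """Estimate a proportion and its approximate 95% margin of error."""
    n = len(sample)
    p = sum(sample) / n
    margin = z * math.sqrt(p * (1 - p) / n)  # normal approximation
    return p, margin


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 30% of 100,000 households share some trait.
population = [1] * 30_000 + [0] * 70_000

# Draw a simple random sample of 1,000 households without replacement.
sample = rng.sample(population, 1_000)

p, margin = proportion_with_margin(sample)
print(f"estimate: {p:.3f} +/- {margin:.3f}")
```

The estimate will usually land within a few percentage points of the true 30%, and the margin is what responsible reporting should mention alongside the number.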

“Real Housewives” character lives in McMansion only by fraud

A “Real Housewives of New Jersey” character lived in a McMansion and its accompanying lifestyle – but it was all a fraud:

On TV they live large — in a 10,000-square-foot McMansion full of garish baubles and expensive toys in an ode to the bad taste and excessive spending that has made “The Real Housewives of New Jersey” a Bravo hit.

It’s the lifestyle Joe and Teresa Giudice — who grew up together as working-class Italian-American kids — always hungered for but could never truly afford, sources said, even when they convinced themselves and everyone around them they could.

The Giudices’ shaky facade of massive personal wealth — increasingly fragile since a 2009 Chapter 7 bankruptcy filing — finally imploded in a spectacular way last week when they were hit with a 39-count criminal fraud indictment.

The federal charges range from allegations that the two conspired to forge W-2 forms, tax returns, pay stubs and other documents to trick banks into lending them money, to accusations of perjury and false statements in their bankruptcy proceedings.

This won’t do the reputation of McMansions any good. See the picture of the Giudices’ home about halfway through the news story: it looks like everything McMansion critics would hate, including a large wrought-iron fence and gate, an elaborate front door, a roof that looks like a castle, and plenty of rooms. Yet critics would like the symbolism: the home may have been impressive on the outside or looked good on TV but ultimately, it was literally all a fraud.

So if and when they lose the home, who is going to buy it?

When scientific papers are retracted, how does it impede the progress of science?

An article about a recent controversial paper published in Nature includes a summary of how retractions of scientific papers, particularly for fraud, have grown since 1975:

In the meantime, the paper has been cited 11 times by other published papers building on the findings.

It may be impossible for anyone from outside to know the extent of the problems in the Nature paper. But the incident comes amid a phenomenon that some call a “retraction epidemic.”

Last year, research published in the Proceedings of the National Academy of Sciences found that the percentage of scientific articles retracted because of fraud had increased tenfold since 1975.

The same analysis reviewed more than 2,000 retracted biomedical papers and found that 67 percent of the retractions were attributable to misconduct, mainly fraud or suspected fraud.

“You have a lot of people who want to do the right thing, but they get in a position where their job is on the line or their funding will get cut, and they need to get a paper published,” said Ferric C. Fang, one of the authors of the analysis and a medical professor at the University of Washington. “Then they have this tempting thought: If only the data points would line up . . .”

Fang said retractions may be rising because it is simply easier to cheat in an era of digital images, which can be easily manipulated. But he said the increase is caused at least in part by the growing competition for publication and for NIH grant money.

There are two consequences of this commonly discussed in the media. One is the price for taxpayers who fund some of the big money scientific and medical research through federal grants. Second is the credibility of science itself.

But I think there is a third issue that is perhaps even more important. What does this say about what we actually know about the world? In other words, how many subsequent papers are built on the fraudulent or retracted work? Science often works in a chain or pyramid; later work builds on earlier findings, particularly ones published in more prestigious journals. So when a paper is questioned, like the piece in Nature, it isn’t just about the nature of that one paper. It is also about the 11 papers that have already cited it.

So what does this mean for what we actually know? How much does a retracted piece set back science? Or do researchers hardly even notice? I suspect many of these retracted papers don’t slow things down too much, but there is always the potential that a retracted paper could pull the rug out from under important findings.

h/t Instapundit

Trying to ensure more accountability in US News & World Report college ranking data

The US News & World Report college rankings are big business but also a big headache in data collection. The company is looking into ways to ensure more trustworthy data:

A new report from The Washington Post’s Nick Anderson explores the increasingly common problem, in which universities submit inflated standardized test scores and class rankings for members of their incoming classes to U.S. News, which doesn’t independently verify the information. Tulane University, Bucknell University, Claremont McKenna College, Emory University, and George Washington University have all been implicated in the past year alone. And those are just the schools that got caught:

A survey of 576 college admissions officers conducted by Gallup last summer for the online news outlet Inside Higher Ed found that 91 percent believe other colleges had falsely reported standardized test scores and other admissions data. A few said their own college had done so.

For such a trusted report, the U.S. News rankings don’t have many safeguards ensuring that their data is accurate. Schools self-report these statistics on the honor system, essentially. U.S. News editor Brian Kelly told Inside Higher Ed’s Scott Jaschik, “The integrity of data is important to everybody … I find it incredible to contemplate that institutions based on ethical behavior would be doing this.” But plenty of institutions are doing this, as we noted back in November 2012 when GWU was unranked after being caught submitting juiced stats. 

At this point, U.S. News shouldn’t be surprised by acknowledgment like those from Tulane and Bucknell. It turns out that if you let schools misreport the numbers — especially in a field of fierce academic competition and increasingly budgetary hardship — they’ll take you up on the offer. Kelly could’ve learned that by reading U.S. News’ own blog, Morse Code. Written by data researcher Bob Morse, almost half of the recent posts have been about fraud. To keep schools more honest, the magazine is considering requiring university officials outside of enrollment offices to sign a statement vouching for submitted numbers. But still, no third party accountability would be in place, and many higher ed experts are already saying that the credibility of the U.S. News college rankings is shot.

Three quick thoughts:

1. With the amount of money involved in the entire process, this should not be a surprise. Colleges want to project the best image they can so having a weakly regulated system (and also a suspect methodology and set of factors to start with) can lead to abuses.

2. If the USNWR rankings can’t be trusted, isn’t there someone who could provide a more honest system? This sounds like an opportunity for someone.

3. I wonder if there are parallels to PED use in baseball. To some degree, it doesn’t matter if lots of schools are gaming the system as long as the perception among schools is that everyone else is doing it. With this perception, it is easier to justify one’s own cheating because colleges need to catch up or compete with each other.

Social psychologist on quest to find researchers who falsify data

The latest Atlantic magazine includes a short piece about a social psychologist who is out to catch other researchers who falsify data. Here is part of the story:

Simonsohn initially targeted not flagrant dishonesty, but loose methodology. In a paper called “False-Positive Psychology,” published in the prestigious journal Psychological Science, he and two colleagues—Leif Nelson, a professor at the University of California at Berkeley, and Wharton’s Joseph Simmons—showed that psychologists could all but guarantee an interesting research finding if they were creative enough with their statistics and procedures.

The three social psychologists set up a test experiment, then played by current academic methodologies and widely permissible statistical rules. By going on what amounted to a fishing expedition (that is, by recording many, many variables but reporting only the results that came out to their liking); by failing to establish in advance the number of human subjects in an experiment; and by analyzing the data as they went, so they could end the experiment when the results suited them, they produced a howler of a result, a truly absurd finding. They then ran a series of computer simulations using other experimental data to show that these methods could increase the odds of a false-positive result—a statistical fluke, basically—to nearly two-thirds.

Just as Simonsohn was thinking about how to follow up on the paper, he came across an article that seemed too good to be true. In it, Lawrence Sanna, a professor who’d recently moved from the University of North Carolina to the University of Michigan, claimed to have found that people with a physically high vantage point—a concert stage instead of an orchestra pit—feel and act more “pro-socially.” (He measured sociability partly by, of all things, someone’s willingness to force fellow research subjects to consume painfully spicy hot sauce.) The size of the effect Sanna reported was “out-of-this-world strong, gravity strong—just super-strong,” Simonsohn told me over Chinese food (heavy on the hot sauce) at a restaurant around the corner from his office. As he read the paper, something else struck him, too: the data didn’t seem to vary as widely as you’d expect real-world results to. Imagine a study that calculated male height: if the average man were 5-foot-10, you wouldn’t expect that in every group of male subjects, the average man would always be precisely 5-foot-10. Yet this was exactly the sort of unlikely pattern Simonsohn detected in Sanna’s data…

Simonsohn stressed that there’s a world of difference between data techniques that generate false positives, and fraud, but he said some academic psychologists have, until recently, been dangerously indifferent to both. Outright fraud is probably rare. Data manipulation is undoubtedly more common—and surely extends to other subjects dependent on statistical study, including biomedicine. Worse, sloppy statistics are “like steroids in baseball”: Throughout the affected fields, researchers who are too intellectually honest to use these tricks will publish less, and may perish. Meanwhile, the less fastidious flourish.
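One of the practices described in the excerpt above, analyzing the data as it comes in and stopping once the result looks good, can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a reproduction of the “False-Positive Psychology” simulations: both groups are drawn from the same normal distribution (so any “significant” difference is a fluke), the standard deviation is assumed known so a simple z-test can be used, and the function names and the choice of peeking after every 5 subjects per group are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def p_value_two_groups(a, b):
    """Two-sided z-test p-value for a difference in means (known SD of 1)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek_every, max_n, trials=2000, alpha=0.05, seed=1):
    """Share of null experiments declared significant when peeking is allowed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = [], []
        while len(a) < max_n:
            # Collect another batch for each group, then test again.
            a.extend(rng.gauss(0, 1) for _ in range(peek_every))
            b.extend(rng.gauss(0, 1) for _ in range(peek_every))
            if p_value_two_groups(a, b) < alpha:
                hits += 1  # stop as soon as the result "suits" us
                break
    return hits / trials


fixed = false_positive_rate(peek_every=50, max_n=50)   # one test at the planned n
peeking = false_positive_rate(peek_every=5, max_n=50)  # test after every 5 per group
print(f"single planned test: {fixed:.3f}")
print(f"stop when significant: {peeking:.3f}")
```

Under these assumptions the single planned test should flag a false positive close to the nominal 5 percent of the time, while repeated peeking pushes the rate well above that. The much larger figure in the excerpt came from combining several such practices, which this sketch does not attempt.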

The current research system may just provide incentives for researchers to cut corners and end up with false results. Publishing is incredibly important for an academic career and there is little systematic oversight of a researcher’s data. I’ve written before about ways that data could be made more open but it would take some work to put these ideas into practice.

What I wouldn’t want is for people to read a story like this and conclude that fields like social psychology have nothing to offer because who knows how many of the studies might be flawed. I also wonder about the vigilante edge to this story – it makes for a good journalistic piece to tell of a social psychologist battling his own field, but this isn’t how science should work. Simonsohn should be joined by others who should also be concerned by these potential issues. Of course, there may not be many incentives to pursue this work as it might invite criticism from inside and outside the discipline.

Sharing data among scientists vs. “Big Data”

In a quest to make data available to other researchers to verify research results, researchers have come up against one kind of data that is not made publicly available: “big data” from big Internet firms.

The issue came to a boil last month at a scientific conference in Lyon, France, when three scientists from Google and the University of Cambridge declined to release data they had compiled for a paper on the popularity of YouTube videos in different countries.

The chairman of the conference panel — Bernardo A. Huberman, a physicist who directs the social computing group at HP Labs here — responded angrily. In the future, he said, the conference should not accept papers from authors who did not make their data public. He was greeted by applause from the audience…

At leading social science journals, there are few clear guidelines on data sharing. “The American Journal of Sociology does not at present have a formal position on proprietary data,” its editor, Andrew Abbott, a sociologist at the University of Chicago, wrote in an e-mail. “Nor does it at present have formal policies enforcing the sharing of data.”

The problem is not limited to the social sciences. A recent review found that 44 of 50 leading scientific journals instructed their authors on sharing data but that fewer than 30 percent of the papers they published fully adhered to the instructions. A 2008 review of sharing requirements for genetics data found that 40 of 70 journals surveyed had policies, and that 17 of those were “weak.”

Who will win the battle between proprietary data and science? The article makes it sound like scientists are all on one side, particularly because of an interest in fighting issues like scientific fraud. At the same time, scientific journals don’t seem to be “enforcing” their guidelines or the individual scientists who are publishing in these journals aren’t following these guidelines.

The other side of this debate is not presented in this story: what do these big Internet firms, like Google, Yahoo, and Facebook think about sharing this data? This is not a small issue: these firms are spending a good amount of money on analyzing this data and probably hoping to use it for their own business and research purposes. For example, Microsoft recently set up a lab with several well-known researchers in New York City. Would the social scientists who work in such labs want to insist that the data be open? Should these companies have to open up their proprietary data to satisfy the requirements of the larger scientific community?

I suspect this will be an ongoing issue as social scientists look to analyze more innovative data that big companies have collected and that are more difficult for researchers to collect on their own. Will researchers be willing to forgo sharing this kind of data with the wider scientific community if they can get their hands on unique data?

Increase in retractions of scientific articles tied to problems in scientific process

Several scientists are calling for changes in how scientific work is conducted and published because of a rise in retracted articles:

Dr. Fang became curious how far the rot extended. To find out, he teamed up with a fellow editor at the journal, Dr. Arturo Casadevall of the Albert Einstein College of Medicine in New York. And before long they reached a troubling conclusion: not only that retractions were rising at an alarming rate, but that retractions were just a manifestation of a much more profound problem — “a symptom of a dysfunctional scientific climate,” as Dr. Fang put it.

Dr. Casadevall, now editor in chief of the journal mBio, said he feared that science had turned into a winner-take-all game with perverse incentives that lead scientists to cut corners and, in some cases, commit acts of misconduct…

Last month, in a pair of editorials in Infection and Immunity, the two editors issued a plea for fundamental reforms. They also presented their concerns at the March 27 meeting of the National Academies of Sciences committee on science, technology and the law.

Here is what Fang and Casadevall suggest may help reduce these issues:

To change the system, Dr. Fang and Dr. Casadevall say, start by giving graduate students a better understanding of science’s ground rules — what Dr. Casadevall calls “the science of how you know what you know.”

They would also move away from the winner-take-all system, in which grants are concentrated among a small fraction of scientists. One way to do that may be to put a cap on the grants any one lab can receive.

In other words, give graduate students more training in ethics and the sociology of science while also redistributing scientific research money so that more researchers can be involved. There is a lot to consider here. Of course, there might always be researchers tempted to commit fraud, yet these scientists are arguing that the current system and circumstances need to be tweaked to fight this. Graduate students and young faculty are well aware of what they have to do: publish research in the highest-ranked journals they can. Jobs and livelihoods are on the line. With that pressure, it makes sense that some may resort to unethical measures to get published.

Three other thoughts:

1. How often is social science research retracted? If it is infrequent, should it happen more often?

2. Even if an article or study is retracted, this doesn’t solve the whole issue as that work may have been cited a lot and become well known. Perhaps the bigger problem is “erasing” this study from the collective science memory. This reminds me of newspaper corrections; when you go find the original printing, you don’t know there was a later correction. The same thing can happen here: scientific studies can have long lives.

3. Should disciplines or journals have groups that routinely assess the validity of research studies? This would go beyond peer review and give a group the authority to ask questions about suspicious papers. Alas, this still wouldn’t catch even most of the problematic papers…

Don’t dismiss social science research just because of one fraudulent scientist

Andrew Ferguson argued in early December that journalists fall too easily for bad academic research. However, he seems to base much of his argument on the actions of one fraudulent scientist:

Lots of cultural writing these days, in books and magazines and newspapers, relies on the so-called Chump Effect. The Effect is defined by its discoverer, me, as the eagerness of laymen and journalists to swallow whole the claims made by social scientists. Entire journalistic enterprises, whole books from cover to cover, would simply collapse into dust if even a smidgen of skepticism were summoned whenever we read that “scientists say” or “a new study finds” or “research shows” or “data suggest.” Most such claims of social science, we would soon find, fall into one of three categories: the trivial, the dubious, or the flatly untrue.

A rather extreme example of this third option emerged last month when an internationally renowned social psychologist, Diederik Stapel of Tilburg University in the Netherlands, was proved to be a fraud. No jokes, please: This social psychologist is a fraud in the literal, perhaps criminal, and not merely figurative, sense. An investigative committee concluded that Stapel had falsified data in at least “several dozen” of the nearly 150 papers he had published in his extremely prolific career…

But it hardly seems to matter, does it? The silliness of social psychology doesn’t lie in its questionable research practices but in the research practices that no one thinks to question. The most common working premise of social-psychology research is far-fetched all by itself: The behavior of a statistically insignificant, self-selected number of college students or high schoolers filling out questionnaires and role-playing in a psych lab can reveal scientifically valid truths about human behavior…

Who cares? The experiments are preposterous. You’d have to be a highly trained social psychologist, or a journalist, to think otherwise. Just for starters, the experiments can never be repeated or their results tested under controlled conditions. The influence of a hundred different variables is impossible to record. The first group of passengers may have little in common with the second group. The groups were too small to yield statistically significant results. The questionnaire is hopelessly imprecise, and so are the measures of racism and homophobia. The notions of “disorder” and “stereotype” are arbitrary—and so on and so on.

Yet the allure of “science” is too strong for our journalists to resist: all those numbers, those equations, those fancy names (say it twice: the Self-Activation Effect), all those experts with Ph.D.’s!

I was afraid that the actions of one scientist might taint the work of many others.

But there are several issues here worth pursuing:

1. The fact that Stapel committed fraud doesn’t mean that all scientists do bad work. Ferguson seems to want to blame other scientists for not knowing Stapel was committing fraud – how exactly would they have known?

2. Ferguson doesn’t seem to like social psychology. He does point to some valid methodological concerns: many studies involve small groups of undergraduates. Drawing large conclusions from these studies is difficult and indeed, perhaps dangerous. But this isn’t all social psychology is about.

2a. More generally, Ferguson could be writing about a lot of disciplines. Medical research tends to start with small groups and then decisions are made. Lots of research, particularly in the social sciences, could be invalidated if Ferguson were completely right. Would Ferguson really suggest that “Most such claims of social science…fall into one of three categories: the trivial, the dubious, or the flatly untrue”?

3. I’ve said it before and I’ll say it again: journalists need more training in order to understand what scientific studies mean. Science doesn’t work in the way that journalists suggest, where there is a steady stream of big findings. Rather, scientists find something and then others try to replicate the findings in different settings with different populations. Science is more like an accumulation of evidence than a lot of sudden lightning strikes of new facts. One small study of undergraduates may not tell us much but dozens of such experiments among different groups might.

4. I can’t help but wonder if there is a political slant to this: what if scientists were reporting positive things about conservative viewpoints? Ferguson complains that measuring things like racism and homophobia is difficult but this is the nature of studying humans and society. Ferguson just wants to say that it is all “arbitrary” – this is simply throwing up our hands and saying the world is too difficult to comprehend so we might as well quit. If there isn’t a political edge here, perhaps Ferguson is simply anti-science? What science does Ferguson suggest is credible and valid?

In the end, you can’t dismiss all of social psychology because of the actions of one scientist or because journalists are ill-prepared to report on scientific findings.

h/t Instapundit