In the wake of recent comments about “sociological gobbledygook” and measures of gerrymandering, here are some suggestions for how the Supreme Court can better use statistical evidence:
McGhee, who helped develop the efficiency gap measure, wondered if the court should hire a trusted staff of social scientists to help the justices parse empirical arguments. Levinson, the Texas professor, felt that the problem was a lack of rigorous empirical training at most elite law schools, so the long-term solution would be a change in curriculum. Enos and his coauthors proposed “that courts alter their norms and standards regarding the consideration of statistical evidence”; judges are free to ignore statistical evidence, so perhaps nothing will change unless they take this category of evidence more seriously.
But maybe this allergy to statistical evidence is really a smoke screen — a convenient way to make a decision based on ideology while couching it in terms of practicality.
“I don’t put much stock in the claim that the Supreme Court is afraid of adjudicating partisan gerrymanders because it’s afraid of math,” Daniel Hemel, who teaches law at the University of Chicago, told me. “[Roberts] is very smart and so are the judges who would be adjudicating partisan gerrymandering claims — I’m sure he and they could wrap their minds around the math. The ‘gobbledygook’ argument seems to be masking whatever his real objection might be.”
If innumeracy is indeed at play, the justices would not be alone. Many Americans receive no education in statistics, let alone enough training to make sense of the statistics regularly used in academic studies.
At the same time, we might push the argument further: should judges make decisions based on statistics (roughly, facts) rather than ideology or arguments (roughly, interpretation)? Here, too, many Americans struggle: broad empirical patterns or even correlations may exist, yet some insist that their own personal experiences do not match them. Should judicial decisions be guided by principles and existing case law or by current statistical realities? The courts are not the only social sphere that struggles with this.
Discovering which published studies may be incorrect or fraudulent takes some work, and here is a newer tool for the job: GRIM.
GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either (a) made a mistake or (b) is making things up.
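The underlying insight is easy to make concrete: with n integer responses, only multiples of 1/n are attainable means. Here is a minimal sketch of that consistency check (the function name and rounding convention are my own, not drawn from Brown and Heathers' paper):

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Check whether a reported mean is possible given n integer responses.

    Only multiples of 1/n are attainable means, so we test every integer
    total near reported_mean * n and see whether any of them rounds back
    to the reported value.
    """
    target = round(reported_mean, decimals)
    # Candidate totals: integers whose mean could plausibly round to the report
    lo = int((reported_mean - 10 ** -decimals) * n) - 1
    hi = int((reported_mean + 10 ** -decimals) * n) + 1
    return any(round(total / n, decimals) == target for total in range(lo, hi + 1))

# With n = 25, means must be multiples of 0.04:
print(grim_consistent(3.44, 25))  # True  (3.44 = 86/25)
print(grim_consistent(3.45, 25))  # False (no integer total gives 3.45)
```

A reported mean of 3.45 from 25 one-item integer responses fails the check, which is exactly the kind of impossible average GRIM flags.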
GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of the ones that provided enough necessary data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are “honest errors or slightly sloppy reporting.”…
After spotting the Wansink post, Anaya took the numbers in the papers and — to coin a verb — GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM was an actual machine, rather than a humble piece of code, its alarms would have been blaring. “This lights up like a Christmas tree,” Brown said after highlighting on his computer screen the errors Anaya had identified…
Anaya, along with Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.”
I wonder how long it will be before journals employ such methods on submitted manuscripts. Imagine Turnitin for academic studies. And what would happen to authors when problems are found?
A program like this could also make it easy to analyze published studies en masse, helping answer questions such as how many findings are fraudulent.
Perhaps it is too easy to ask whether GRIM itself has been vetted by outside researchers…
When do statistics matter most for the average American? The week of the opening weekend of March Madness – the period from the revealing of the 68-team field to the final games of the Round of 32 – may be just that point. All the numbers are hard to resist: win-loss records; various other metrics of team performance (strength of schedule, RPI, systems attached to particular analysts, advanced basketball statistics, etc.); seed numbers and their historic performance; what the rest of America has picked (see the percentages for the millions of brackets at ESPN); and betting lines and pools.
Considering the suggestions that Americans are fairly innumerate, perhaps this would be a good period for public statistics education. How does one sift through all these numbers, think about how they are measured, and make decisions based on the figures? Sadly, I usually teach Statistics in the fall, so I can't put any of my own ideas into practice…
Statistical software can be very helpful, but it does not automatically guarantee correct analyses:
A team of Australian researchers analyzed nearly 3,600 genetics papers published in a number of leading scientific journals — like Nature, Science and PLoS One. As is common practice in the field, these papers all came with supplementary files containing lists of genes used in the research.
The Australian researchers found that roughly 1 in 5 of these papers included errors in their gene lists that were due to Excel automatically converting gene names to things like calendar dates or random numbers…
Genetics isn’t the only field where a life’s work can potentially be undermined by a spreadsheet error. Harvard economists Carmen Reinhart and Kenneth Rogoff famously made an Excel goof — omitting a few rows of data from a calculation — that caused them to drastically overstate the negative GDP impact of high debt burdens. Researchers in other fields occasionally have to issue retractions after finding Excel errors as well…
For the time being, the only fix for the issue is for researchers and journal editors to remain vigilant when working with their data files. Even better, they could abandon Excel completely in favor of programs and languages that were built for statistical research, like R and Python.
Excel has particular autoformatting issues, but all statistical programs have their own ways of handling data. Spreadsheets of data – often formatted with cases in the rows and variables in the columns – do not automatically read in correctly.
Additionally, user error can cause problems with any statistical software. Different programs have different quirks, and researchers can do all sorts of strange things, from recoding incorrectly to misreading missing data to misinterpreting results. Data doesn't analyze itself; statistical software is just a tool that must be used correctly.
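One concrete form of vigilance is a quick audit script that scans a gene list for the telltale signatures of Excel damage. A minimal sketch (the two patterns below are illustrative, covering the calendar-date and scientific-notation conversions the Australian team described):

```python
import re

# Patterns Excel typically produces when it mangles gene symbols:
# calendar dates ("2-Sep" for SEPT2) and scientific notation ("2.31E+13").
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$", re.I
)
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E\+\d+$", re.I)

def flag_mangled(gene_list):
    """Return entries that look like Excel-converted gene names."""
    return [g for g in gene_list if DATE_LIKE.match(g) or SCI_NOTATION.match(g)]

genes = ["TP53", "2-Sep", "BRCA1", "2.31E+19"]
print(flag_mangled(genes))  # -> ['2-Sep', '2.31E+19']
```

A check like this won't recover the original gene names, but it can tell an author or editor that a supplementary file needs a second look before publication.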
A number of researchers have in recent years called for open data once a paper is published, which could help those in an academic field spot mistakes. Of course, the best solution is to double-check (at least) the data before review and publication. Yet, when you are buried in a quantitative project with dozens of steps of data work and analysis, it can be hard to (1) keep track of everything and (2) closely watch for errors. Perhaps we need independent data review even before publication.
Scientists regularly use p-values to evaluate their findings but apparently have difficulty explaining exactly what they mean:
To be clear, everyone I spoke with at METRICS could tell me the technical definition of a p-value — the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis is correct — but almost no one could translate that into something easy to understand.
It’s not their fault, said Steven Goodman, co-director of METRICS. Even after spending his “entire career” thinking about p-values, he said he could tell me the definition, “but I cannot tell you what it means, and almost nobody can.” Scientists regularly get it wrong, and so do most textbooks, he said. When Goodman speaks to large audiences of scientists, he often presents correct and incorrect definitions of the p-value, and they “very confidently” raise their hand for the wrong answer. “Almost all of them think it gives some direct information about how likely they are to be wrong, and that’s definitely not what a p-value does,” Goodman said.
We want to know if results are right, but a p-value doesn’t measure that. It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance.
So what information can you glean from a p-value? The most straightforward explanation I found came from Stuart Buck, vice president of research integrity at the Laura and John Arnold Foundation. Imagine, he said, that you have a coin that you suspect is weighted toward heads. (Your null hypothesis is then that the coin is fair.) You flip it 100 times and get more heads than tails. The p-value won’t tell you whether the coin is fair, but it will tell you the probability that you’d get at least as many heads as you did if the coin was fair. That’s it — nothing more. And that’s about as simple as I can make it, which means I’ve probably oversimplified it and will soon receive exasperated messages from statisticians telling me so.
Complicated but necessary? This can lead to fun situations when teaching statistics: students need to know enough to do the statistical work and evaluate findings (we at least need to know what to do with a calculated p-value, even if we don't quite understand what it means), but explaining the full complexity of some of these techniques wouldn't necessarily help the learning process. In fact, the more you learn about statistics, the more you find that the various methods and techniques have limitations even as they help us better understand phenomena.
How will refugees be dispersed among European countries? This formula:
On Wednesday, shortly after European Commission President Jean-Claude Juncker announced a new plan to distribute 120,000 asylum-seekers currently in Greece, Hungary, and Italy among the EU’s 28 member states, Duncan Robinson of the Financial Times tweeted a series of grainy equations from the annex of a proposed European regulation, which establishes a mechanism for relocating asylum-seekers during emergency situations beyond today’s acute crisis. Robinson’s message: “So, how do they decide how many refugees each country should receive? ‘Well, it’s very simple…’”
In an FAQ posted on Wednesday, the European Commission expanded on the thinking behind the elaborate math. Under the proposed plan, if the Commission determines at some point in the future that there is a refugee crisis in a given country (as there is today in Greece, Hungary, and Italy, the countries migrants reach first upon arriving in Europe), it will set a number for how many refugees in that country should be relocated throughout the EU. That number will be “not higher than 40% of the number of [asylum] applications made [in that country] in the past six months.”…
What’s most striking to me is the contrast between the sigmas and subscripts in the refugee formula—the inhumanity of technocratic compromise by mathematical equation—and the raw, tragic, heroic humanity on display in recent coverage of the refugees from Syria, Afghanistan, Eritrea, and elsewhere who are pouring into Europe.
The writer hints at the end here that the bureaucratic formula and stories of human lives at stake are incompatible. How could we translate people who need help into cold, impersonal numbers? This is a common claim: statistics take away human stories and dignity. They are unfeeling. They can’t sum the experiences of individuals. One online quote sums this up: “Statistics are human beings with the tears wiped off.”
Yet, we need both the stories and the numbers to truly address the situation. Individual stories are important and interesting. Tragic cases tend to draw people's attention, particularly if presented in attractive ways. But it is difficult to convey all the stories of the refugees and migrants. Where would they be told, and who would sit through them all? The statistics and formulas help give us the big picture. Just how many refugees are there? (Imagine a situation where there are only 10 refugees but with very compelling stories. Would this compel nations to act?) How can they be slotted into existing countries and systems?
On top of that, you can't really have the nations of today without bureaucracies. We might not like that they are slow-moving, at times inefficient, or overwhelming. But how can you run a major social system without a bureaucratic structure? Would we want to go to a hospital that was not a bureaucracy? How do you keep millions of citizens in a country moving in a similar direction? Decentralization or non-hierarchical systems can only go so far in addressing major tasks.
With that said, the formula looks complicated but the explanation in the text is fairly easy to understand: there are a set of weighted factors that dictate how many refugees will be assigned to each country.
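As a rough illustration of how such a weighted key works, here is a sketch where each country's share of the relocation total is a weighted average of its shares of each factor. The factor names and weights below are illustrative, not the regulation's exact coefficients:

```python
def shares(countries, weights):
    """Compute each country's share of the relocation total.

    `countries`: {name: {factor: value}}
    `weights`:   {factor: weight}, with weights summing to 1

    Each country's share is the weighted sum of its share of each factor.
    Factor names and weights here are illustrative only.
    """
    totals = {f: sum(c[f] for c in countries.values()) for f in weights}
    return {
        name: sum(w * c[f] / totals[f] for f, w in weights.items())
        for name, c in countries.items()
    }

# Two hypothetical countries, weighting population and GDP equally
demo = {
    "A": {"population": 80e6, "gdp": 3.0e12},
    "B": {"population": 10e6, "gdp": 0.5e12},
}
print(shares(demo, {"population": 0.5, "gdp": 0.5}))
```

The shares always sum to 1, so multiplying each by the total number of asylum-seekers to be relocated yields per-country quotas – the basic logic behind the formula, minus the regulation's additional factors and caps.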
I’ve seen this argument in several places, including this AP story: collecting national data about fatalities caused by police would be helpful.
To many Americans, it feels like a national tidal wave. And yet, no firm statistics can say whether this spate of officer-involved deaths is a growing trend or simply a series of coincidences generating a deafening buzz in news reports and social media.
“We have a huge scandal in that we don’t have an accurate count of the number of people who die in police custody,” says Samuel Walker, emeritus professor of criminal justice at the University of Nebraska at Omaha and a leading scholar on policing and civil liberties. “That’s outrageous.”…
The FBI’s Uniform Crime Reports, for instance, track justifiable police homicides – there were 1,688 between 2010 and 2013 – but the statistics rely on voluntary reporting by local law enforcement agencies and are incomplete. Circumstances of the deaths, and other information such as age and race, also aren’t required.
The Wall Street Journal, detailing its own examination of officer-involved deaths at 105 of the nation’s 110 largest police departments, reported last week that federal data failed to include or mislabeled hundreds of fatal police encounters…
Chettiar is hopeful that recent events will create the “political and public will” to begin gathering and analyzing the facts.
A few quick thoughts:
1. Just because this data hasn’t been collected doesn’t necessarily mean the omission was intentional. Government agencies collect lots of data, but it takes deliberate action and foresight to decide what should and shouldn’t be reported. Given that there are at least a few hundred such deaths each year, you would think someone would have flagged this information as interesting, but apparently not. Now would be a good time to start reporting and collecting such data.
2. Statistics would be helpful in providing a broader perspective on the issue, but, as the article notes, individual events and broad narratives not backed by statistics have their own kinds of persuasive power. In our individualistic culture, specific stories can go a long way. At the same time, social problems are often defined by their scope, which involves statistical measures of how many people are affected.