The importance of statistics on college campuses

Within a longer look at the fate of the humanities, one Harvard student suggests statistics dominates campus conversations:

I asked Haimo whether there seemed to be a dominant vernacular at Harvard. (When I was a student there, people talked a lot about things being “reified.”) Haimo told me that there was: the language of statistics. One of the leading courses at Harvard now is introductory statistics, enrolling some seven hundred students a semester, up from ninety in 2005. “Even if I’m in the humanities, and giving my impression of something, somebody might point out to me, ‘Well, who was your sample? How are you gathering your data?’ ” he said. “I mean, statistics is everywhere. It’s part of any good critical analysis of things.”

It struck me that I knew at once what Haimo meant: on social media, and in the press that sends data visualizations skittering across it, statistics is now everywhere, our language for exchanging knowledge. Today, a quantitative idea of rigor underlies even a lot of arguments about the humanities’ special value. Last school year, Spencer Glassman, a history major, argued in a column for the student paper that Harvard’s humanities “need to be more rigorous,” because they set no standards comparable to the “tangible things that any student who completes Stat 110 or Physics 16 must know.” He told me, “One could easily walk away with an A or A-minus and not have learned anything. All the STEM concentrators have this attitude that humanities are a joke.”…

Haimo and I turned back toward Harvard Square. “I think the problem for the humanities is you can feel like you’re not really going anywhere, and that’s very scary,” he said. “You write one essay better than the other from one semester to the next. That’s not the same as, you know, being able to solve this economics problem, or code this thing, or do policy analysis.” This has always been true, but students now recognized less of the long-term value of writing better or thinking more deeply than they previously had. Last summer, Haimo worked at the HistoryMakers, an organization building an archive of African American oral history. He said, “When I was applying, I kept thinking, What qualifies me for this job? Sure, I can research, I can write things.” He leaned forward to check for passing traffic. “But those skills are very difficult to demonstrate, and it’s frankly not what the world at large seems in demand of.”

I suspect this kind of authority is not limited to college campuses: numbers have a particular power in the world today. They convey proof, patterns, and trends. There is often little space to ask where the numbers came from or what they mean.

Is this the only way to understand the world? No. We need to consider all sorts of data to understand and explain what is going on. Stories and narratives do not just exist to flesh out quantitative patterns; they can convey deep truths and raise important questions.

But what if we only care today about what is most efficient and most directly translatable into money? If college students and others prioritize jobs over everything else, does this advantage numbers, given their connections to STEM and to occupations perceived as the surest paths to wealth and a return on investment? From later in the article:

In a quantitative society for which optimization—getting the most output from your input—has become a self-evident good, universities prize actions that shift numbers, and pre-professionalism lends itself to traceable change.

If American society prizes money and a certain kind of success above all else, are these patterns that surprising?

Helping readers see patterns and the bigger picture in new housing price data

The headline reads:

Home prices fell for the first time in 3 years last month – and it was the biggest decline since 2011

This quickly relays information about recent trends – prices went down for the first time in a while – as well as longer patterns – biggest drop in over a decade.

Next are some figures on housing affordability:

Now, housing affordability is at its lowest level in 30 years. It requires 32.7% of the median household income to purchase the average home using a 20% down payment on a 30-year mortgage, according to Black Knight. That is about 13 percentage points more than it did entering the pandemic and significantly more than both the years before and after the Great Recession. The 25-year average is 23.5%.

The housing affordability statistic is put into terms accessible to a broad audience: nearly 33% of the median household income is needed to buy the average house with common mortgage terms. Additionally, this percentage is higher than in recent years and well above the 25-year average.
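
To make the affordability figure concrete, here is a rough sketch of the underlying arithmetic. The home price, income, and rate below are illustrative placeholders rather than Black Knight's actual inputs, so the result will not match 32.7% exactly, but it shows how a share-of-income figure like this gets built.

```python
def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
    """Standard fixed-rate mortgage payment formula."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)

# Illustrative inputs only (not Black Knight's figures)
home_price = 400_000   # hypothetical "average home" price
income = 75_000        # hypothetical median household income
rate = 0.0575          # a 30-year rate in the range reported later in the article

payment = monthly_payment(home_price * 0.80, rate)  # 20% down payment
share_of_income = (payment * 12) / income
print(f"Monthly payment ${payment:,.0f}, about {share_of_income:.1%} of income")
```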

Some housing markets are seeing bigger price declines than others:

Some local markets are seeing even steeper declines over the last few months. San Jose, California, saw the largest, with home prices now down 10% in recent months, followed by Seattle (-7.7%), San Francisco (-7.4%), San Diego (-5.6%), Los Angeles (-4.3%) and Denver (-4.2%).

It could be noted that these are expensive and hot real estate markets. Yes, they had larger drops, but they had also been pushed higher in recent years than many other markets.

And the article ends with information on mortgage rates:

The average rate on the popular 30-year fixed mortgage began this year right around 3%, according to Mortgage News Daily. It climbed slowly month to month, pulling back slightly in May but then shot more dramatically to just over 6% in June. It is now hovering around 5.75%.

This highlights the rise in mortgage rates this year. Some broader context might be helpful: what was the average rate before COVID-19, or over the last 10 years?

This article provides numerous statistics and often puts the figures in context. Yet it does leave one lingering question: what is the state of housing prices overall? One answer might frame this as a change after a period of unusual trends during COVID-19. Another might focus on the different actors involved: how does this affect the housing industry, or those struggling to get into the housing market, or the many homeowners sitting on higher housing values?

Statistics are not just facts thrown into a void; they require interpretation and are often applied to particular concerns or issues.

Americans overestimate the size of smaller groups, underestimate the size of larger groups

Recent YouGov survey data shows Americans have a hard time estimating the population of a number of groups:

When people’s average perceptions of group sizes are compared to actual population estimates, an intriguing pattern emerges: Americans tend to vastly overestimate the size of minority groups. This holds for sexual minorities, including the proportion of gays and lesbians (estimate: 30%, true: 3%), bisexuals (estimate: 29%, true: 4%), and people who are transgender (estimate: 21%, true: 0.6%).

It also applies to religious minorities, such as Muslim Americans (estimate: 27%, true: 1%) and Jewish Americans (estimate: 30%, true: 2%). And we find the same sorts of overestimates for racial and ethnic minorities, such as Native Americans (estimate: 27%, true: 1%), Asian Americans (estimate: 29%, true: 6%), and Black Americans (estimate: 41%, true: 12%)…

A parallel pattern emerges when we look at estimates of majority groups: People tend to underestimate rather than overestimate their size relative to their actual share of the adult population. For instance, we find that people underestimate the proportion of American adults who are Christian (estimate: 58%, true: 70%) and the proportion who have at least a high school degree (estimate: 65%, true: 89%)…

Misperceptions of the size of minority groups have been identified in prior surveys, which observers have often attributed to social causes: fear of out-groups, lack of personal exposure, or portrayals in the media. Yet consistent with prior research, we find that the tendency to misestimate the size of demographic groups is actually one instance of a broader tendency to overestimate small proportions and underestimate large ones, regardless of the topic. 
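
Using only the estimate/true pairs quoted above, a few lines of code make the pattern easy to see: small shares get pulled up and large shares get pulled down.

```python
# Figures are the estimate vs. true percentages quoted above (YouGov)
groups = {
    "Gay or lesbian": (30, 3),
    "Bisexual": (29, 4),
    "Transgender": (21, 0.6),
    "Muslim American": (27, 1),
    "Jewish American": (30, 2),
    "Native American": (27, 1),
    "Asian American": (29, 6),
    "Black American": (41, 12),
    "Christian": (58, 70),
    "High school degree or more": (65, 89),
}

for name, (estimate, true) in groups.items():
    direction = "over" if estimate > true else "under"
    gap = abs(estimate - true)
    print(f"{name:28s} estimate {estimate:>4}%  true {true:>4}%  {direction}estimated by {gap:.1f} points")
```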

I wonder how much this might be connected to a general sense of innumeracy. Big numbers can be difficult to understand, and the United States has over 330,000,000 residents. Percentages and absolute numbers regarding certain groups are not always provided. I am more familiar with some of these percentages and numbers because my work requires it, but such figures do not come up in all fields or settings.

Additionally, where would this information be taught or regularly shared? Civics classes, alongside information about government structures and national history? Math classes, as examples of relevant information? On television programs or in print materials? At political events or sports games? I would be interested in making all of this more publicly visible so that this information is not known only by those who read the Statistical Abstract of the United States or have Census.gov as a top bookmark.

Thinking about probabilistic futures

When looking to predict the future, one historian of science suggests we need to think probabilistically:

The central message sent from the history of the future is that it’s not helpful to think about “the Future.” A much more productive strategy is to think about futures; rather than “prediction,” it pays to think probabilistically about a range of potential outcomes and evaluate them against a range of different sources. Technology has a significant role to play here, but it’s critical to bear in mind the lessons from World3 and Limits to Growth about the impact that assumptions have on eventual outcomes. The danger is that modern predictions with an AI imprint are considered more scientific, and hence more likely to be accurate, than those produced by older systems of divination. But the assumptions underpinning the algorithms that forecast criminal activity, or identify potential customer disloyalty, often reflect the expectations of their coders in much the same way as earlier methods of prediction did.

Social scientists have long hoped to contribute to accurate predictions. We want both to better understand what is happening now and to provide insights into what will come after.

The idea of thinking probabilistically is a key part of the Statistics course I teach each fall semester. We can easily fall into using language that suggests we “prove” things or relationships. This implies certainty and we often think science leads to certainty, laws, and cause and effect. However, when using statistics we are usually making estimates about the population from the samples and information we have in front of us. Instead of “proving” things, we can speak to the likelihood of something happening or the degree to which one variable affects another. Our certainty of these relationships or outcomes might be higher or lower, depending on the information we are working with.
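
As a small classroom-style illustration of speaking in estimates rather than proofs, here is a sketch of an interval estimate for a mean. The sample values are made up, and 1.96 is the usual approximate 95% critical value.

```python
import math

# Hypothetical sample of survey responses (illustrative values only)
sample = [3, 4, 5, 4, 3, 5, 4, 4, 2, 5, 4, 3]
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)

# Rather than "proving" the population mean, report a range we are fairly confident about
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"Sample mean {mean:.2f}; 95% confidence interval roughly ({lower:.2f}, {upper:.2f})")
```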

All of this relates to predictions. We can work to improve our current models to better understand current or past conditions but the future involves changes that are harder to know. Like inferential statistics, making predictions involves using certain information we have now to come to conclusions.

Thinking both (1) probabilistically and (2) in terms of plural futures can help us understand our limitations in considering the future. In regard to probabilities, we can assign higher or lower likelihoods to our predictions of what will happen. In thinking of plural futures, we can work with multiple options or pathways that may occur. All of this should be accompanied by humility and creativity, as it is difficult to predict the future even with great information today.
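
One way to combine the two ideas is a simple scenario exercise: name several futures, attach rough probabilities, and simulate which one unfolds many times. The scenarios and weights below are hypothetical and only meant to illustrate the approach.

```python
import random

# Hypothetical futures with rough, assumed probabilities (they should sum to 1)
scenarios = {"steady growth": 0.5, "stagnation": 0.3, "sharp decline": 0.2}

names = list(scenarios)
weights = list(scenarios.values())

counts = {name: 0 for name in names}
for _ in range(10_000):
    counts[random.choices(names, weights=weights)[0]] += 1

for name, count in counts.items():
    print(f"{name:15s} occurred in {count / 10_000:.1%} of simulated futures")
```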

Fighting math-phobia in America

The president of Barnard College offers three suggestions for making math more enticing and relevant for Americans:

First, we can work to bring math to those who might shy away from it. Requiring that all students take courses that push them to think empirically with data, regardless of major, is one such approach. At Barnard — a college long known for its writers and dancers — empirical reasoning requirements are built into our core curriculum. And, for those who struggle to meet the demands of data-heavy classes, we provide access (via help rooms) to tutors who focus on diminishing a student’s belief that they “just aren’t good at math.”

Second, employers should encourage applications from and be open to having students with diverse educational interests in their STEM-related internships. Don’t only seek out the computer science majors. This means potentially taking a student who doesn’t come with all the computation chops in hand but does have a good attitude and a willingness to learn. More often than not, such opportunities will surprise both intern and employee. When bright students are given opportunities to tackle problems head on and learn how to work with and manipulate data to address them, even those anxious about math tend to find meaning in what they are doing and succeed. STEM internships also allow students to connect with senior leaders who might have had to overcome a similar experience of questioning their mathematical or computational skills…

Finally, we need to reject the social acceptability of being bad at math. Think about it: You don’t hear highly intelligent people proclaiming that they can’t read, but you do hear many of these same individuals talking about “not being a math person.” When we echo negative sentiments like that to ourselves and each other, we perpetuate a myth that increases overall levels of math phobia. When students reject math, they pigeonhole themselves into certain jobs and career paths, foregoing others only because they can’t imagine doing more computational work. Many people think math ability is an immutable trait, but evidence clearly shows this is a subject in which we can all learn and succeed.

Fighting innumeracy – an inability to use or understand numbers – is a worthwhile goal. I like the efforts suggested above, though I worry a bit that they are tied too heavily to jobs and national competitiveness. These goals can veer toward efficiency and utilitarianism rather than toward outcomes like a better understanding of, and interaction with, society and self. Fighting stigma is going to be hard if it relies on invoking more pressure – the US is falling behind! your future career is on the line! – rather than showing how numbers can help people.

This is why I would be in favor of more statistics training for students at all levels. The math required to do statistics can be tailored to different levels, statistical tests, and subjects. The basic knowledge can be helpful in all sorts of areas citizens run into: interpreting reports on surveys and polls, calculating odds and risks (including in finances and sports), and understanding research results. The math does not have to be complicated and instruction can address understanding where statistics come from and how they can be used.
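
As one example of the kind of calculation this training covers, here is a sketch of the margin of error reported with polls; the 52% support figure and the 1,000 respondents below are hypothetical.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a poll proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll: 52% support among 1,000 respondents
moe = margin_of_error(0.52, 1000)
print(f"52% support, plus or minus {moe * 100:.1f} percentage points")  # roughly +/- 3 points
```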

I wonder how much of this might also be connected to the complicated relationship Americans have with expertise and advanced degrees. Think of the typical Hollywood scene of a genius at work: do they look crazy or unusual? Think about presidential candidates: do Americans want people with experience and knowledge, or someone they can identify with and have dinner with? Math, perceived as unknowable to people of average intelligence, may be associated with those smart eccentrics who are necessary for helping society progress but are not necessarily the people you would want to be or hang out with.

The retraction of a study provides a reminder of the importance of levels of measurement

Early in Statistics courses, students learn about different ways that variables can be measured. This is often broken down into three categories: nominal variables (unordered, unranked), ordinal variables (ranked but with varied category widths), and interval-ratio variables (ranked and with consistent spacing between categories). Decisions about how to measure variables can have a significant influence on what can be done with the data later. For example, here is a study that received a lot of attention when published but in which the researchers miscoded a nominal variable:

In 2015, a paper by Jean Decety and co-authors reported that children who were brought up religiously were less generous. The paper received a great deal of attention, and was covered by over 80 media outlets including The Economist, the Boston Globe, the Los Angeles Times, and Scientific American. As it turned out, however, the paper by Decety was wrong. Another scholar, Azim Shariff, a leading expert on religion and pro-social behavior, was surprised by the results, as his own research and meta-analysis (combining evidence across studies from many authors) indicated that religious participation, in most settings, increased generosity. Shariff requested the data to try to understand more clearly what might explain the discrepancy.

To Decety’s credit, he released the data. And upon re-analysis, Shariff discovered that the results were due to a coding error. The data had been collected across numerous countries, e.g. United States, Canada, Turkey, etc. and the country information had been coded as “1, 2, 3…” Although Decety’s paper had reported that they had controlled for country, they had accidentally not controlled for each country, but just treated it as a single continuous variable so that, for example “Canada” (coded as 2) was twice the “United States” (coded as 1). Regardless of what one might think about the relative merits and rankings of countries, this is obviously not the right way to analyze data. When it was correctly analyzed, using separate indicators for each country, Decety’s “findings” disappeared. Shariff’s re-analysis and correction was published in the same journal, Current Biology, in 2016. The media, however, did not follow along. While it covered extensively the initial incorrect results, only four media outlets picked up the correction.

In fact, Decety’s paper has continued to be cited in media articles on religion. Just last month two such articles appeared (one on Buzzworthy and one on TruthTheory) citing Decety’s paper that religious children were less generous. The paper’s influence seems to continue even after it has been shown to be wrong.

Last month, however, the journal, Current Biology, at last formally retracted the paper. If one looks for the paper on the journal’s website, it gives notice of the retraction by the authors. Correction mechanisms in science can sometimes work slowly, but they did, in the end, seem to be effective here. More work still needs to be done as to how this might translate into corrections in media reporting as well: The two articles above were both published after the formal retraction of the paper.

To reiterate, the researcher treated country – a nominal variable in this case, since the countries were not ranked or ordered in any particular way – incorrectly, which threw off the overall results. When country was then used correctly – from the description above, as separate indicator (dummy) variables coded 0 and 1 for each country – the findings that received all the attention disappeared.
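
Here is a small sketch of the difference, using a hypothetical mini-dataset rather than Decety's actual data; the point is only the coding, not the numbers.

```python
import pandas as pd

# Hypothetical mini-dataset with country coded 1, 2, 3 (a nominal variable)
df = pd.DataFrame({
    "country_code": [1, 1, 2, 2, 3, 3],             # e.g., 1 = US, 2 = Canada, 3 = Turkey
    "generosity":   [4.0, 3.5, 4.2, 4.1, 3.8, 3.9], # made-up outcome values
})

# Wrong: using the code as a numeric predictor implies "Canada is twice the United States,"
# which is exactly the error described in the excerpt above.
wrong_predictor = df[["country_code"]]

# Right: expand the nominal variable into separate 0/1 indicator (dummy) columns,
# so each country gets its own coefficient in a regression.
right_predictors = pd.get_dummies(df["country_code"], prefix="country", drop_first=True)
print(right_predictors)
```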

The other issue at play here is whether corrections and retractions of academic studies receive the same attention as the original findings. It is hard to notify readers that a previously published study had flaws and that the results have changed.

All that to say, paying attention to level of measurement earlier in the process helps avoid problems down the road.

Recommendations to help with SCOTUS’ innumeracy

In the wake of recent comments about “sociological gobbledygook” and measures of gerrymandering, here are some suggestions for how the Supreme Court can better use statistical evidence:

McGhee, who helped develop the efficiency gap measure, wondered if the court should hire a trusted staff of social scientists to help the justices parse empirical arguments. Levinson, the Texas professor, felt that the problem was a lack of rigorous empirical training at most elite law schools, so the long-term solution would be a change in curriculum. Enos and his coauthors proposed “that courts alter their norms and standards regarding the consideration of statistical evidence”; judges are free to ignore statistical evidence, so perhaps nothing will change unless they take this category of evidence more seriously.

But maybe this allergy to statistical evidence is really a smoke screen — a convenient way to make a decision based on ideology while couching it in terms of practicality.

“I don’t put much stock in the claim that the Supreme Court is afraid of adjudicating partisan gerrymanders because it’s afraid of math,” Daniel Hemel, who teaches law at the University of Chicago, told me. “[Roberts] is very smart and so are the judges who would be adjudicating partisan gerrymandering claims — I’m sure he and they could wrap their minds around the math. The ‘gobbledygook’ argument seems to be masking whatever his real objection might be.”

If there is indeed innumeracy present, the justices would not be alone in this. Many Americans do not receive an education in statistics, let alone have enough training to make sense of the statistics regularly used in academic studies.

At the same time, we might go further than the argument made above: should judges make decisions based on statistics (roughly facts) more than ideology or arguments (roughly interpretation)? Again, many Americans struggle with this: there can be broad empirical patterns or even correlations but some would insist that their own personal experiences do not match these. Should judicial decisions be guided by principles and existing case law or by current statistical realities? The courts are not the only social spheres that struggle with this.
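
For readers curious about the measure mentioned above, here is a simplified, textbook-style version of the efficiency gap calculation for a two-party race; the district totals in the example are hypothetical, and the versions used in litigation add refinements.

```python
def efficiency_gap(districts):
    """districts: list of (votes_a, votes_b) tuples, one per district.

    Wasted votes: all of the loser's votes, plus the winner's votes beyond
    the bare majority needed to win. The gap is the net difference in wasted
    votes as a share of all votes cast.
    """
    wasted_a = wasted_b = total = 0.0
    for votes_a, votes_b in districts:
        district_total = votes_a + votes_b
        threshold = district_total / 2  # simplified winning threshold
        if votes_a > votes_b:
            wasted_a += votes_a - threshold
            wasted_b += votes_b
        else:
            wasted_b += votes_b - threshold
            wasted_a += votes_a
        total += district_total
    return (wasted_a - wasted_b) / total

# Hypothetical five-district state: party A wins four districts narrowly
print(efficiency_gap([(55, 45), (55, 45), (55, 45), (55, 45), (20, 80)]))
```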

Using a GRIM method to find unlikely published results

Discovering which published studies may be incorrect or fraudulent takes some work, and here is a newer tool: GRIM.

GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either (a) made a mistake or (b) is making things up.

GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of the ones that provided enough necessary data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are “honest errors or slightly sloppy reporting.”…

After spotting the Wansink post, Anaya took the numbers in the papers and — to coin a verb — GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM was an actual machine, rather than a humble piece of code, its alarms would have been blaring. “This lights up like a Christmas tree,” Brown said after highlighting on his computer screen the errors Anaya had identified…

Anaya, along with Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.”
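
The core of the GRIM insight can be written in a few lines: when the underlying data are integers (say, a single Likert item per person), the sum must be a whole number, so only certain means are possible for a given sample size. This is an unofficial, minimal sketch rather than Brown and Heathers' actual code.

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Check whether a mean reported to `decimals` places is possible
    for integer-valued data with sample size n."""
    nearest_total = round(reported_mean * n)
    # Try integer totals adjacent to the nearest candidate to allow for rounding
    return any(
        round(total / n, decimals) == round(reported_mean, decimals)
        for total in (nearest_total - 1, nearest_total, nearest_total + 1)
    )

print(grim_consistent(3.48, 25))  # True: 87 / 25 = 3.48 is possible
print(grim_consistent(3.47, 25))  # False: no integer total over 25 rounds to 3.47
```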

I wonder how long it will be before journals employ such methods for submitted manuscripts. Imagine Turnitin for academic studies. Then, what would happen to authors if problems are found?

It also sounds like a program like this could make it easy to do mass analysis of published studies to help answer questions like how many findings are fraudulent.

Perhaps it is too easy to ask whether GRIM has been vetted by outside persons…

The most important annual statistical moment in America: the start of March Madness

When do statistics matter the most for the average American? The week of the opening weekend of March Madness – the period between the revealing of the 68-team field and the final games of the Round of 32 – may just be that point. All the numbers are hard to resist: win-loss records, various other metrics of team performance (strength of schedule, RPI, systems attached to particular analysts, advanced basketball statistics, etc.), comparing seed numbers and their historic performance, seeing who the rest of America has picked (see the percentages for the millions of brackets at ESPN), and betting lines and pools.

Considering the suggestions that Americans are fairly innumerate, perhaps this would be a good period for public statistics education. How does one sift through all these numbers, thinking about how they are measured and making decisions based on the figures? Sadly, I usually teach Statistics in the fall so I can’t put any of my own ideas into practice…

When software – like Excel – hampers scientific research

Statistical software can be very helpful but it does not automatically guarantee correct analyses:

A team of Australian researchers analyzed nearly 3,600 genetics papers published in a number of leading scientific journals — like Nature, Science and PLoS One. As is common practice in the field, these papers all came with supplementary files containing lists of genes used in the research.

The Australian researchers found that roughly 1 in 5 of these papers included errors in their gene lists that were due to Excel automatically converting gene names to things like calendar dates or random numbers…

Genetics isn’t the only field where a life’s work can potentially be undermined by a spreadsheet error. Harvard economists Carmen Reinhart and Kenneth Rogoff famously made an Excel goof — omitting a few rows of data from a calculation — that caused them to drastically overstate the negative GDP impact of high debt burdens. Researchers in other fields occasionally have to issue retractions after finding Excel errors as well…

For the time being, the only fix for the issue is for researchers and journal editors to remain vigilant when working with their data files. Even better, they could abandon Excel completely in favor of programs and languages that were built for statistical research, like R and Python.

Excel has particular autoformatting issues but all statistical programs have unique ways of handling data. Spreadsheets of data – often formatted with cases in the rows and variables in the columns – don’t automatically read in correctly.
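
Whatever tool is used, it helps to tell the software explicitly how each column should be read rather than trusting automatic conversion. A minimal pandas sketch, with a hypothetical file and column names:

```python
import pandas as pd

# Read identifiers as text so nothing gets silently converted to dates or numbers
df = pd.read_csv(
    "gene_list.csv",                                  # hypothetical file
    dtype={"gene_symbol": str, "expression": float},  # explicit column types
)
print(df.dtypes)
```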

Additionally, user error can lead to issues with any sort of statistical software. Different programs may have different quirks, but researchers can do all sorts of weird things, from recoding incorrectly to misreading missing data to misinterpreting results. Data doesn't analyze itself, and statistical software is just a tool that needs to be used correctly.

A number of researchers have in recent years called for open data once a paper is published and this could help those in an academic field spot mistakes. Of course, the best solution is to double-check (at least) data before review and publication. Yet, when you are buried in a quantitative project and there are dozens of steps of data work and analysis, it can be hard to (1) keep track of everything and (2) closely watch for errors. Perhaps we need independent data review even before publication.