Non-fiction books can have limited fact-checking, no peer review

An example of a significant misinterpretation of survey data in a recent book provides a reminder about reading “facts”:

There are a few major lessons here. The first is that books are not subject to peer review, and in the typical case not even subject to fact-checking by the publishers — often they put responsibility for fact-checking on the authors, who may vary in how thoroughly they conduct such fact-checks and in whether they have the expertise to notice errors in interpreting studies, like Wolf’s or Dolan’s.

The second, Kimbrough told me, is that in many respects we got lucky in the Dolan case. Dolan was using publicly available data, which meant that when Kimbrough doubted his claims, he could look up the original data himself and check Dolan’s work. “It’s good this work was done using public data,” Kimbrough told me, “so I’m able to go pull the data and look into it and see, ‘Oh, this is clearly wrong.’”…

Book-publishing culture similarly needs to change to address that first problem. Books often go to print with less fact-checking than an average Vox article, and at hundreds of pages long, that almost always means several errors. The recent high-profile cases where these errors have been serious, embarrassing, and highly public might create enough pressure to finally change that.

In the meantime, don’t trust shocking claims with a single source, even if they’re from a well-regarded expert. It’s all too easy to misread a study, and all too easy for those errors to make it all the way to print.

These are good suggestions, particularly the last paragraph above: shocking or even surprising statistics are worth checking against the original data or against other sources. After all, it is not that hard for a mutant statistic to spread.

Unfortunately, correctly interpreting data continues to get pushed down the chain to readers and consumers. When I read articles or books in 2019, I need to be fairly skeptical of what I am reading. This is hard to do given (1) the glut of information we all face (so many sources!) and (2) the need to know how to be skeptical of information. This is why it is easy to fall into filtering sources of information into camps of sources we trust versus ones we do not. At the same time, knowing how statistics and data work goes a long way toward questioning information. In the main example in the story above, the interpretation issue came down to how the survey questions were asked. An average reader of the book may have little idea how to question the survey data collection process, let alone the veracity of the claim. It took an academic who works with the same dataset to question the interpretation.

To do this individual fact-checking better (and to do it better at a structural level before books are published), we need to combat innumeracy. Readers need to be able to understand data: how it is collected, how it is interpreted, and how it ends up in print or in the public arena. This usually does not require a deep knowledge of particular methods, but it does require some familiarity with how data becomes data. Similarly, being cynical about all data and statistics is not the answer; readers need to know when data is good enough.

Home value algorithms show consumers data with outliers, mortgage companies take the outliers out

A homeowner can look online to get an estimate of the value of their home, but that number may not match what a lender computes:

Different AVMs are designed to deliver different types of valuations. And therein lies confusion.

Consumers don’t realize that there’s an AVM for nearly any purpose, which explains why different algorithms serve up different results, said Ann Regan, an executive product manager with real estate analytic firm CoreLogic. “The scores presented to consumers are not the same version that is being used by lenders to make decisions,” she said. “The consumer-facing AVMs are designed for consumer marketing purposes.”

For instance, more accurate models used by lenders do not include outliers — properties that sold for extremely high or low prices and that consequently would skew the averages and the comparable sales for a particular house, like yours. But models used by consumer websites, such as brokers’ sites and national listing sites, scoop in as much “sold” data as possible when concocting a valuation, because then they can claim to include all available data. That’s true, said Regan, but it’s more accurate to weed out misleading data.

AVMs used by lenders send along “confidence scores” that indicate how firm the estimate is. That is a factor typically not included alongside consumer AVMs, she added.

This is an interesting trade-off. The assumption is that consumers want to see all of the data accounted for, which makes the estimate seem more worthwhile. More data = more accuracy. On the other hand, those who work with data know that measures of central tendency and variability can be thrown off by unusual cases, often known as outliers. If the sale price of a single home is unusually high or low (and there are many reasons why this could happen), it can distort the estimates for the rest. If there are significant outliers, more data does not equal more accuracy.
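A quick way to see this (with made-up sale prices, not anyone’s actual data): a single unusual sale drags the mean far more than the median.

```python
# Minimal sketch with hypothetical sale prices: one unusual sale
# pulls the mean well away from the median.
from statistics import mean, median

sales = [310_000, 325_000, 298_000, 340_000, 315_000]  # typical recent sales
outlier_sale = 1_900_000  # e.g., a one-off luxury rebuild on a double lot

print(mean(sales), median(sales))  # 317,600 vs. 315,000

with_outlier = sales + [outlier_sale]
print(mean(with_outlier), median(with_outlier))  # mean jumps to ~581,333; median only moves to 320,000
```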

Since this knowledge is out there (at least printed in a major newspaper), does this mean consumers will be informed of these algorithm features when they look at websites like Zillow? I imagine it could be tricky to explain clearly why removing some of the housing comparison data is actually a good thing, but if the long-term goal is better numeracy for the public, this could be a good addition to such websites.

Countering gerrymandering in Pennsylvania with numerical models

Wired highlights a few academics who argued against Pennsylvania’s gerrymandered political districts with models showing how unlikely it is that the map is nonpartisan:

Then, Pegden analyzed the partisan slant of each new map compared to the original, using a well-known metric called the median versus mean test. In this case, Pegden compared the Republican vote share in each of Pennsylvania’s 18 districts. For each map, he calculated the difference between the median vote share across all the districts and the mean vote share across all of the districts. The bigger the difference, the more of an advantage the Republicans had in that map.

After conducting his trillion simulations, Pegden found that the 2011 Pennsylvania map exhibited more partisan bias than 99.999999 percent of maps he tested. In other words, making even the tiniest changes in almost any direction to the existing map chiseled away at the Republican advantage…

Like Pegden, Chen uses computer programs to simulate alternative maps. But instead of starting with the original map and making small changes, Chen’s program develops entirely new maps, based on a series of geographic constraints. The maps should be compact in shape, preserve county and municipal boundaries, and have equal populations. They’re drawn, in other words, in some magical world where partisanship doesn’t exist. The only goal, says Chen, is that these maps be “geographically normal.”

Chen generated 500 such maps for Pennsylvania, and analyzed each of them based on how many Republican seats they would yield. He also looked at how many counties and municipalities were split across districts, a practice the Pennsylvania constitution forbids “unless absolutely necessary.” Keeping counties and municipalities together, the thinking goes, keeps communities together. He compared those figures to the disputed map, and presented the results to the court…

Most of the maps gave Republicans nine seats. Just two percent gave them 10 seats. None even came close to the disputed map, which gives Republicans a whopping 13 seats.

It takes a lot of work to develop these models, and they rest on particular assumptions and methods of calculation. Still, could a political side present a reasonable statistical counterargument?
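For readers curious what the core calculation looks like, here is a minimal sketch of the median versus mean test described above. The vote shares are made up for illustration, not Pennsylvania’s actual results, and this is only the single-map arithmetic, nothing like Pegden’s full simulation.

```python
# Median-versus-mean test on hypothetical Republican vote shares for
# 18 districts (not the actual Pennsylvania numbers). A few heavily
# Democratic districts pull the mean down while the median stays in
# comfortable-win territory for Republicans.
from statistics import mean, median

rep_vote_share = [
    0.35, 0.38, 0.41, 0.44, 0.47,        # packed Democratic districts
    0.52, 0.53, 0.54, 0.55, 0.56, 0.57,  # modest Republican wins
    0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64,
]

skew = median(rep_vote_share) - mean(rep_vote_share)
print(f"median - mean = {skew:.3f}")  # ~0.022; a larger gap suggests a bigger Republican advantage
```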

Given both the innumeracy of the American population and some resistance to experts, I wonder how the public would view such models. On one hand, gerrymandering can be countered by simple arguments: the shapes drawn on the map are pretty strange and cannot truly represent any meaningful community. On the other hand, the models reinforce how unlikely these particular maps are. It is not just that the shapes are unusual; they are highly unlikely given the various inputs that go into creating meaningful districts. Perhaps all of these arguments are meaningless if your side is winning through the maps.

Recommendations to help with SCOTUS’ innumeracy

In the wake of recent comments about “sociological gobbledygook” and measures of gerrymandering, here are some suggestions for how the Supreme Court can better use statistical evidence:

McGhee, who helped develop the efficiency gap measure, wondered if the court should hire a trusted staff of social scientists to help the justices parse empirical arguments. Levinson, the Texas professor, felt that the problem was a lack of rigorous empirical training at most elite law schools, so the long-term solution would be a change in curriculum. Enos and his coauthors proposed “that courts alter their norms and standards regarding the consideration of statistical evidence”; judges are free to ignore statistical evidence, so perhaps nothing will change unless they take this category of evidence more seriously.

But maybe this allergy to statistical evidence is really a smoke screen — a convenient way to make a decision based on ideology while couching it in terms of practicality.

“I don’t put much stock in the claim that the Supreme Court is afraid of adjudicating partisan gerrymanders because it’s afraid of math,” Daniel Hemel, who teaches law at the University of Chicago, told me. “[Roberts] is very smart and so are the judges who would be adjudicating partisan gerrymandering claims — I’m sure he and they could wrap their minds around the math. The ‘gobbledygook’ argument seems to be masking whatever his real objection might be.”

If there is indeed innumeracy present, the justices would not be alone in this. Many Americans do not receive an education in statistics, let alone have enough training to make sense of the statistics regularly used in academic studies.

At the same time, we might go further than the argument made above: should judges make decisions based on statistics (roughly, facts) more than on ideology or arguments (roughly, interpretation)? Again, many Americans struggle with this: there can be broad empirical patterns or even correlations, but some would insist that their own personal experiences do not match them. Should judicial decisions be guided by principles and existing case law or by current statistical realities? The courts are not the only social sphere that struggles with this.

“The most misleading charts of 2015, fixed”

Here are improved versions of some misleading charts first put forward by politicians, advocacy groups, and the media in 2015.

I’m not sure exactly how they picked “the most misleading charts” (is there bias in this selection?), but it is interesting that several involve a misleading y-axis. I’m also not sure I would count the last example as a misleading chart since it involves a definition issue before getting to the chart.

And what is the purpose of the original, poorly done graphics? Changing the presentation of the data provides evidence for a particular viewpoint; change the graphic depiction of the data and another story could be told. Unfortunately, it is actions like these that tend to cast doubt on the use of data for making public arguments: the data is simply too easy to manipulate, so why rely on data at all? Of course, that assumes people look closely at the chart and the data source and know what questions to ask…
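As a rough illustration of the y-axis problem (with entirely made-up numbers, not one of the charts from the piece), the same series can look dramatic or flat depending on where the axis starts:

```python
# The same hypothetical data plotted twice: a truncated y-axis
# exaggerates the change, while an axis starting at zero shows it
# is modest.
import matplotlib.pyplot as plt

years = [2011, 2012, 2013, 2014, 2015]
values = [94.0, 94.5, 95.1, 95.6, 96.2]  # hypothetical percentages

fig, (ax_truncated, ax_full) = plt.subplots(1, 2, figsize=(8, 3))

ax_truncated.plot(years, values, marker="o")
ax_truncated.set_ylim(93.5, 96.5)  # zoomed-in axis makes the rise look steep
ax_truncated.set_title("Truncated y-axis")

ax_full.plot(years, values, marker="o")
ax_full.set_ylim(0, 100)  # full axis shows a small change
ax_full.set_title("Y-axis starting at zero")

plt.tight_layout()
plt.show()
```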

“Pollsters defend craft amid string of high-profile misses”

Researchers and polling organizations continue to defend their efforts:

Pollsters widely acknowledge the challenges and limitations taxing their craft. The universality of cellphones, the prevalence of the Internet and a growing reluctance among voters to respond to questions are “huge issues” confronting the field, said Ashley Koning, assistant director at Rutgers University’s Eagleton Center for Public Interest Polling…

“Not every poll,” Koning added, “is a poll worth reading.”

Scott Keeter, director of survey research at the Pew Research Center, agreed. Placing too much trust in early surveys, when few voters are paying close attention and the candidate pools are their largest, “is asking more of a poll than what it can really do.”…

Kathryn Bowman, a public opinion specialist at the American Enterprise Institute, also downplayed the importance of early primary polls, saying they have “very little predictive value at this stage of the campaign.” Still, she said, the blame is widespread, lamenting the rise of pollsters who prioritize close races to gain coverage, journalists too eager to cover those results and news consumers who flock to those types of stories.

Given the reliance on data in today’s world, particularly in political campaigns, polls are unlikely to go away. But, there will likely be changes in the future that might include:

  1. More consumers of polls (the media and potential voters) learning what exactly polls are saying and what they are not. Since the media seems to love polls and horse races, I’m not sure much will change in that realm. But, we need greater numeracy among Americans to sort through all of these numbers.
  2. Continued efforts to improve methodology as it gets harder to reach people, obtain representative samples, and predict who will vote.
  3. A consolidation of efforts by researchers and polling organizations as (a) some are knocked out by a string of bad results or high-profile wrong predictions and (b) groups try to pool their resources (money, knowledge, data) to improve their accuracy. Or, perhaps (c) polling will just become a partisan effort as more objective observers realize their efforts won’t be used correctly (see #1 above).

Can religion not be fully studied with surveys or do we not use survey results well?

In a new book (which I have not read), sociologist Robert Wuthnow critiques the use of survey data to explain American religion:

Bad stats are easy targets, though. Setting these aside, it’s much more difficult to wage a sustained critique of polling. Enter Robert Wuthnow, a Princeton professor whose new book, Inventing American Religion, takes on the entire industry with the kind of telegraphed crankiness only academics can achieve. He argues that even gold-standard contemporary polling relies on flawed methodologies and biased questions. Polls about religion claim to show what Americans believe as a society, but actually, Wuthnow says, they say very little…

Even polling that wasn’t bought by evangelical Christians tended to focus on white, evangelical Protestants, Wuthnow writes. This trend continues today, especially in poll questions that treat the public practice of religion as separate from private belief. As the University of North Carolina professor Molly Worthen wrote in a 2012 column for The New York Times, “The very idea that it is possible to cordon off personal religious beliefs from a secular town square depends on Protestant assumptions about what counts as ‘religion,’ even if we now mask these sectarian foundations with labels like ‘Judeo-Christian.’”…

These standards are largely what Wuthnow’s book is concerned with: specifically, declining rates of responses to almost all polls; the short amount of time pollsters spend administering questionnaires; the racial and denominational biases embedded in the way most religion polls are framed; and the inundation of polls and polling information in public life. To him, there’s a lot more depth to be drawn from qualitative interviews than quantitative studies. “Talking to people at length in their own words, we learn that [religion] is quite personal and quite variable and rooted in the narratives of personal experience,” he said in an interview…

In interviews, people rarely frame their own religious experiences in terms of statistics and how they compare to trends around the country, Wuthnow said. They speak “more about the demarcations in their own personal biographies. It was something they were raised with, or something that affected who they married, or something that’s affecting how they’re raising their children.”

I suspect such critiques could be leveled at much of survey research: the questions can be simplistic, those asking the questions can vary in their motives and in their skill at developing useful survey questions, and the data gets bandied about in the media and in public. Can surveys alone adequately address race, cultural values, political views and behaviors, and more? That said, I’m sure there are specific issues with surveys regarding religion that should be addressed.

I wonder, though, if another important issue here is whether the public and the media know what to do with survey results. This book review suggests people take survey findings as gospel. They don’t know about the nuances of surveys or how to look at multiple survey questions or surveys that get at similar topics. Media reports on this data are often simplistic and lead with a “shocking” piece of information or some important trend (even if the data suggests continuity). While more social science projects on religion could benefit from mixed methods or from incorporating data from the other side (whether quantitative or qualitative), the public knows even less about these options or how to compare data. In other words, surveys always have issues, but people are generally innumerate in knowing what to do with the findings.