“98 opioid-related deaths last year in DuPage” and local decisions

As Itasca leaders and residents debate a proposal for a drug-treatment facility in the suburb, an update on the story included this statistic:

There were 98 opioid-related deaths last year in DuPage.

Illinois appeared to be in the middle of the states in its rate of opioid deaths in 2017 (see the data here). DuPage County has a lot of residents – over 928,000 according to 2018 estimates – and the Coroner has all the statistics on deaths in 2018.
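
To put that count in context, a quick back-of-the-envelope calculation – an illustrative sketch using only the figures cited above, not an official rate – converts 98 deaths among roughly 928,000 residents into a rate per 100,000, which is easier to compare across counties or states than the raw count:

```python
# Rough, illustrative rate calculation using only the figures cited above.
deaths = 98            # opioid-related deaths in DuPage County last year
population = 928_000   # approximate 2018 population estimate

rate_per_100k = deaths / population * 100_000
print(f"Roughly {rate_per_100k:.1f} opioid-related deaths per 100,000 residents")
# -> roughly 10.6 per 100,000
```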

In the debates over whether suburbs should be home to drug treatment facilities, such statistics could matter. Are 98 deaths enough to (a) establish that this is an issue worth addressing and (b) convince suburbs that they should welcome facilities that could help address the problem? Both issues could be up for debate, though I suspect the real issue is the second one: even if suburbanites recognize that opioid-related deaths are a social problem, that does not necessarily mean they are willing to live near such a facility.

Does this mean that statistics are worthless in such a public discussion? Not necessarily, though statistics alone may not be enough to convince a suburban resident one way or another about supporting change in their community. If residents believe strongly that such a medical facility would be detrimental to their suburb – often invoking the character of the community, local resources, and property values – no combination of numbers and narratives may be enough to overcome what is perceived as a big threat. On the other hand, public discussions of land use and zoning can evolve, and opposition or support can shift.

17% of millennial homebuyers regret the purchase (but perhaps 83% do not??)

A recent headline: “17% of young homebuyers regret their purchase, Zillow survey shows.” And two opening paragraphs:

Seventeen percent of millennial and Generation Z homebuyers from ages 18-34 regret purchasing a home instead of renting, according to a Zillow survey.

Speculating as to why, Josh Lehr, industry development at Zillow-owned Mortech, said getting the wrong mortgage may have driven that disappointment. For example, the Zillow survey showed 22% of young buyers had regrets about their type of mortgage and 27-30% said their rates and payments are too high.

The rest of the short article then goes on to talk about the difficulties millennials might face in going through the mortgage process. Indeed, it seems consumers generally dislike obtaining a mortgage.

But the headline is an odd one. Why focus on the 17% who have some regret about their purchase? Is that number high or low compared to regret after other major purchases (such as taking on a car loan)?

If the number is accurate, why not discuss the 83% of millennials who did not regret their purchase? Are there different reasons for choosing which number to highlight (even when both numbers are true)?

And is the number what the headline makes it out to be? The paragraphs cited above suggest the Zillow question may have been less about regret over the particular home purchased and more about regret over owning rather than renting. If so, perhaps this is less about the specific home or mortgage and more about having the flexibility or other amenities renting provides.

In sum, both the headline and the interpretation of the original Zillow data could be better. Just another reminder that statistics do not interpret themselves…

The modal age of racial/ethnic groups in the United States

There is a big difference in the most common age across racial and ethnic groups in the United States – a bigger gap than a comparison of median ages would suggest.

In U.S., most common age for whites is much older than for minorities

[Pew Research Center histogram: the number of Americans at each age in 2018, broken out by racial and ethnic group]

There were more 27-year-olds in the United States than people of any other age in 2018. But for white Americans, the most common age was 58, according to a Pew Research Center analysis of Census Bureau data.

In the histogram above, which shows the total number of Americans of each age last year, non-Hispanic whites tend to skew toward the older end of the spectrum (more to the right), while racial and ethnic minority groups – who include everyone except single-race non-Hispanic whites – skew younger (more to the left).

The most common age was 11 for Hispanics, 27 for blacks and 29 for Asians as of last July, the latest estimates available. Americans of two or more races were by far the youngest racial or ethnic group in the Census Bureau data, with a most common age of just 3 years old. Among all racial and ethnic minorities, the most common age was 27…

Non-Hispanic whites constituted a majority (60%) of the U.S. population in 2018, and they were also the oldest of any racial or ethnic group as measured by median age – a different statistic than most common age (mode). Whites had a median age of 44, meaning that if you lined up all whites in the U.S. from youngest to oldest, the person in the middle would be 44 years old. This compares with a median age of just 31 for minorities and 38 for the U.S. population overall.

The paragraphs above provide multiple pieces of information that explain the distribution displayed above:

- The different groups have different skews, suggesting the ages are not evenly distributed.

- The mode is much higher for whites.

- The median agrees with the conclusion from the mode – whites are on average older – but the gap between whites and other groups shrinks.

All three pieces of information could inform the headline, but Pew chose to go with the mode. Was that choice made with the intent of suggesting large age differences among the groups?
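
To see how the choice of summary statistic can change the story, here is a small sketch with made-up ages – not the Census estimates – shaped like the pattern Pew describes: the gap between the groups’ modes comes out larger than the gap between their medians.

```python
import statistics

# Toy age data, invented for illustration (not the Census figures),
# with one group skewing older and the other skewing younger.
older_skew = [20, 28, 35, 40, 42, 44, 52, 58, 58, 58, 65]
younger_skew = [3, 8, 14, 22, 27, 27, 27, 36, 44, 52, 60]

for label, ages in [("older-skewing group", older_skew),
                    ("younger-skewing group", younger_skew)]:
    print(f"{label}: mode = {statistics.mode(ages)}, "
          f"median = {statistics.median(ages)}")

# Modes of 58 vs. 27 (a 31-year gap) but medians of 44 vs. 27 (a 17-year gap):
# the headline sounds different depending on which statistic is reported.
```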

Non-fiction books can have limited fact-checking, no peer review

An example of a significant misinterpretation of survey data in a recent book provides a reminder about reading “facts”:

There are a few major lessons here. The first is that books are not subject to peer review, and in the typical case not even subject to fact-checking by the publishers — often they put responsibility for fact-checking on the authors, who may vary in how thoroughly they conduct such fact-checks and in whether they have the expertise to notice errors in interpreting studies, like Wolf’s or Dolan’s.

The second, Kimbrough told me, is that in many respects we got lucky in the Dolan case. Dolan was using publicly available data, which meant that when Kimbrough doubted his claims, he could look up the original data himself and check Dolan’s work. “It’s good this work was done using public data,” Kimbrough told me, “so I’m able to go pull the data and look into it and see, ‘Oh, this is clearly wrong.’”…

Book-publishing culture similarly needs to change to address that first problem. Books often go to print with less fact-checking than an average Vox article, and at hundreds of pages long, that almost always means several errors. The recent high-profile cases where these errors have been serious, embarrassing, and highly public might create enough pressure to finally change that.

In the meantime, don’t trust shocking claims with a single source, even if they’re from a well-regarded expert. It’s all too easy to misread a study, and all too easy for those errors to make it all the way to print.

These are good steps, particularly the last paragraph above: shocking or even surprising statistics are worth verifying against the data or against other sources. After all, it is not that hard for a mutant statistic to spread.

Unfortunately, correctly interpreting data continues to get pushed down the chain to readers and consumers. When I read articles or books in 2019, I need to be fairly skeptical of what I am reading. This is hard to do given (1) the glut of information we all face (so many sources!) and (2) the need to know how to be skeptical of information. This is why it is easy to fall into filtering sources of information into camps we trust versus ones we do not. At the same time, knowing how statistics and data work goes a long way toward questioning information. In the main example in the story above, the interpretation issue came down to how the survey questions were asked. An average consumer of the book would have little idea how to question the survey data collection process, let alone the veracity of the claim. It took an academic who works with the same dataset to question the interpretation.

To do this individual fact-checking better (and to do it better at a structural level before books are published), we need to combat innumeracy. Readers need to be able to understand data: how it is collected, how it is interpreted, and how it ends up in print or in the public arena. This usually does not require a deep knowledge of particular methods but it does require some familiarity with how data becomes data. Similarly, being cynical about all data and statistics is not the answer; readers need to know when data is good enough.

Mutant statistic: marketing, health, and 10,000 steps a day

A recent study suggests the advice to walk 10,000 steps a day for better health may not be based in research:

I-Min Lee, a professor of epidemiology at the Harvard University T. H. Chan School of Public Health and the lead author of a new study published this week in the Journal of the American Medical Association, began looking into the step rule because she was curious about where it came from. “It turns out the original basis for this 10,000-step guideline was really a marketing strategy,” she explains. “In 1965, a Japanese company was selling pedometers, and they gave it a name that, in Japanese, means ‘the 10,000-step meter.’”

Based on conversations she’s had with Japanese researchers, Lee believes that name was chosen for the product because the character for “10,000” looks sort of like a man walking. As far as she knows, the actual health merits of that number have never been validated by research.

Scientific or not, this bit of branding ingenuity transmogrified into a pearl of wisdom that traveled around the globe over the next half century, and eventually found its way onto the wrists and into the pockets of millions of Americans. In her research, Lee put it to the test by observing the step totals and mortality rates of more than 16,000 elderly American women. The study’s results paint a more nuanced picture of the value of physical activity.

“The basic finding was that at 4,400 steps per day, these women had significantly lower mortality rates compared to the least active women,” Lee explains. If they did more, their mortality rates continued to drop, until they reached about 7,500 steps, at which point the rates leveled out. Ultimately, increasing daily physical activity by as little as 2,000 steps—less than a mile of walking—was associated with positive health outcomes for the elderly women.

This sounds like a “mutant statistic” of the kind sociologist Joel Best describes. The study suggests the figure originally arose for marketing purposes and was less about the actual numeric quantity and more about a particular cultural reference. From there, the figure spread until it became a normal part of cultural life and organizational behavior as people and groups aimed to walk 10,000 steps. Few people likely stopped to think about whether 10,000 was an accurate figure or an empirical finding. As a marketing ploy, it seems to have worked.

This should raise larger questions about how many other publicly known figures are more fabrication than empirically based. Do these figures tend to pop up in health statistics more than in other fields? Does countering the figures with an academic study stem the tide of their usage?


The correct interpretation of the concept of a 500 or 1,000 year flood

A flood expert addresses five myths about floods, including the idea that a 500-year flood happens only once every 500 years:

Myth No. 3
A “100-year flood” is a historic, once-in-a-century disaster.

Describing floods in terms of “100-year,” “500-year” and “1,000-year” often makes people think the disaster was the most severe to occur in that time frame — as encapsulated by President Trump’s tweet calling Harvey a “once in 500 year flood!” He’s not alone. When researchers from the University of California at Berkeley surveyed residents in Stockton, Calif., about their perceived flood risk, they found that although 34 percent claimed familiarity with the term “100-year flood,” only 2.6 percent defined it correctly. The most common responses were some variation of “A major flood comes every 100 years — it’s a worst-case scenario’’ and ‘‘According to history, every 100 years or so, major flooding has occurred in the area and through documented history, they can predict or hypothesize on what to expect and plan accordingly and hopefully correct.”

In fact, the metric communicates the flood risk of a given area: A home in a 100-year flood plain has a 1 percent chance of flooding in a given year. In 2018, Ellicott City, Md., experienced its second 1,000-year flood in two years, and with Harvey, Houston faced its third 500-year flood in three years.

That risk constantly changes, because of factors such as the natural movement of rivers, the development of new parcels of land, and climate change’s influence on rainfall, snowmelt, storm surges and sea level. “Because of all the uncertainty, a flood that has a 1 percent annual risk of happening has a high water mark that is best described as a range, not a single maximum point,” according to FiveThirtyEight.

I am not surprised that the majority of respondents in the cited survey got this wrong because I have never heard it explained this way. Either way, the general idea still seems to hold: the major flooding/storm/disaster is relatively rare, and the probability that it will occur in any given year is low.

Of course, that does not mean that there is no risk or that residents couldn’t experience multiple occurrences within a short time period (though this is predicted to be rare). Low risk events seem to flummox people when they do actually happen. Furthermore, as noted above, conditions can change and the same storms can create more damage depending on development changes.

So if this commonly used way of discussing risk and occurrences of natural disasters is not effective, what would better communicate the idea to local residents and leaders? Would it be better to provide the percent risk of flooding each year?
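
One option, sketched below, is to translate the annual probability into the chance of seeing such a flood at least once over a horizon residents care about, such as a 30-year mortgage. This is only illustrative arithmetic and assumes, simplistically, that years are independent and the risk stays constant:

```python
# Translating an annual flood probability into longer-horizon risk.
# A "100-year flood" has about a 1% chance of occurring in any given year.
annual_prob = 0.01
years = 30  # e.g., the length of a typical mortgage

# Chance of at least one such flood over the window, assuming independent
# years and a constant annual probability (a simplification).
at_least_once = 1 - (1 - annual_prob) ** years
print(f"Chance of at least one 100-year flood in {years} years: {at_least_once:.0%}")
# -> roughly 26%

# The same logic for a "500-year flood" (0.2% per year) shows why three of
# them in three straight years is startling under this simple model.
p_500 = 1 / 500
print(f"Chance of a 500-year flood in each of three straight years: {p_500 ** 3:.1e}")
```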

Significant vs. substantive differences, urban vs. suburban snow totals edition

Meteorologist Tom Skilling discusses the difference in snowfall between urban and suburban parts of the Chicago region. In doing so, he illustrates the difference between statistical significance and substantive significance:

Dear Tom,
Why do Chicago’s suburbs get more snow in the winter than Chicago itself?
— Matt, Palatine

Dear Matt,
I do not believe that to be the case. For example, the annual snowfall at Midway Airport is 39.3 inches (Midway being closer to the lake than O’Hare); at O’Hare International Airport, it’s 37.6 inches; at Rockford, 38.3 inches. The differences aren’t large, but they are significant nonetheless. Lake Michigan enhancement of snowfall totals and the occurrence of lake-effect snows in locations closer to the lake all argue that more snow will fall with some regularity at lakeside locations.
Please note that these are generalized statements. Individual snow events will not necessarily conform to the “more snow near the lake” phenomenon. However, averaged over a period of many years, lakeside locations receive more snow than inland locations.

Because the weather data is based on decades of observations, we can be fairly confident that there are real differences in average snowfall between the three locations mentioned. Midway, the location nearest the lake, receives the most snow (39.3 inches); Rockford, furthest from the lake, averages 38.3 inches; and O’Hare, though much closer to the lake than Rockford, averages the least at 37.6 inches.

On the other hand, there is very little substantive difference between these totals. Over the course of an entire year, the spread between the averages of the three locations is only 1.7 inches total. That is not much. It is likely not noticeable to the average resident. Similarly, I can’t imagine municipalities act much differently because of less than two inches of snow spread out over a year.

This illustrates an issue that often arises in doing statistical analysis: a statistical test may show a significant difference or relationship in the population but the actual difference or relationship is hard to notice or not worth acting on. Here, the data shows real differences in snowfall across locations but the real-world effect is limited.
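
As a rough illustration of the distinction, here is a small simulation sketch. The means match the averages quoted above, but the year-to-year standard deviation is an invented assumption rather than a value from the actual station records; the point is that whether a difference registers as statistically significant depends on sample size and variability, while the substantive meaning of a 1.7-inch annual gap does not change.

```python
import random
import statistics

random.seed(0)

# Simulated annual snowfall (inches) for two stations over 60 winters.
# Means roughly match the averages cited above; the standard deviation
# is an invented assumption, not taken from the actual records.
years = 60
midway = [random.gauss(39.3, 3.0) for _ in range(years)]
ohare = [random.gauss(37.6, 3.0) for _ in range(years)]

def t_stat(a, b):
    """Simple equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) +
              (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled * (1/na + 1/nb)) ** 0.5

diff = statistics.mean(midway) - statistics.mean(ohare)
print(f"Observed mean difference: {diff:.1f} inches")
print(f"t statistic: {t_stat(midway, ohare):.2f}")

# With decades of data and modest assumed variability, the t statistic can
# clear conventional significance thresholds, yet the substantive difference
# is still under two inches of snow spread across an entire winter.
```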