Models are models, not perfect predictions

One academic summarizes how we should read and interpret COVID-19 models:

Every time the White House releases a COVID-19 model, we will be tempted to drown ourselves in endless discussions about the error bars, the clarity around the parameters, the wide range of outcomes, and the applicability of the underlying data. And the media might be tempted to cover those discussions, as this fits their horse-race, he-said-she-said scripts. Let’s not. We should instead look at the calamitous branches of our decision tree and chop them all off, and then chop them off again.

Sometimes, when we succeed in chopping off the end of the pessimistic tail, it looks like we overreacted. A near miss can make a model look false. But that’s not always what happened. It just means we won. And that’s why we model.

Five quick thoughts in response:

  1. I would be tempted to say that the perilous times of COVID-19 lead more people to treat models as certainties, but I have seen this issue plenty of times in more “normal” periods.
  2. It would help if the media were more numerate and knew more about how science, both natural and social, works. I know the media lean toward definitive answers and confident headlines, but science is often messier and takes time to reach consensus.
  3. Making models that include social behavior is difficult. This particular phenomenon has both a physical and a social component. Viruses act in certain ways. Humans act in somewhat predictable ways. Both can change.
  4. Models involve data and assumptions. Sometimes, the model might fit reality. At other times, models do not fit. Either way, researchers are looking to refine their models so that we better understand how the world works. In this case, perhaps models can become better on the fly as more data comes in and/or certain patterns are established.
  5. Predictions or proof can be difficult to come by with models. The language of “proof” is one we often use in regular conversation but is unrealistic in numerous academic settings. Instead, we might talk about higher or lower likelihoods or provide the best possible estimate and the margins of error.

Using and interpreting alternative data sources to examine COVID-19 impact

In a world full of data, businesses, investors, and others have access to newer sources of information that can provide insights into responses to COVID-19:

For instance, Angus says that monitoring China’s internet throughout the pandemic showed how industrial plants in the worst-affected regions—which operate servers and computers—shut down during the outbreak. In the last few weeks, as the emergency abated, things have started crawling back to normalcy, even if we are still far from pre-Covid-19 levels, and the evidence might be polluted by plants being restarted just to hit government-imposed power consumption targets. “China is not normal yet,” Angus says. The country’s internet latency suggests that “recovery is happening in China, but there are still a lot of people who must be facing at-home-life for their activities.”…

Combining data from vessel transponders with satellite images, he has periodically checked how many oil tankers are in anchorage in China, unable to deliver their cargo—an intimation both of how well China’s ports are functioning amid the pandemic, and of how well industrial production is keeping up.

Madani also relies on TomTom’s road traffic data for various Chinese and Italian cities to understand how they are affected by quarantines and movement restrictions. “What we’ve seen over the past two weeks is a big revival in congestion,” he says. “There’s more traffic going on now in China, in the big cities, apart from Wuhan.”…

Pollution data is another valuable source of information. Over the past weeks, people on Twitter have been sharing satellite images of various countries, showing that pollution levels are dropping across the industrialised world as a result of coronavirus-induced lockdowns. But where working-from-home twitteratis see a poetic silver lining, Madani sees cold facts about oil consumption.

Three quick thoughts:

1. Even with all of this data, interpreting it is still an important task. People could look at similar data and come to similar conclusions. Or, they might have access to one set of data and not another piece and then draw different conclusions. This becomes critical when people today want data-driven responses or want to back up their position with data. Simply having data is not enough.

2. There is publicly available data – with lots of charts and graphs going around in the United States about cases – and then there is data that requires subscriptions, connections, or insider information. Who has access to what data still matters.

3. We have more data than ever before and yet this does not necessarily translate into less anxiety or more preparation regarding certain occurrences. Indeed, more information might make things worse for some.

In sum, we can know more about the world than ever before, including information that might have been unthinkable decades ago, but we are still working out how to use and comprehend it.

Changing the Y-axis scale across graphs – to good effect

In a look at COVID-19 cases across countries, the New York Times changed the Y-axis on the different graphs:

[Image: COVID-19 curves across countries, New York Times]

Typically, readers of graphs should beware when someone changes the scale on the Y-axis; this leads to issues when interpreting the data and can make it look like trends are present when they are not. See two earlier posts – misleading charts of 2015, State of the Union data presented in 2013 – for examples.

But, in this case, adjusting the scale makes some sense. The goal is to show exponential curves, the kind of growth that occurs when a disease spreads through a population, and then hopefully a peak and decline on the right side. Some countries have very few cases – such as Morocco, Hungary, or Mexico toward the bottom of the graphic – and some have many more – like Italy or South Korea – but the general shape can be similar. Once the rise starts, it is expected to continue until something stops it. And the pattern can look similar across countries.
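A minimal sketch of why per-country scales can be reasonable here. The two case series below are invented for illustration (they are not the Times's data): they differ a hundredfold in size but double at the same pace. Dividing each series by its own maximum, which is roughly what giving each panel its own Y-axis does, leaves two curves with essentially the same shape.

```python
# Hypothetical outbreaks: same doubling time, very different sizes.
def exponential_cases(initial, doubling_days, days):
    """Return a list of case counts that double every `doubling_days` days."""
    return [initial * 2 ** (day / doubling_days) for day in range(days)]


def rescale_to_own_max(series):
    """Divide every value by the series' maximum so the curve tops out at 1."""
    peak = max(series)
    return [value / peak for value in series]


small_outbreak = exponential_cases(initial=5, doubling_days=3, days=30)
large_outbreak = exponential_cases(initial=500, doubling_days=3, days=30)

# On a shared axis the small outbreak would look flat next to the large one.
peak_ratio = max(large_outbreak) / max(small_outbreak)

# On per-series axes the two curves are nearly indistinguishable.
small_shape = rescale_to_own_max(small_outbreak)
large_shape = rescale_to_own_max(large_outbreak)
largest_gap = max(abs(a - b) for a, b in zip(small_shape, large_shape))

print(f"Peak of large outbreak is about {peak_ratio:.0f} times the small one")
print(f"Largest difference between rescaled curves: {largest_gap:.2e}")
```

The trade-off is that rescaled panels show shape and hide magnitude, so a reader can no longer compare the size of outbreaks by eye.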

Also, it is helpful that the creators of the graphic point out at the top that “Scales are adjusted in each country to make the curve more readable.” It is not always reported when Y-axes are altered – and this lack of communication could be intentional – and then readers might not pick up on the issue.

A (real) pie chart to effectively illustrate wealth inequality

Pie graphs can be great at showing relative differences between a small number of categories. A recent example of this comes from CBS:

CBS This Morning co-host Tony Dokoupil set up a table at a mall in West Nyack, New York, with a pie that represented $98 trillion of household wealth in the United States. The pie was sliced into 10 pieces and Dokoupil asked people to divide up those pieces onto five plates representing the poorest, the lower middle class, middle class, upper middle class, and wealthiest Americans. No one got it right. And, in fact, no one was even kind of close to estimating the real ratio, which involves giving nine pieces to the top 20 percent of Americans while the upper middle class and the middle class share one piece between the two of them. The lower middle class would effectively get crumbs considering they only have 0.3 percent of the pie. What about the poorest Americans? They wouldn’t get any pie at all, and in fact would get a bill, considering they are, on average, around $6,000 in debt…

To illustrate just how concentrated wealth is in the country, Dokoupil went on to note that if just the top 1 percent are taken into account, they would get four of the nine pieces of pie that go to the wealthiest Americans.

A pie chart sounds like a great device for this situation because of several features of the data and the presentation:

1. There are five categories of social class. Not too many for a pie chart.

2. One of those categories, the top 20 percent of Americans, clearly has a bigger portion of the pie than the other groups. A pie chart is well-suited to show one dominant category compared to the others.

3. Visitors to a shopping mall can easily understand a pie chart. They understand how it works and what it says (particularly with #1 and #2 above).

Together, these features make a pie chart work in ways that other graphs and charts would not.
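As a rough check on the arithmetic in the CBS segment, the pie pieces can be converted into shares and dollar amounts. The figures come from the quoted description ($98 trillion, 10 pieces, 0.3 percent for the lower middle class); the function and variable names are my own, and the piece counts are rounded, which is why nine pieces plus one piece plus crumbs slightly exceeds a whole pie.

```python
TOTAL_WEALTH_TRILLIONS = 98  # household wealth represented by the pie
TOTAL_PIECES = 10            # the pie was cut into 10 pieces


def pieces_to_share(pieces):
    """Convert a number of pie pieces into a share of the whole pie."""
    return pieces / TOTAL_PIECES


def share_to_trillions(share):
    """Convert a share of the pie into trillions of dollars."""
    return share * TOTAL_WEALTH_TRILLIONS


shares = {
    "Top 20 percent (9 pieces)": pieces_to_share(9),
    "Top 1 percent (4 of those 9 pieces)": pieces_to_share(4),
    "Middle and upper middle class (1 piece)": pieces_to_share(1),
    "Lower middle class (crumbs)": 0.003,  # 0.3 percent, as quoted
}

for group, share in shares.items():
    print(f"{group}: {share:.1%} of the pie, "
          f"about ${share_to_trillions(share):.1f} trillion")
```

Seen this way, the top 1 percent's four pieces amount to four times what the middle and upper middle classes share between them.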

(Side note: it is hard to know whether the use of food in the pie chart helped or hurt the presentation. Do people work better with data when feeling hungry?)

“98 opioid-related deaths last year in DuPage” and local decisions

As Itasca leaders and residents debate a proposal for a drug-treatment facility in the suburb, an update on the story included this statistic:

There were 98 opioid-related deaths last year in DuPage.

Illinois appeared to rank in the middle of the states in its rate of opioid deaths in 2017 (see the data here). DuPage County has a lot of residents – over 928,000 according to 2018 estimates – and the Coroner has all the statistics on deaths in 2018.
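A raw count like 98 is easier to weigh as a rate. The sketch below turns the two numbers above into deaths per 100,000 residents. It treats "over 928,000" as exactly 928,000 and assumes the death count and the population estimate refer to the same year, so the result is approximate.

```python
def rate_per_100k(events, population):
    """Crude rate: events per 100,000 people."""
    return events / population * 100_000


dupage_deaths = 98           # opioid-related deaths cited in the article
dupage_population = 928_000  # "over 928,000" per 2018 estimates; treated as exact

dupage_rate = rate_per_100k(dupage_deaths, dupage_population)
print(f"About {dupage_rate:.1f} opioid-related deaths per 100,000 residents")
```

That works out to roughly 10.6 per 100,000. Published state rates may be age-adjusted, so this crude figure is a starting point for comparison and not a like-for-like match.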

In the debates over whether suburbs should be home to drug treatment facilities, such statistics could matter. Are 98 deaths enough to (a) declare that this is an issue worth addressing and (b) conclude that suburbs should welcome facilities that could help address the problem? Both questions could be up for debate, though I suspect the real issue is the second one: even if suburbanites recognize that opioid-related deaths are a social problem, that does not necessarily mean they are willing to live near such a facility.

Does this mean that statistics are worthless in such a public discussion? Not necessarily, though statistics alone may not be enough to convince a suburban resident one way or another about supporting change in their community. If residents believe strongly that such a medical facility is detrimental to their suburb, often invoking the character of the community, local resources, and property values, no combination of numbers and narratives might overwhelm what is perceived as a big threat. On the other hand, public discussions of land use and zoning can evolve and opposition or support can shift.

17% of millennial homebuyers regret the purchase (but perhaps 83% do not??)

A recent headline: “17% of young homebuyers regret their purchase, Zillow survey shows.” And two opening paragraphs:

Seventeen percent of millennial and Generation Z homebuyers from ages 18-34 regret purchasing a home instead of renting, according to a Zillow survey.

Speculating as to why, Josh Lehr, industry development at Zillow-owned Mortech, said getting the wrong mortgage may have driven that disappointment. For example, the Zillow survey showed 22% of young buyers had regrets about their type of mortgage and 27-30% said their rates and payments are too high.

The rest of the short article then goes on to talk about the difficulties millennials might face in going through the mortgage process. Indeed, it seems consumers generally dislike obtaining a mortgage.

But, the headline is an odd one. Why focus on the 17% that have some regret about their purchase? Is that number high or low compared to regret after other major purchases (such as taking on a car loan)?

If the number is accurate, why not discuss the 83% of millennials who did not regret their purchase? Are there different reasons for choosing which number to highlight (even when both numbers are true)?

And is the number what the headline makes it out to be? The paragraph cited above suggests the question from Zillow might be less about regret over a particular purchase and more about regret over owning rather than renting. If so, perhaps this is less about the specific home or mortgage and more about the flexibility or other amenities that renting provides.

In sum, this headline could be better. Interpreting the original Zillow data could be better. Just another reminder that statistics do not interpret themselves…

The modal age of racial/ethnic groups in the United States

The most common age differs widely among racial and ethnic groups in the United States – and the differences are larger than those in median age.

[Chart: In U.S., most common age for whites is much older than for minorities]

There were more 27-year-olds in the United States than people of any other age in 2018. But for white Americans, the most common age was 58, according to a Pew Research Center analysis of Census Bureau data.

In the histogram above, which shows the total number of Americans of each age last year, non-Hispanic whites tend to skew toward the older end of the spectrum (more to the right), while racial and ethnic minority groups – who include everyone except single-race non-Hispanic whites – skew younger (more to the left).

The most common age was 11 for Hispanics, 27 for blacks and 29 for Asians as of last July, the latest estimates available. Americans of two or more races were by far the youngest racial or ethnic group in the Census Bureau data, with a most common age of just 3 years old. Among all racial and ethnic minorities, the most common age was 27…

Non-Hispanic whites constituted a majority (60%) of the U.S. population in 2018, and they were also the oldest of any racial or ethnic group as measured by median age – a different statistic than most common age (mode). Whites had a median age of 44, meaning that if you lined up all whites in the U.S. from youngest to oldest, the person in the middle would be 44 years old. This compares with a median age of just 31 for minorities and 38 for the U.S. population overall.

The paragraphs above provide multiple pieces of information that explain the distribution displayed above:

-The different groups skew in different directions, so these are not symmetric distributions.

-The mode is much higher for whites.

-The median agrees with the conclusion from the mode – whites are older as a group – but the gap between whites and minorities shrinks (44 versus 31 by median, compared with 58 versus 27 by mode).

All three pieces of information could inform the headline but Pew chose to go with the mode. Is this with the intent of suggesting large age differences among the groups?
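To see how the choice of statistic changes the size of the gap, here is a small sketch using Python's standard `statistics` module. The two lists of ages are invented for illustration and are not Pew's or the Census Bureau's data; they are built so that one group has a cluster at an older age and the other a cluster at a very young age.

```python
from statistics import median, mode

# Hypothetical ages, chosen only to illustrate skewed distributions.
older_group = [5, 12, 19, 26, 33, 40, 44, 47, 51, 58, 58, 58, 63, 70, 77]
younger_group = [3, 3, 3, 8, 14, 20, 25, 31, 37, 44, 52, 60, 68]

# The mode is the single most common value; the median is the middle value.
mode_gap = mode(older_group) - mode(younger_group)
median_gap = median(older_group) - median(younger_group)

print(f"Most common ages: {mode(older_group)} vs {mode(younger_group)} "
      f"(gap of {mode_gap})")
print(f"Median ages: {median(older_group)} vs {median(younger_group)} "
      f"(gap of {median_gap})")
```

In this made-up example both statistics say the first group is older, but the mode puts the gap at 55 years and the median at 22. A headline built on the mode will tend to sound more dramatic whenever the distributions are skewed this way.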