More on modeling uncertainty and approaching model results

People around the world want answers about the spread of COVID-19. Models offer data-driven certainties, right?

The only problem with this bit of relatively good news? It’s almost certainly wrong. All models are wrong. Some are just less wrong than others — and those are the ones that public health officials rely on…

The latest calculations are based on better data on how the virus acts, more information on how people act and more cities as examples. For example, new data from Italy and Spain suggest social distancing is working even better than expected to stop the spread of the virus…

Squeeze all those thousands of data points into incredibly complex mathematical equations and voila, here’s what’s going to happen next with the pandemic. Except, remember, there’s a huge margin of error: For the prediction of U.S. deaths, the range is larger than the population of Wilmington, Delaware.

“No model is perfect, but most models are somewhat useful,” said John Allen Paulos, a professor of math at Temple University and author of several books about math and everyday life. “But we can’t confuse the model with reality.”…

Because of the large fudge factor, it’s smart not to look at one single number — the minimum number of deaths, or the maximum for that matter — but instead at the range of confidence, where there’s a 95% chance reality will fall, mathematician Paulos said. For the University of Washington model, that’s from 50,000 to 136,000 deaths.
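
To make that concrete, here is a minimal sketch (in Python) of how a point estimate and its uncertainty translate into a 95% interval under a normal approximation. The mean and standard error below are hypothetical numbers chosen only so the output lands near the quoted 50,000 to 136,000 range; they are not the University of Washington model’s actual parameters:

```python
# A minimal sketch: turning a point estimate and a standard error into a
# 95% interval via a normal approximation. The numbers are hypothetical,
# not the model's actual output.
Z_95 = 1.96  # ~95% of a normal distribution lies within 1.96 standard errors

point_estimate = 93_000  # hypothetical projected deaths
standard_error = 22_000  # hypothetical standard error of that projection

low = point_estimate - Z_95 * standard_error
high = point_estimate + Z_95 * standard_error
print(f"Point estimate: {point_estimate:,}")
print(f"95% interval: roughly {low:,.0f} to {high:,.0f}")
```

Real epidemic models typically produce asymmetric intervals from simulation rather than a normal approximation, which is one more reason to read the published range rather than a single number.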

Models depend on the data available, the assumptions researchers make, and the equations they use. Then there is a social component, where people (ranging from academics to residents to leaders to the media) interact with the model’s results.

This reminds me of sociologist Joel Best’s argument regarding how people should view statistics and data. One option is to be cynical about all data: the models are rarely exactly right, so why trust any numbers? Better to go with other kinds of evidence. Another option is to naively accept models and numbers: they have the weight of math, science, and research; they are complicated and should simply be trusted. Best proposes a third option between these two extremes: a critical approach. Armed with some good questions (what data are the researchers working with? what assumptions did they make? what do the statistics/model actually say?), a reader of models and data analysis can start to evaluate the results. Models cannot do everything – but they can do something.

(Also see a post last week about models and what they can offer during a pandemic.)

Middle-class incomes have biggest year-to-year rise – with a catch

New data suggests middle-class incomes rose in 2015:

The incomes of typical Americans rose in 2015 by 5.2 percent, the first significant boost to middle-class pay since the end of the Great Recession and the fastest increase ever recorded by the federal government, the Census Bureau reported Tuesday.

In addition, the poverty rate fell by 1.2 percentage points, the steepest decline since 1968. There were 43.1 million Americans in poverty on the year, 3.5 million fewer than in 2014…

The 5.2 percent increase was the largest, in percentage terms, ever recorded by the bureau since it began tracking median income statistics in the 1960s. Bureau officials said it was not statistically distinguishable from five other previous increases in the data, most recently the 3.7 percent jump from 1997 to 1998.

Rising incomes are generally good. But, note the catch in the third paragraph cited above: officials cannot say that the 5.2% increase is definitively higher than several previous increases. Why not? The 5.2% figure is based on a sample, and the estimate carries a margin of error of roughly 1.6 percentage points either way (more on this below). The data comes from these Census instruments:

The Current Population Survey Annual Social and Economic Supplement was conducted nationwide and collected information about income and health insurance coverage during the 2015 calendar year. The Current Population Survey, sponsored jointly by the U.S. Census Bureau and U.S. Bureau of Labor Statistics, is conducted every month and is the primary source of labor force statistics for the U.S. population; it is used to calculate the monthly unemployment rate estimates. Supplements are added in most months; the Annual Social and Economic Supplement questionnaire is designed to give annual, national estimates of income, poverty and health insurance numbers and rates.

According to the report (page 6), the margin of error for the percent change in income from 2014 to 2015 is 1.6 percentage points. In other words, incomes may have risen by as much as 6.8% or by as little as 3.6%. See the methodological document regarding the survey instruments here.
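
To see why the bureau could not declare a record, compare the intervals. A quick sketch, using the report’s 1.6-point margin and assuming, purely for illustration, a similar margin on the 1997-1998 figure:

```python
# Why the 5.2% increase was "not statistically distinguishable" from
# earlier increases: the confidence intervals overlap. The 1.6-point
# margin comes from the Census report; applying the same margin to the
# 1997-1998 figure is an assumption made here for illustration.
def interval(estimate, margin):
    return (estimate - margin, estimate + margin)

inc_2015 = interval(5.2, 1.6)  # 2014-2015 change
inc_1998 = interval(3.7, 1.6)  # 1997-1998 change (assumed margin)

overlap = inc_2015[0] <= inc_1998[1] and inc_1998[0] <= inc_2015[1]
print(f"2015 interval: {inc_2015[0]:.1f}% to {inc_2015[1]:.1f}%")  # 3.6% to 6.8%
print(f"1998 interval: {inc_1998[0]:.1f}% to {inc_1998[1]:.1f}%")  # 2.1% to 5.3%
print(f"Intervals overlap: {overlap}")  # True
```

Overlapping intervals are a rough heuristic rather than a formal significance test, but they capture why the bureau would not call 5.2% a record.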

The Census Bureau has in recent years moved to more frequent reports on key demographic measures. One of the trade-offs, however, is that these estimates are not as accurate as the decennial census, which requires far more resources to conduct and is more thorough.

A final note: it is good that the article on rising middle-class incomes hints at the margin of error. On the other hand, that mention comes in paragraph 12, while the headline clearly suggests a record year. Statistically speaking, that may or may not be the case.

Chicago’s loss of nearly 3,000 residents in 2015 is an estimate

Chicago media were all over the story this week that Chicago was the only major American city to lose residents in 2015. The Chicago Tribune summed it up this way:

This city has distinguished itself as the only one among the nation’s 20 largest to actually lose population in the 12-month stretch that ended June 30.

Almost 3,000 fewer people live here compared with a year earlier, according to new figures from the U.S. Census Bureau, while there’s been a decline of more than 6,000 residents across the larger metropolitan area.

Chicago’s decline is a mere 0.1 percent, which is practically flat. But cities are like corporations in that even slow growth wins more investor confidence than no growth, and losses are no good at all.

The last paragraph cited above is a good one; 3,000 people either way is not very many, and this is all about perceptions.

But, there is a larger issue at stake. These population figures are estimates. Estimates. They are not exact. In other words, the Census Bureau doesn’t measure every person moving in or leaving for good. They do the best they can with the data they have to work with.

For example, on May 19 the Census Bureau released its list of the fastest-growing cities in America. Here is what they say about the population figures:

To produce population estimates for cities and towns, the Census Bureau first generates county population estimates using a component of population change method, which updates the latest census population using data on births, deaths, and domestic and international migration. This yields a county-level total of the population living in households. Next, updated housing unit estimates and rates of overall occupancy are used to distribute county household population into geographic areas within the county. Then, estimates of the population living in group quarters, such as college dormitories and prisons, are added to create estimates of the total resident population.
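
The quoted method boils down to simple bookkeeping at the county level. Here is a rough sketch of the component-of-change step described above; the function, variable names, and figures are illustrative, not the bureau’s actual code:

```python
# A rough sketch of the "components of population change" update the
# Census Bureau describes: base count plus births, minus deaths, plus
# net migration. All figures are made up for illustration.
def update_county_population(base_population, births, deaths,
                             net_domestic_migration,
                             net_international_migration):
    """Roll a county's population forward one period from its base count."""
    return (base_population + births - deaths
            + net_domestic_migration + net_international_migration)

estimate = update_county_population(
    base_population=5_200_000,       # e.g., the latest census count
    births=68_000,
    deaths=44_000,
    net_domestic_migration=-30_000,  # more people left than arrived
    net_international_migration=15_000,
)
print(f"Updated county estimate: {estimate:,}")  # 5,209,000
```

The housing-unit and group-quarters steps in the quote then split that county total down to the city level, and each step introduces its own room for error.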

If you want to read the methodology behind producing the 2015 city population figures, read the two-page document here.

So why don’t the Census Bureau and the media report the margin of error? What exactly is that margin? For a city of Chicago’s size – just over 2.7 million – couldn’t a loss of 3,000 residents actually be a small gain in population, or a loss double the size? New York’s gain of 55,000 people in 2015 seems pretty sure to be positive regardless of the margin of error. But small declines – as published here in USA Today – seem a bit misleading.
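
To put that question in numbers: even a small relative error on a base of 2.7 million swamps a change of 3,000. A toy calculation, assuming a purely hypothetical +/- 0.2% uncertainty on the city estimate (no margin of error is published for these figures):

```python
# Toy calculation: how a small relative error on Chicago's population
# compares to the reported loss. The 0.2% uncertainty is hypothetical;
# no margin of error is published for these city estimates.
population = 2_700_000
reported_change = -3_000
assumed_relative_error = 0.002  # hypothetical +/- 0.2%

band = population * assumed_relative_error  # +/- 5,400 people
print(f"Uncertainty band: +/- {band:,.0f}")
print(f"Plausible change: {reported_change - band:,.0f} "
      f"to {reported_change + band:,.0f}")
# Under this assumption, the "loss" of 3,000 could be anywhere from
# a loss of 8,400 to a gain of 2,400.
```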

I know the media and others want hard numbers to work with, but it should be made clear that these are the best estimates available and that they may not be exact. I trust the Census Bureau is doing all it can to produce such estimates – but they are not perfect.

Disagreement on whether there are 7 billion people on earth just yet

There have been a number of recent stories about how the world’s population has reached 7 billion. Interestingly, not everyone agrees that this has happened yet:

According to United Nations demographers, 6,999,999,999 other Earthlings potentially felt the same way on Monday when the world’s population topped seven billion. But if you’d rather go by the United States Census Bureau’s projections, you’ve got some breathing room. The bureau estimates that even with the world’s population increasing by 215,120 a day, it won’t reach seven billion for about four months.

How do the dueling demographic experts reconcile a difference, as of Monday, of 28 million, which is more than all the people in Saudi Arabia?

They don’t.

“No one can know the exact number of people on the globe,” Gerhard Heilig, chief of the population estimates and projections section of the United Nations Population Division, acknowledges.

Even the best individual government censuses have a margin of error of at least 1 percent, he said, which would translate in the global aggregation to “a window of uncertainty of six months before or six months after Oct. 31.” An error margin of even as little as 2 percent would mean that Monday’s estimate of seven billion actually was 56 million off (which is more people than were counted in South Africa).

Figuring this out is not an easy task. It requires a central group to tabulate results from all of the countries around the world. Could there be differences in the reliability and validity of the results across nations? For example, can we trust population counts from honed operations in the United States and other Western nations more than counts from Third World countries? (I wish the article went into this: how accurate are population figures from different countries? How big might the margins of error be?) I saw this before when doing graduate school research on suicide figures collected by the United Nations: in the period I was looking at, roughly 1950 to 1970, some countries didn’t report, some had rougher estimates, and countries could have different definitions of what constitutes a suicide. Absolute population counts should be more straightforward, but I imagine there could be a number of complications.
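
On the aggregation question, here is a toy sketch of how country-level uncertainty could roll up into a global window. It assumes, purely for illustration, that national errors are independent (so their variances add) and uses made-up populations and error rates:

```python
# Toy sketch: combining country-level census uncertainty into a global
# margin. Assumes independent errors, so variances add; the counts and
# error rates are made up for illustration.
import math

# (population, assumed relative margin of error)
countries = [
    (1_350_000_000, 0.02),  # very large country, rougher count
    (310_000_000, 0.01),    # well-resourced census
    (160_000_000, 0.03),    # sparse or dated data
]

total = sum(pop for pop, _ in countries)
# Treat each margin as ~2 standard errors; under independence, variances add.
variance = sum((pop * moe / 2) ** 2 for pop, moe in countries)
global_margin = 2 * math.sqrt(variance)

print(f"Total population: {total:,}")
print(f"Approximate global margin: +/- {global_margin:,.0f}")
```

The independence assumption is doing real work here: it lets errors partially cancel. If national errors were correlated, say through a shared tendency to undercount migrants, the global window would be wider.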

Will we get another round of news stories when the Census Bureau says we have hit 7 billion? I wonder if the perceived global authority of the United Nations versus that of the Census Bureau plays a role. For example, did the New York Times report the 7 billion figure as front-page news and then print this caveat story later in the news section?

A final note: the story ends by suggesting the two estimates are not that far off. If all of our estimates had only a 1% margin of error, science would benefit greatly. But it is a reminder that official figures are estimates, not 100% counts of social phenomena.

Poll figures on how the Rapture would have affected the Republican presidential field

Even as the news cycle winds down on Harold Camping and his prediction about the Rapture, Public Policy Polling (PPP) digs through some data to determine how the Rapture would have affected the field of Republican presidential candidates:

First off- no one really believed the Rapture was going to happen last weekend, or at least they won’t admit it. Just 2% of voters say they thought that was coming on Saturday to 98% who say they did not. It’s really close to impossible to ask a question on a poll that only 2% of people say yes to. A national poll we did in September 2009 found that 10% of voters thought Barack Obama was the Anti-Christ, or at least said they thought so. That 2% number is remarkably low.

11% of voters though think the Rapture will occur in their lifetimes, even if it didn’t happen last weekend. 66% think it will not happen and 23% are unsure. If the true believers who think the Rapture will happen in their lifetime are correct- and they’re the ones who had strong enough faith to get taken up into heaven- then that’s going to be worth a 2-5 point boost to Obama’s reelection prospects. That’s because while only 6% of independents and 10% of Democrats think the Rapture will happen during their lifetime, 16% of Republicans do. We always talk about demographic change helping Democrats with the rise of the Hispanic vote, but if the Rapture occurs it would be an even more immediate boost to Democratic electoral prospects.

Obama’s lead over Romney is 7 points with all voters, but if you take out the ones who think the Rapture will occur in their lifetime his advantage increases to 9 points. That’s because the Rapture voters support Romney by a 49-35 margin. Against Gingrich Obama’s 14 point lead overall becomes a 17 point one if you take out the ‘Rapturers’ because they support Gingrich 50-37. And Obama’s 17 point lead over Palin becomes a 22 point spread without those voters because they support Palin 54-37.

Palin is the only person we tested on this poll who is actually popular with people who think the Rapture is going to happen. She has a 53/38 favorability with them, compared to 33/41 for Romney, 26/48 for Gingrich, and a 31/58 approval for Obama. Palin’s problem is that her favorability with everyone who doesn’t think the Rapture will happen is 27/66.

What a great way to combine two of the media’s recent fascinations. I would guess PPP put this poll together solely to take advantage of this news cycle. Should we conclude that Democrats should have wished for the Rapture to actually happen to improve their political chances?

Of course, all of this data should be taken with a grain of salt as only 2% of the voters believed the Rapture was going to happen this past weekend and 11% believe it will happen in their lifetimes. These small numbers are out of a total sample of 600 people, meaning that about 12 people thought the Rapture would happen on Saturday and about 66 thought it would happen while they are alive. And this is all with a margin of error of plus or minus 4 percent, suggesting the true figures could be even smaller and the subgroup results may not generalize.
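
The plus or minus 4 percent figure follows from the standard formula for a simple random sample, usually quoted at its worst case of p = 0.5. A quick check:

```python
# Quick check of the poll's margin of error using the standard formula
# for a simple random sample: MOE = z * sqrt(p * (1 - p) / n).
# The worst case is p = 0.5, which is how +/- 4% is typically quoted.
import math

def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

n = 600
print(f"MOE at n={n}: +/- {margin_of_error(n):.1%}")  # ~ +/- 4.0%

# The subgroups really are tiny in absolute terms:
print(f"2% of {n} respondents is about {round(0.02 * n)} people")
print(f"11% of {n} respondents is about {round(0.11 * n)} people")
```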

Do polls/surveys like these contribute to giving all polls/surveys a bad reputation?

Defining the Washington elite

Politico is reporting today on an online poll comparing opinions of the “Washington elite” with those of other Americans. The main news seems to be the divergent opinions between the two groups, but the means of measurement is intriguing as well. To qualify as a Washington elite:

[R]espondents must live within the D.C. metro area, earn more than $75,000 per year, have at least a college degree and be involved in the political process or work on key political issues or policy decisions.

Another point of interest: only 227 members of the Washington elite are in the poll. This is a fairly small group for a typical poll to analyze. The margin of error for the Washington elites is 6.53%.
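
That 6.53% is consistent with the same worst-case formula noted above, and inverting the formula shows what a tighter margin would require. A quick sketch:

```python
# Verifying the reported margin for n = 227 and inverting the standard
# formula to see what sample size a tighter margin would require.
import math

def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

def required_sample(target_moe, p=0.5, z=1.96):
    return math.ceil((z ** 2) * p * (1 - p) / target_moe ** 2)

print(f"MOE at n=227: +/- {margin_of_error(227):.2%}")  # ~6.50%, near the reported 6.53%
print(f"n needed for +/- 3%: {required_sample(0.03)}")  # 1068
```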

Find the story and full polling results here.