Do we know that 500,000 people have fled NYC since the start of COVID-19?

On the heels of much discussion of residents leaving New York City, San Francisco, and other major cities because of COVID-19, the Daily Mail suggests 500,000 people have left New York City:


Parts of Manhattan, famously the ‘city that never sleeps’, have begun to resemble a ghost town since 500,000 mostly wealthy and middle-class residents fled when Covid-19 struck in March.

The number is also part of the headline.

But how do we know this number is accurate? If there were ever a figure that required some serious triangulation, this could be it. Most of the news stories I have seen on people fleeing cities rely on real estate agents and movers who have close contact with people going from one place to another. Those articles rarely cite figures, settling for vaguer pronouncements about trends or patterns. Better data could come from sources like utility companies (presumably there would be a drop in the consumption of electricity and water), the post office (how many people have changed addresses?), and more systematic analyses of real estate records.

A further point about the supposed figure: even if it is accurate, it does not reveal much about long-term trends. The stories on this phenomenon have hinted that some of those who left will never return while others intend to come back. We will not know until some time has passed after the COVID-19 pandemic slows down or disappears. Particularly for those with resources: will they sell their New York property, or will they sit on it for a while to give themselves options or to make sure they get a decent return on it? This may be a shocking figure now, but it could turn out in a year or two to mean very little if many of those same people return to the city.

In other words, I would wait to see whether this number is trustworthy and, if so, what exactly it means going forward. As sociologist Joel Best cautions about numbers that seem shocking, it helps to ask good questions about where the data comes from, how accurate it is, and what it means.

More on modeling uncertainty and approaching model results

People around the world want answers about the spread of COVID-19. Models offer data-driven certainties, right?

The only problem with this bit of relatively good news? It’s almost certainly wrong. All models are wrong. Some are just less wrong than others — and those are the ones that public health officials rely on…

The latest calculations are based on better data on how the virus acts, more information on how people act and more cities as examples. For example, new data from Italy and Spain suggest social distancing is working even better than expected to stop the spread of the virus…

Squeeze all those thousands of data points into incredibly complex mathematical equations and voila, here’s what’s going to happen next with the pandemic. Except, remember, there’s a huge margin of error: For the prediction of U.S. deaths, the range is larger than the population of Wilmington, Delaware.

“No model is perfect, but most models are somewhat useful,” said John Allen Paulos, a professor of math at Temple University and author of several books about math and everyday life. “But we can’t confuse the model with reality.”…

Because of the large fudge factor, it’s smart not to look at one single number — the minimum number of deaths, or the maximum for that matter — but instead at the range of confidence, where there’s a 95% chance reality will fall, mathematician Paulos said. For the University of Washington model, that’s from 50,000 to 136,000 deaths.

Models depend on the data available, the assumptions made by researchers, the equations utilized, and then there is a social component where people (ranging from academics to residents to leaders to the media) interact with the results of the model.
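The advice above to look at a range rather than a single headline number can be illustrated with a toy simulation. This is an invented illustration, not the actual University of Washington model: many runs of a noisy model produce a spread of plausible outcomes, and quoting one number hides that spread.

```python
import random

random.seed(0)

# Toy model: each run draws an outcome from a noisy distribution.
# The center and spread here are invented for illustration only.
runs = sorted(random.gauss(93_000, 22_000) for _ in range(10_000))

# The middle 95% of simulated outcomes, and the median run.
low, high = runs[250], runs[9_750]
mid = runs[5_000]

print(f"median ~ {mid:,.0f}; 95% range ~ {low:,.0f} to {high:,.0f}")
```

The single "median" number looks precise, but the 95% range spanning tens of thousands of simulated deaths is the honest summary, which is Paulos's point.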

This reminds me of sociologist Joel Best’s argument regarding how people should view statistics and data. One option is to be cynical about all data: the models are rarely exactly right, so why trust any numbers? Better to go with other kinds of evidence. Another option is to naively accept models and numbers: they have the weight of math, science, and research; they are complicated and should simply be trusted. Best proposes a third option between these two extremes: a critical approach. Armed with some good questions (what data are the researchers working with? what assumptions did they make? what do the statistics/model actually say?), a reader of models and data analysis can start to evaluate the results. Models cannot do everything – but they can do something.

(Also see a post last week about models and what they can offer during a pandemic.)

Mutant statistic: marketing, health, and 10,000 steps a day

A recent study suggests the 10,000 steps a day for better health advice may not be based in research:

I-Min Lee, a professor of epidemiology at the Harvard University T. H. Chan School of Public Health and the lead author of a new study published this week in the Journal of the American Medical Association, began looking into the step rule because she was curious about where it came from. “It turns out the original basis for this 10,000-step guideline was really a marketing strategy,” she explains. “In 1965, a Japanese company was selling pedometers, and they gave it a name that, in Japanese, means ‘the 10,000-step meter.’”

Based on conversations she’s had with Japanese researchers, Lee believes that name was chosen for the product because the character for “10,000” looks sort of like a man walking. As far as she knows, the actual health merits of that number have never been validated by research.

Scientific or not, this bit of branding ingenuity transmogrified into a pearl of wisdom that traveled around the globe over the next half century, and eventually found its way onto the wrists and into the pockets of millions of Americans. In her research, Lee put it to the test by observing the step totals and mortality rates of more than 16,000 elderly American women. The study’s results paint a more nuanced picture of the value of physical activity.

“The basic finding was that at 4,400 steps per day, these women had significantly lower mortality rates compared to the least active women,” Lee explains. If they did more, their mortality rates continued to drop, until they reached about 7,500 steps, at which point the rates leveled out. Ultimately, increasing daily physical activity by as little as 2,000 steps—less than a mile of walking—was associated with positive health outcomes for the elderly women.

This sounds like a “mutant statistic” like sociologist Joel Best describes. The study suggests the figure originally arose for marketing purposes and was less about the actual numeric quantity and more about a particular cultural reference. From there, the figure spread until it became a normal part of cultural life and organizational behavior as people and groups aimed to walk 10,000 steps. Few people likely stopped to think about whether 10,000 was an accurate figure or an empirical finding. As a marketing ploy, it seems to have worked.

This should raise larger questions about how many other publicly known figures are more fabrication than empirical finding. Do such figures pop up in health statistics more than in other fields? Does countering them with an academic study stem the tide of their usage?


Three possible responses to the finding that human behavior is complicated

A review of a new book includes a paragraph (the second one excerpted below) that serves as a good reminder for those interested in human behavior:

What happens in brains and bodies at the moment humans engage in violence with other humans? That is the subject of Stanford University neurobiologist and primatologist Robert M. Sapolsky’s Behave: The Biology of Humans at Our Best and Worst. The book is Sapolsky’s magnum opus, not just in length, scope (nearly every aspect of the human condition is considered), and depth (thousands of references document decades of research by Sapolsky and many others) but also in importance as the acclaimed scientist integrates numerous disciplines to explain both our inner demons and our better angels. It is a magnificent culmination of integrative thinking, on par with similar authoritative works, such as Jared Diamond’s Guns, Germs, and Steel and Steven Pinker’s The Better Angels of Our Nature. Its length and detail are daunting, but Sapolsky’s engaging style—honed through decades of writing editorials, review essays, and columns for The Wall Street Journal, as well as popular science books (Why Zebras Don’t Get Ulcers, A Primate’s Memoir)—carries the reader effortlessly from one subject to the next. The work is a monumental contribution to the scientific understanding of human behavior that belongs on every bookshelf and many a course syllabus.

Sapolsky begins with a particular behavioral act, and then works backward to explain it chapter by chapter: one second before, seconds to minutes before, hours to days before, days to months before, and so on back through adolescence, the crib, the womb, and ultimately centuries and millennia in the past, all the way to our evolutionary ancestors and the origin of our moral emotions. He gets deep into the weeds of all the mitigating factors at work at every level of analysis, which is multilayered, not just chronologically but categorically. Or more to the point, uncategorically, for one of Sapolsky’s key insights to understanding human action is that the moment you proffer X as a cause—neurons, neurotransmitters, hormones, brain-specific transcription factors, epigenetic effects, gene transposition during neurogenesis, dopamine D4 receptor gene variants, the prenatal environment, the postnatal environment, teachers, mentors, peers, socioeconomic status, society, culture—it triggers a cascade of links to all such intervening variables. None acts in isolation. Nearly every trait or behavior he considers results in a definitive conclusion, “It’s complicated.”

To adapt sociologist Joel Best’s approach to statistics in Damned Lies and Statistics, I suggest there are three broad approaches to understanding human behavior:

1. The naive. This approach believes human behavior is simple and explainable. We just need the right key to unlock behavior (whether this is a religious text or a single scientific cause or a strongly held personal preference).

2. The cynical. Human behavior is so complicated that we can never understand it. Why bother trying?

3. The critical. As Best suggests, this is an informed approach that knows how to ask the right questions. To the reductionist, it might ask whether there are other factors to consider. To the cynic, it might say that just because behavior is really complicated doesn’t mean we can’t find patterns. Causation is often difficult to determine in the natural and social sciences, but this does not mean we cannot identify bundles of factors or processes at work. The key is recognizing when people are making reasonable arguments about explaining human behavior: when do their claims go too far, and when are they missing something?

Mutant stat: 4.2% of American kids witnessed a shooting last year

Here is how a mutant statistic about the exposure of children to shootings came to be:

It all started in 2015, when University of New Hampshire sociology professor David Finkelhor and two colleagues published a study called “Prevalence of Childhood Exposure to Violence, Crime, and Abuse.” They gathered data by conducting phone interviews with parents and kids around the country.

The Finkelhor study included a table showing the percentage of kids “witnessing or having indirect exposure” to different kinds of violence in the past year. The figure under “exposure to shooting” was 4 percent.

The findings were then reinterpreted:

Earlier this month, researchers from the CDC and the University of Texas published a nationwide study of gun violence in the journal Pediatrics. They reported that, on average, 7,100 children under 18 were shot each year from 2012 to 2014, and that about 1,300 a year died. No one has questioned those stats.

The CDC-UT researchers also quoted the “exposure to shooting” statistic from the Finkelhor study, changing the wording — and, for some reason, the stat — just slightly:

“Recent evidence from the National Survey of Children’s Exposure to Violence indicates that 4.2 percent of children aged 0 to 17 in the United States have witnessed a shooting in the past year.”

The reinterpreted findings were picked up by the media:

The Dallas Morning News picked up a version of the Washington Post story.

When the Dallas Morning News figured out something was up (due to a question raised by a reader) and asked about the origins of the statistic, they uncovered some confusion:

According to Finkelhor, the actual question the researchers asked was, “At any time in (your child’s/your) life, (was your child/were you) in any place in real life where (he/she/you) could see or hear people being shot, bombs going off, or street riots?”

So the question was about much more than just shootings. But you never would have known from looking at the table.

This appears to be a classic example of a mutant statistic as described by sociologist Joel Best in Damned Lies and Statistics. As Best explains, it doesn’t take much for a number to be unintentionally twisted such that it becomes nonsensical yet interesting to the public because it seems shocking. And while the Dallas Morning News might deserve some credit for catching the issue and trying to set the record straight, the incorrect statistic is now in public circulation and can easily be found.

The “value of estimating”

Here is another way to help students develop their mathematical skills: learn how to estimate.

Quick, take a guess: how tall is an eight-story building? How many people can be transported per hour on a set of train tracks in France? How many barrels of oil does the U.S. import each year?

Maybe you gave these questions your best shot – or maybe you skimmed right over them, certain that such back-of-the-napkin conjecture wasn’t worth your time. If you fall into the second, just-Google-it group, you may want to reconsider, especially if you’re a parent. According to researchers who study the science of learning, estimation is the essential foundation for more advanced math skills. It’s also crucial for the kind of abstract thinking that children need to do to get good grades in school and, when they’re older, jobs in a knowledge-based economy.

Parents can foster their kids’ guessing acumen by getting them to make everyday predictions, like how much all the items in the grocery cart will cost. Schools, too, should be giving more attention to the ability to estimate. Too many math textbooks “teach how to solve exactly stated problems exactly, whereas life often hands us partly defined problems needing only moderately accurate solutions,” says Sanjoy Mahajan, an associate professor of applied science and engineering at Olin College…

Sharpen kids’ logic enough and maybe some day they’ll dazzle people at cocktail parties (or TED talks) the way Mahajan does with his ballpark calculations. His answers to the questions at the top of this story: 80 ft., 30,000 passengers and 4 billion barrels. To come up with these, he guessed at a lot of things. For instance, for the number of barrels of oil the U.S. imports, he made assumptions about the number of cars in the U.S., the number of miles driven per car per year and average gas mileage to arrive at the number of gallons used per year. Then he estimated how many gallons are in a barrel. He also assumed that imported oil is used for transportation and domestic for everything else. The official tally for U.S. imports in 2010 was 4,304,533,000 barrels. Mahajan’s 4 billion isn’t perfect, but it’s close enough to be useful – and most of the time, that’s what counts.

It sounds like estimation builds problem-solving skills: taking known or guessed-at quantities and combining them into reasonable answers. I tried the question about the barrels of oil with my statistics class today, and we had one guess of 4 billion barrels (among a wide range of other answers). This also suggests that there is some room for creativity within math; it isn’t all about formulas but rather takes some thinking.
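Mahajan’s chain of guesses can be written out in a few lines. The input figures below are my own rough assumptions (the article does not report his exact inputs); the point of a Fermi estimate is getting the order of magnitude right, not the precise inputs.

```python
# Fermi estimate of annual U.S. oil imports, following the chain of
# assumptions described above. All inputs are rough guesses, not official data.
cars = 250e6               # assumed number of cars in the U.S.
miles_per_car = 12_000     # assumed miles driven per car per year
miles_per_gallon = 25      # assumed average fuel economy
gallons_per_barrel = 42    # a standard oil barrel is 42 U.S. gallons

gallons_per_year = cars * miles_per_car / miles_per_gallon
barrels_per_year = gallons_per_year / gallons_per_barrel

print(f"{barrels_per_year:.1e} barrels per year")
```

With these guesses the chain lands around 3 billion barrels – the same order of magnitude as Mahajan’s 4 billion and the official 4.3 billion, which is all a back-of-the-envelope estimate aims for.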

This reminds me that Joel Best says something similar in one of his books: being able to quickly estimate some big figures is a useful skill in a society where statistics carry a lot of weight. But to do some of this, do people have to have some basic figures in mind such as the total population of the United States (US Census population clock: over 312 million)? Is this a commonly known figure?

The article also suggests ways to take big numbers and break them down into manageable and understandable figures. Take, for example, the national debt of the United States, which is over 15 trillion dollars, a figure that is perhaps impossible to comprehend. But you could break it down in a couple of ways. The debt is slightly over $48k per citizen, or roughly $192k per family of four. Or you could compare the debt to the yearly GDP.
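The per-citizen breakdown above is simple division, sketched here with the approximate figures quoted in this post:

```python
# Breaking a huge figure into per-person terms, using the approximate
# numbers quoted above (debt over $15 trillion, population over 312 million).
national_debt = 15e12      # dollars
population = 312e6         # US Census population clock figure

per_citizen = national_debt / population
per_family_of_four = per_citizen * 4

print(f"${per_citizen:,.0f} per citizen")            # roughly $48k
print(f"${per_family_of_four:,.0f} per family of 4") # roughly $192k
```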

Why cases of scientific fraud can affect everyone in sociology

The recent case of a Dutch social psychologist admitting to working with fraudulent data can lead some to paint social psychology or the broader discipline of sociology as problematic:

At the Weekly Standard, Andrew Ferguson looks at the “Chump Effect” that prompts reporters to write up dubious studies uncritically:

The silliness of social psychology doesn’t lie in its questionable research practices but in the research practices that no one thinks to question. The most common working premise of social-psychology research is far-fetched all by itself: The behavior of a statistically insignificant, self-selected number of college students or high schoolers filling out questionnaires and role-playing in a psych lab can reveal scientifically valid truths about human behavior.

And when the research reaches beyond the classroom, it becomes sillier still…

Described in this way, it does seem like there could be real journalistic interest in this study – as a human interest story like the three-legged rooster or the world’s largest rubber band collection. It just doesn’t have any value as a study of abstract truths about human behavior. The telling thing is that the dullest part of Stapel’s work – its ideologically motivated and false claims about sociology – got all the attention, while the spectacle of a lunatic digging up paving stones and giving apples to unlucky commuters at a trash-strewn train station was considered normal.

A good moment for reaction from a conservative perspective: two favorite whipping boys, liberal (and fraudulent!) social scientists plus journalists/the media (uncritical and biased!), can be tackled at once.

Seriously, though: the answer here is not to paint entire academic disciplines as problematic because of one case of fraud. Granted, some of the questions raised are good ones that social scientists themselves have raised recently: how much about human activity can you discover through relatively small sample tests of American undergraduates? But good science is not based on one study anyway. An interesting finding should be corroborated by similar studies done in different places at different times with different people. These multiple tests and observations help establish the reliability and validity of findings. This can be a slow process, another issue in a media landscape where new stories are needed all the time.

This reminds me of Joel Best’s recommendations regarding dealing with statistics. One common option is to simply trust all statistics. Numbers look authoritative, often come from experts, and can be overwhelming; just accepting them is easy. At the other pole is the common option of saying that all statistics are simply interpretation and manipulation, so we can’t trust any of them. Neither approach is a good option, but both are relatively easy. The better route when dealing with scientific studies is to have the basic skills needed to judge whether a study is a good one and to understand how the process of science works. In this case, this would be a great time to call for better training for journalists on scientific studies so they can provide better interpretations for the public.

In the end, when one prominent social psychologist admits to massive fraud, the repercussions might be felt by others in the field for quite a while.

Quick Review: Stat-Spotting

Sociologist Joel Best has recently done well for himself by publishing several books about the misuse of statistics. This is an important topic: many people are not used to thinking statistically and have difficulty correctly interpreting statistics even though they are commonly used in media stories. Best’s most recent book on this subject, published in 2008, is Stat-Spotting: A Field Guide to Identifying Dubious Data. A few thoughts on this text:

1. One of Best’s strong points is that his recommendations are often based in common sense. If a figure strikes you as strange, it probably is. He offers tips about keeping common statistical figures in mind to help make sense of new statistics. Overall, he suggests a healthy skepticism toward statistics: think about how the statistic was developed and who is saying it.

2. When the subtitle of the book says “field guide,” it means a shorter text that is to the point. Best quickly moves through different problems with statistical data. If you are looking for more thorough explanations, you should read Best’s 2001 book Damned Lies and Statistics. (A cynical reader might suggest this book was simply a way to make more money off topics Best has already explored elsewhere.)

3. I think this text is most useful for finding brief examples of how to analyze and interpret data. There are numerous examples in here that could start off a statistics lesson or could further illustrate a point. The examples cover a variety of topics and sources.

This is a quick read that could be very useful as a simple guide to combating innumeracy.