Graphic options for illustrating where Americans moved during COVID-19

I appreciate the effort at CityLab to take all of the data regarding where Americans moved during the COVID-19 pandemic and put it into graphs and charts. Good graphs and charts should help illustrate relationships between variables and help readers see patterns. Here are several choices that I thought succeeded.

First, start with patterns in metro areas across the United States.

The two colors plus the size of the circle show the percentage change in population. Using percentages is a nice touch, yet the comparison to the previous year might slip past some viewers.

Second, another way to look at population changes across metro areas as a whole.

The side-by-side of central cities and suburbs quickly shows several differences: lower ratios for cities, more variability among suburban counties, more losses for cities during COVID. The patterns among suburban counties are a little hard to pick up; there are a number of counties that lost people even as the general trend might have been up.

Third, where did all those people moving from New York City, specifically Manhattan, go?

In absolute numbers, this map displays the patterns nicely: many moves within New York City and the surrounding region, plus moves to other metro areas (including Miami, Los Angeles, Chicago, and more). The inset of the Southwest at the bottom left is a nice touch; presumably New Yorkers did not move in large numbers anywhere roughly between Nashville and Seattle.

Fourth, which New Yorkers moved?

Looking at zip codes, neighborhoods with higher incomes had more people moving, while the numerous neighborhoods with lower incomes had smaller changes in inflow.

Altogether, this is more than just a series of pretty graphics. These choices, first about what data to use and second about how to present one variable in light of another, help clarify what happened in the last year. Each choice could have been a little different: emphasizing a different part of the data or another variable, or choosing another graphic option. Yet while there is certainly more to untangle about mobility, cities and suburbs, and COVID-19, these images help us start making sense of complex phenomena.

Still looking for helpful numerical comparisons to make sense of COVID-19 deaths

A list ranking single days by death toll has been making the social media rounds. Titled “The Deadliest Days in American History,” it puts recent days of COVID-19 deaths in spots #4-7, following the Galveston hurricane, the Battle of Antietam, and September 11, 2001. But the numbers on the list are not what they seem:

An infographic listing the "Deadliest Days in American History."
I first saw the image on Facebook.

For one thing, a list of the “deadliest days” in American history would include days with the most deaths, not the most deaths from one discrete event. On all of the days included, more people in the United States died than the numbers listed. According to Reuters, 2,861 COVID-19 deaths were indeed reported last Thursday. But that doesn’t account for the number of people who died from heart disease (last week’s daily average was 1,532 deaths), lung and tracheal cancer (last week’s daily average was about 560 deaths), or chronic kidney disease (last week’s daily average was about 290 deaths). Deaths from drug overdoses have also been reaching record highs this year, a trend that may have been worsened by the pandemic. (Obviously, more people died on the days of the Galveston hurricane, the Battle of Antietam, 9/11, and Pearl Harbor, too.)

By its own rules, the list is also incomplete. More than 3,000 people died in the 1906 San Francisco earthquake, which isn’t mentioned, nor is the 1899 San Ciriaco hurricane, which killed more than 3,300 people in Puerto Rico over the course of six to nine hours. While we’re at it, the population of the United States is much larger now. The U.S. was home to about one-tenth of the current population during the Battle of Antietam. Losing 3,600 people back then would be like losing 36,000 people now.

But yes, the general idea behind this list—and other attempts to communicate the horrors of the pandemic as a set of digestible facts—is worthwhile. It can be helpful to compare the number of deaths specifically from the coronavirus to other historical events in which there were huge losses of American life. More than 286,000 people in the U.S. have died from COVID-19 thus far. Compare that to the 116,000 Americans who died in World War I; 405,000 Americans who died in World War II; 37,000 Americans who died in the Korean War; and 58,000 Americans who died in the Vietnam War. The 1918 flu pandemic killed 675,000 Americans, the 1968 influenza A pandemic killed 100,000 Americans, and the 2009 H1N1 pandemic killed 12,469 Americans.
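The comparison in the quoted passage amounts to dividing the COVID-19 total by each historical toll. Here is a short Python sketch using only the figures quoted above; these are raw counts, not adjusted for population, and `compare` is a hypothetical helper name. A ratio above 1 means more U.S. deaths from COVID-19 so far than from that event.

```python
# U.S. death tolls exactly as quoted in the passage above.
COVID_DEATHS_SO_FAR = 286_000
HISTORICAL_TOLLS = {
    "World War I": 116_000,
    "World War II": 405_000,
    "Korean War": 37_000,
    "Vietnam War": 58_000,
    "1918 flu pandemic": 675_000,
    "1968 influenza A pandemic": 100_000,
    "2009 H1N1 pandemic": 12_469,
}


def compare(total: int, tolls: dict[str, int]) -> list[tuple[str, float]]:
    """Return (event, total / event toll) pairs, largest ratio first."""
    ratios = [(event, total / toll) for event, toll in tolls.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


for event, ratio in compare(COVID_DEATHS_SO_FAR, HISTORICAL_TOLLS):
    print(f"{event:<28}{ratio:6.2f}x")
```

Ratios keep the comparison readable ("roughly seven times the Korean War toll") without asking readers to hold several six-digit numbers in their heads.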

The general idea may be a good one: similar numbers reported day after day lose their power. It can be hard for the general public to interpret large numbers in the abstract, as I discussed in an earlier post comparing a COVID-19 death figure to my community’s population. The list tries to place the daily death totals in historical context by noting that these are not just normal numbers; they are high numbers for any day in American history.

Yet, as noted above, the numbers do not quite work out. Perhaps the list should have a new title like “Days with the most deaths directly attributable to unusual causes,” since it ignores all other causes of death on those days. And even then, the list leaves out other natural disasters, and putting the numbers in a different context (as a percent of the population as a whole) also changes the list.
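One way to apply the population adjustment is to rescale a historical toll to today's population. This minimal Python sketch uses only the quoted passage's rough figures for Antietam (3,600 deaths when the U.S. had about one-tenth of today's population); `scale_to_present` is a hypothetical helper, and the populations are relative units rather than census counts.

```python
def scale_to_present(deaths_then: float, population_then: float, population_now: float) -> float:
    """Rescale a historical death toll to today's population.

    Keeps the same share of the population: deaths * (now / then).
    """
    if population_then <= 0 or population_now <= 0:
        raise ValueError("populations must be positive")
    return deaths_then * population_now / population_then


# Relative units only: the passage says the U.S. had "about one-tenth" of
# today's population at Antietam, so use 1 for then and 10 for now.
antietam_today = scale_to_present(3_600, population_then=1, population_now=10)
print(f"{antietam_today:,.0f}")  # 36,000

# A recent day is already at today's population, so its toll is unchanged.
print(scale_to_present(2_861, population_then=10, population_now=10))  # 2861.0
```

On this per-capita footing, Antietam's toll corresponds to roughly twelve times the 2,861 COVID-19 deaths reported on a single recent day, which reorders the list.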

The list might still spur people to action, even if it has flaws. And this was probably the goal of the list in the first place: it is not meant to be an academic study on the topic but a call to action. Like many statistics, these numbers are used in a way intended to nudge people toward different behavior.

Making a big deal out of a round number, Dow at 30,000 edition

The Dow recently topped 30,000. What is notable about this number?

Photo by energepic.com on Pexels.com

How big a deal is Dow 30,000?

It’s just an arbitrary number, and it doesn’t mean things are much better than when the Dow was at 29,999. What’s more impactful is that the Dow has finally clawed back all its losses from the pandemic and is once again reaching new heights. It is up 61.5% since dropping below 18,600 on March 23.

It took just over nine months for the Dow to surpass the record it had set in February, before panic about the coronavirus triggered the market’s breathtaking sell-off.

I like this explanation for two reasons. First, it downplays hitting 30,000 just because it is a large round number. Why should 30,000 be more important than 29,000 or 29,852? Because round numbers seem more meaningful to us, especially one that is a change from the 20,000s to the 30,000s. Second, it provides a longer context for the rise to 30,000. That particular number is meaningful in part because of the record in February, the drop in March, and then the steady rise. Arguably, this rise since March is much more important than getting past a particular number.

In addition to these steps, a few more could help people interpret the 30,000 figure. Instead of focusing on a particular number, how about discussing the percentage change? This is done regularly with other financial figures. Such a story is also ripe for a visual showing the change over time. A final option would be to downplay such milestone numbers, in this story and in future reporting, and focus instead on other markers of financial patterns rather than on outliers (peaks and valleys).
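A percentage-change framing is simple to compute. This minimal Python sketch uses only the approximate figures in the quoted passage (a March low of about 18,600 and a 61.5% gain since then); `percent_change` is a hypothetical helper, and because the low is rounded, the results are approximate.

```python
def percent_change(start: float, end: float) -> float:
    """Percentage change from start to end (positive means an increase)."""
    if start == 0:
        raise ValueError("start must be nonzero")
    return (end - start) / start * 100


# Approximate figures from the quoted passage.
MARCH_LOW = 18_600
recovered_level = MARCH_LOW * (1 + 61.5 / 100)
print(f"{recovered_level:,.0f}")  # 30,039

# Crossing into the 30,000s is a small move in percentage terms.
print(f"{percent_change(29_000, 30_000):.2f}%")  # 3.45%
print(f"{percent_change(29_999, 30_000):.4f}%")  # 0.0033%
```

The last line makes the point of the quoted passage concrete: the step from 29,999 to 30,000 is a few thousandths of a percent, while the climb from the March low is over 60%.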

A Patrick Mahomes word cloud, strengths and weaknesses

The season-opening NFL broadcast included a word cloud of descriptions of Chiefs quarterback Patrick Mahomes from his teammates:

On the broadcast, they noted that “leader” was mentioned the most times and several people mentioned “smart” and “competitive.” And, since this came right after a conversation about Mahomes’ record contract, they noted that no teammate said “rich.”

A few thoughts on this graphic:

  1. It highlights the popularity and/or spread of word clouds. If they make it onto a football broadcast, they are everywhere in the United States.
  2. It remains a way to highlight words or themes across a series of interviews or texts. It can take time to relay thoughts from multiple interactions; the word cloud tries to summarize the concepts. But…
  3. The size of the words does not easily convey their frequency in this particular graphic. “Leader” is clearly the biggest, “competitive” and “smart” are somewhere in the middle, and then there are a lot of other words. Yet long words like “courageous” or “extraordinary” take up a lot of space even if they were mentioned just once.
  4. The colors of the word cloud are tied to the Chiefs’ colors. But with the background changing a bit behind the words (“add a dynamic background to that boring word cloud!”), it can be hard to read some of the words in red (see “smart” above).
  5. Without knowing the number of interviews or how many total descriptors were given, it is hard to know how much the leading words stand out.
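Points 3 and 5 suggest what a word cloud hides: the counts and their denominator. This Python sketch uses the standard library's `collections.Counter`; the responses are HYPOTHETICAL (the broadcast did not show raw counts), built only from words mentioned on the air, and `approx_width` is a hypothetical helper.

```python
from collections import Counter

# HYPOTHETICAL responses, invented for illustration; the broadcast did not
# report how many times each word was said.
responses = [
    "leader", "leader", "leader", "leader", "leader",
    "smart", "smart",
    "competitive", "competitive",
    "courageous",
    "extraordinary",
]

counts = Counter(responses)
total = sum(counts.values())

# A plain frequency table states what the cloud only implies through font size,
# and the total gives each count a denominator.
for word, n in counts.most_common():
    print(f"{word:<14}{n:>3}{n / total:>8.1%}")


def approx_width(word: str, mentions: int) -> int:
    """Crude proxy for on-screen width if font size scales linearly with
    mentions: characters times font size (ignores differing letter widths)."""
    return len(word) * mentions


# A long word said once can be wider than a short word said twice.
print(approx_width("extraordinary", 1), approx_width("smart", 2))  # 13 10
```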

An interesting choice of graphic, with still some work to do to make it an even better presentation of data.

A health example of choosing between a dichotomous outcome or a continuum

When I teach Statistics and Research Methods, we talk a little about how researchers make decisions about creating and using categories for data they have. As this example of recommendations about fertility notes, creating categories can be a tricky process:

Photo by Burak K on Pexels.com

Being 35 or older is labeled by the medical community as “advanced maternal age.” In diagnosis code speak, these patients are “elderly,” or in some parts of the world, “geriatric.” In addition to being offensive to most, these terms—so jarringly at odds with what is otherwise considered a young age—instill a sense that one’s reproductive identity is predominantly negative as soon as one reaches age 35. But the number 35 itself, not to mention the conclusions we draw from it, has spun out of our collective control…

The 35-year-old threshold is not only known by patients, it is embraced by doctors as a tool that guides the care of their patients. It’s used bimodally: If you’re under 35, you’re fine; if you’re 35 or older, you have a new host of problems. This interpretation treats the issue at hand as what is known as a “threshold effect.” Cross the threshold of age 35, it implies, and the intrinsic nature of a woman’s body has changed; she falls off a cliff from one category into another. (Indeed, many of my patients speak of crossing age 35 as exactly this kind of fall, with their fertility “plummeting” suddenly.) As I’ve already stated, though, the age-related concerns are gradual and exist along a continuum. Even if the rate of those risks accelerates at a certain point, it’s still not a quantum leap from one risk category to another.

This issue comes up frequently in science and medicine. In order to categorize things that fall along a continuum, things that nature itself doesn’t necessarily distinguish as being separable into discrete groups, we have to create cutoffs. Those work very well when comparing large groups of patients, because that’s what the studies were designed to do, but to apply those to individual patients is more difficult. To a degree, they can be useful. For example, when we are operating far from those cutoffs—counseling a 25-year-old versus a 45-year-old—the conclusions to draw from that cutoff are more applicable. But operate close to it—counseling a 34-year-old trying to imagine her future 36-year-old self—and the distinction is so subtle as to be almost superfluous.

The trade-offs seem clear. A single point where the data turns from one category to another, an age of 35, simplifies the research findings (though the article suggests they may not actually point to 35) and allows doctors and others to offer clear guidance. The number is easy to remember.

A continuum, on the other hand, might better fit data that show no clear drop-off at an age near 35. The range offers more flexibility for doctors and patients to develop an individualized approach.
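The trade-off can be sketched in a few lines of Python. The `category` function mirrors the age-35 rule; `illustrative_score` is entirely HYPOTHETICAL, an arbitrary exponential chosen only because it rises steadily, and is not real fertility or pregnancy-risk data.

```python
import math

AGE_CUTOFF = 35  # the conventional threshold discussed above


def category(age: float) -> str:
    """Dichotomous view: one cutoff sorts every patient into two groups."""
    return "35 or older" if age >= AGE_CUTOFF else "under 35"


def illustrative_score(age: float) -> float:
    """Continuous view: a smooth curve that rises steadily with age.

    HYPOTHETICAL: the exponential form and its constants are invented for
    illustration and are not clinical estimates of any risk.
    """
    return math.exp(0.1 * (age - 25))


for age in (25, 34, 36, 45):
    print(age, category(age), round(illustrative_score(age), 2))
```

Under the cutoff, 34 and 36 fall into different categories even though their scores on this made-up curve differ by only about 22% (a factor of e^0.2), while 25 and 45 differ by a factor of about 7.4. That mirrors the quoted point: far from the cutoff the category is informative; close to it, the distinction is almost superfluous.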

Deciding which is better requires thinking about the advantages of each, the purpose of the categories, and who wants what information. The “easy” answer is that both sets of categories can exist; people could keep in mind a rough estimate of 35 while doctors and researchers could have conversations where they discuss why that particular age may or may not matter for a person.

More broadly, learning more about continuums and considering when they are worth deploying could benefit our society. I realize I am comfortable with them; sociologists suggest many social phenomena fall along a continuum with many cases falling in between. But this tendency toward continuums or spectrums or more nuanced or complex results may not always be helpful. We can decry black and white thinking and yet we all need to regularly make quick decisions based on a limited number of categories (I am thinking of System 1 thinking described by behavioral economists and others). Even as we strive to collect good data, we also need to pay attention to how we organize and communicate that data.

Mode, plurality, and “the most popular way”

I recently stumbled across this headline from Stanford News: “Meeting online has become the most popular way U.S. couples connect, Stanford sociologist finds.” Would the average reader assume this means that more than 50% of couples meet online?

This is not what the headline or the story says. More details from the story:

Rosenfeld, a lead author on the research and a professor of sociology in the School of Humanities and Sciences, drew on a nationally representative 2017 survey of American adults and found that about 39 percent of heterosexual couples reported meeting their partner online, compared to 22 percent in 2009.

It appears about 39% of heterosexual couples met online. According to the summary of the paper, the other ways couples meet are:

Traditional ways of meeting partners (through family, in church, in the neighborhood) have all been declining since World War II.

The 39% figure meets the definition of both the mode and a plurality, respectively (both definitions from Google):

the value that occurs most frequently in a given set of data.

the number of votes cast for a candidate who receives more than any other but does not receive an absolute majority.

Still, I suspect there might be some confusion. Meeting online brings more American couples together than any other method, but it accounts for a little less than forty percent of couples.
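The distinction between a mode (or plurality) and a majority can be checked mechanically. In this Python sketch only the 39% online figure comes from the story; the other categories and shares are HYPOTHETICAL placeholders chosen so that none exceeds 39%, and `describe_top_category` is a hypothetical helper.

```python
def describe_top_category(shares: dict[str, float]) -> str:
    """Report the most common category (the mode) and whether its share is a
    majority (more than 50%) or only a plurality. Ties are not handled."""
    top, top_share = max(shares.items(), key=lambda item: item[1])
    kind = "majority" if top_share > 50 else "plurality"
    return f"{top}: {top_share:g}% ({kind})"


# Only the 39% online figure comes from the story; the rest are
# HYPOTHETICAL placeholders for illustration.
shares = {
    "online": 39,
    "hypothetical way A": 25,
    "hypothetical way B": 21,
    "hypothetical way C": 15,
}
print(describe_top_category(shares))  # online: 39% (plurality)
```

A headline built on the first half of that output ("most popular") can read like the second half says "majority" when it does not.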

Graphing changing household arrangements from 1960 to 2017

An article discussing changes in American household arrangements includes this graph:

Graph of household arrangements, 1960 to 2017

A summary of the data:

It all represents an increasing distance from the nuclear-family structure considered traditional for decades. The changes solidify shifts that have been mounting since then, erasing the notion of one dominant family type. In the early 1960s, two-thirds of children were raised in male-breadwinner, married-couple families. By contrast, today there is no one family-and-work arrangement that encompasses the majority of children, demographers say.

“That dominant model declined, but it’s not like it was replaced by one thing,” says Philip Cohen, professor of sociology at the University of Maryland. “It was replaced by a peacock’s tail, a plethora of different arrangements.”

The graph is most effective at showing the biggest change: the decline of the “mother-father married, father only earner” group over nearly six decades. Two other categories have significant increases – married and dual earners, single mother – while the five categories at the bottom involve relatively fewer households.

The graph is unusually narrow from left to right, which helps emphasize the straight lines up or down over time. Would a wider x-axis show more variation over time, or are the trends consistently steady?

The colors are a little hard to distinguish. I am not usually in favor of dotted or dashed lines, but this might be an opportunity to use them to differentiate the trend lines.

Just thinking about other graph options, a pie chart for each time period might also communicate the big change well (though the smaller categories might not show up as well), or a clustered bar graph with the two years side by side could show the relative changes for each group.
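A clustered bar layout is easy to prototype even without a plotting library. This Python sketch prints one bar per year for each category, plus the change in percentage points. The category labels loosely follow the graph, but the numbers are HYPOTHETICAL placeholders except for the article's roughly two-thirds figure for male-breadwinner, married-couple families in the early 1960s, and `clustered_bars` is a hypothetical helper.

```python
# HYPOTHETICAL shares (percent of children) for the first and last years.
# Only the roughly two-thirds figure for male-breadwinner, married-couple
# families in the early 1960s comes from the article; the rest are placeholders.
DATA = {
    "married, father only earner": (67, 20),
    "married, dual earners": (20, 40),
    "single mother": (8, 20),
    "all other arrangements": (5, 20),
}
YEARS = ("1960", "2017")


def clustered_bars(
    data: dict[str, tuple[int, int]], years: tuple[str, str], scale: int = 2
) -> list[str]:
    """Build a text clustered bar chart: one bar per year, grouped by category.
    Each '#' stands for `scale` percentage points (rounded)."""
    lines = []
    for category, values in data.items():
        change = values[1] - values[0]
        lines.append(f"{category} ({change:+d} points)")
        for year, value in zip(years, values):
            lines.append(f"  {year} {'#' * round(value / scale)} {value}%")
    return lines


print("\n".join(clustered_bars(DATA, YEARS)))
```

Putting the two years next to each other within each category makes the direction and size of every change visible at once, including for the small categories that a pie chart would squeeze.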

In sum, graphing significant social change is not necessarily easy and this format clearly communicates a big change.