Maps, distortions, and realities

Maps do not just reflect reality; a new online exhibit at the Boston Public Library looks at how they help shape reality:

Photo: desk globe on shallow focus lens (by NastyaSensei on Pexels.com)

The original topic was to do an exhibition of a classic category of maps called persuasive cartography, which tends to refer to propaganda maps, ads, political campaign maps, maps that obviously you can tell have an agenda. We have those materials in our collections of about a quarter million flat maps, atlases, globes and other cartographic materials. But we decided in recognition of what’s going on now to expand into a bigger theme about how maps produce truth, and how trust in maps and other visual data is produced in media and civil society. So rather than thinking just about maps which are obviously treacherous, distorting, and deceptive, we wanted to think about how every map goes about presenting the world and how they can all reflect biases and absences or incorrect classifications of data. We also wanted to think about this as a way to promote data literacy, which is a critical attitude towards media and data visualizations, to bring together this long history of how maps produce our sense of reality…

We commissioned a special set of maps where we compiled geographic data about the state of Massachusetts across a few different categories, like demographics, infrastructure, and the environment. We gave the data to a handful of cartographers and asked them to make a pair of maps that draw conclusions that disagree with each other. One person made two maps from environmental data on toxic waste sites: One map argues that cities are most impacted by pollution, and the other argues that rural towns bear the bigger impact. So this project was really meant to say, we’d like to think that numbers speak for themselves, but whenever we’re using data there’s a crucial role for the interpreter, and the way people make those maps can really reflect the assumptions they’ve brought into the assignment…

In one section of the show called “How the Lines Get Bent,” we talk about some of the most common cartographic techniques that deserve our scrutiny: whether the data is or isn’t normalized to population size, for example, will produce really different outcomes. We also look at how data is produced by people in the world, examining how census classifications change over time, not because people themselves change but because of racist attitudes about demographic categorizations that were encoded into census data tables. So you have to ask: What assumptions can data itself hold on to? Throughout the show we look at historic examples as well as more modern pieces to give people questions about how to look at a map, whether it’s simple media criticism, like: Who made this and when? Do they show sources? What are their methods, and what kinds of rhetorical framing like titles and captions do they use? We also hit on geographic analysis, like data normalization and the modifiable areal unit problem…

So rather than think about maps as simply being true or false, we want to think about them as trustworthy or untrustworthy and to think about the social and political contexts in which they circulate. A lot of our evidence of parts of the world we’ve never seen is based on maps: For example, most of us accept that New Zealand is off the Australian coast because we see maps and assume they’re trustworthy. So how do societies and institutions produce that trust, what can be trusted, and what happens when that trust frays? The conclusion shouldn’t be that we can’t trust anything but that we have to read things in an informed, skeptical manner and decide where to place our trust.

Another reminder that data does not interpret itself. Ordering reality – which maps arguably do for spatial information – is not a neutral process. People look at the evidence, draw conclusions, and then make arguments with the data. This extends across all kinds of evidence or data, ranging from statistical evidence to personal experiences to qualitative data to maps.
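To make the normalization point from the exhibit concrete, here is a minimal sketch in Python with invented numbers (they are not from the exhibit’s Massachusetts data) showing how raw counts and per-capita rates can point to opposite conclusions about which communities are most affected:

```python
# Hypothetical illustration: raw counts vs. rates normalized to population.
# The community names and figures are invented purely for illustration.
communities = [
    # (name, population, toxic_waste_sites)
    ("Big City", 650_000, 40),
    ("Small Town", 8_000, 3),
]

for name, population, sites in communities:
    per_capita = sites / population * 10_000  # sites per 10,000 residents
    print(f"{name}: {sites} sites total, {per_capita:.2f} per 10,000 residents")

# Ranked by raw counts, Big City looks most affected (40 sites vs. 3);
# normalized to population, Small Town does (3.75 vs. 0.62 per 10,000).
```

Neither version is lying; the choice of denominator is doing the argumentative work.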

Educating the readers of maps (and other evidence) is important: as sociologist Joel Best argues regarding statistics, people should not be naive (completely trusting) or cynical (completely rejecting) but rather should be critical (questioning, skeptical). But, there is another side to this: how many cartographers and others who produce maps are aware of the possibilities of biased or skewed representations? If they know this, how do they then combat it? There is a range of cartographers to consider, from people who make road atlases and world maps to those working in media who make maps for the public about current events. What guides their processes and how often do they interrogate their own presentation? Similarly, are people more trusting of maps than they might be of statistics or qualitative data or people’s stories (or personal maps)?

Finally, the interview hints at the growing use of maps with additional data. I feel like I read about John Snow’s famous 1854 map of cholera cases in London everywhere, but the practice has really picked up in recent decades. As we learn more about spatial patterns and gain the tools (like GIS) to overlay data, maps with data are everywhere. But, finding and communicating the patterns is not necessarily easy, nor is the full story of the analysis and presentation always given. Instead, we might just see a map. As someone who has published an article using maps as key evidence, I know that collecting the data, putting it into a map, and presenting the data required multiple decisions.

Looking for data reporting and presentation standards for COVID-19

As the world responds to COVID-19, having standardized data could go a long way:

All in all, information made available by state health departments has been more timely and complete than information coming from the CDC, especially from a testing perspective, for which the CDC only offers a national aggregate not counting private labs. However, there is no overall standard when it comes to the information that has to be made public at the state level, which has led to a large variation in data quality across the country…

The COVID Tracking Project has assembled what the “ideal” Covid-19 dataset should look like. It includes the number of total tests conducted (including commercial tests), the number of people hospitalized (in cumulative and daily increments), the number of people in the ICU, and the race and ethnicity information of every case and death. Few states check all the boxes, but the situation is improving…

Some kind of standard as to how to present the data to the public would be helpful. Health departments do not all have the resources to put together elaborate custom data visualizations of the Covid-19 pandemic. Most health departments have adopted geographic information system mapping programs from companies like Tableau and Esri — similar to the Johns Hopkins University dashboard — but there is no standard and no guidance explaining what should be put in place.
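To make the idea of a reporting standard a little more concrete, here is a hypothetical sketch in Python of what a single standardized daily record from a state health department might contain, using the fields the COVID Tracking Project lists above; the field names are my own invention, not an actual or proposed schema:

```python
# Hypothetical sketch of a standardized daily reporting record, based on the
# fields mentioned in the COVID Tracking Project's "ideal" dataset.
# This is not an official or proposed schema.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class DailyStateReport:
    state: str                         # e.g. "MA"
    date: str                          # ISO format, e.g. "2020-05-08"
    total_tests: int                   # including commercial labs
    hospitalized_cumulative: int
    hospitalized_daily: int
    icu_currently: int
    deaths_cumulative: int
    # Race and ethnicity breakdowns of cases and deaths, where available
    cases_by_race_ethnicity: Optional[Dict[str, int]] = None
    deaths_by_race_ethnicity: Optional[Dict[str, int]] = None
```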

Organizing actions in a variety of sectors – from healthcare to the economy to social interaction to political interventions – relies heavily on statistics about the problem at hand. Without good data, actors are reacting to anecdotal evidence or acting without any basis at all; this is not what you want when time is of the essence. Of course, you can also have good data and then actors can choose to ignore it or draw the wrong conclusions. At the same time, we tend to argue “knowledge is power” and having good information could lead to better decisions.

Hopefully this means that all of the various actors will be better prepared next time with a process in place that will help everyone be on the same page and have the same capabilities sooner.


Reminder: do not get carried away making fancy charts and graphs

The Brewster Rockit: Space Guy! comic strip from last Sunday makes an important point about designing charts and graphs: don’t get carried away.

https://www.gocomics.com/brewsterrockit/2020/05/03

Brewster Rockit May 3, 2020

The goal of using a chart or graph is to distill the information behind it into an easy-to-read format for making a quick point. A reader’s eye is drawn to a chart or graph and it should be easy to figure out the point the graphic is making.

If the graph or chart is too complicated, it loses its potency. If it looks great or clever but cannot help the reader interpret the data correctly, it is not very useful. If the researcher spends a lot of time tweaking the graphic to really make it eye-popping, it may not be worth it compared to simply getting the point across.
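As a deliberately plain example of that advice, here is a minimal sketch using Python and matplotlib (the numbers are invented): one series, labeled axes, and a title that simply states the point:

```python
# A simple, uncluttered chart: one series, labeled axes, a title that makes
# the point. The data are invented for illustration.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 90, 60]

fig, ax = plt.subplots()
ax.bar(months, sales, color="steelblue")
ax.set_title("Sales fell sharply in March and April")  # state the takeaway directly
ax.set_ylabel("Units sold")
plt.show()
```

Anything beyond this (3-D effects, a dozen colors, decorative imagery) mostly adds production time for the maker and decoding time for the reader.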

In sum: graphs and charts can be fun. They can break up long text and data tables. They can focus attention on an important data point or relationship. At the same time, they can get too complicated and become a time suck both for the producer of the graphic and for those trying to figure them out.

From outlier to outlier in unemployment data

With the responses to COVID-19, unemployment is expected to approach or hit a record high in the recorded data:

April’s employment report, to be released Friday, will almost certainly show that the coronavirus pandemic inflicted the largest one-month blow to the U.S. labor market on record.

Economists surveyed by The Wall Street Journal forecast the new report will show that unemployment rose to 16.1% in April and that employers shed 22 million nonfarm payroll jobs—the equivalent of eliminating every job created in the past decade.

The losses in jobs would produce the highest unemployment rate since records began in 1948, eclipsing the 10.8% rate touched in late 1982 at the end of the double-dip recession early in President Reagan’s first term. The monthly number of jobs lost would be the biggest in records going back to 1939—far steeper than the 1.96 million jobs eliminated in September 1945, at the end of World War II.

But, also noteworthy is what these rapid changes follow:

Combined with the rise in unemployment and the loss of jobs in March, the new figures will underscore the labor market’s sharp reversal since February, when joblessness was at a half-century low of 3.5% and the country notched a record 113 straight months of job creation.

In other words, the United States has experienced both a record low in unemployment and a record high within three months. A few thoughts connected to this:

1. Either outlier is noteworthy; having them occur so close to each other is more unusual.

2. Their close occurrence makes it more difficult to ascertain what is “normal” unemployment for this period of history. The fallout of COVID-19 is unusual. But the 3.5% unemployment can also be considered unusual compared to historical data.

3. Given these two outliers, it might be relatively easy to dismiss either as aberrations. Yet, while people are living through the situations and the fallout, they cannot simply be dismissed. If unemployment now is around 16%, this requires attention even if historically this is a very unusual period.

4. With these two outliers, predicting the future regarding unemployment (and other social measures) is very difficult. Will the economy quickly restart in the United States and around the world? Will COVID-19 be largely under control within a few months or will there be new outbreaks for a longer period of time (and will governments and people react in the same ways)?

In sum, dealing with extreme data – outliers – is a difficult task for everyone.
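One small, invented illustration of the “what is normal” question from point 2 above: a single extreme value pulls an average well away from the typical figure, while a more robust summary like the median barely moves. The numbers below are loosely patterned on the rates quoted above but are only illustrative:

```python
# Invented monthly unemployment rates (percent); the last value is an extreme month.
import statistics

typical_months = [3.6, 3.5, 3.5, 3.6, 3.5, 3.5]
with_outlier = typical_months + [16.1]

print(statistics.mean(typical_months))  # ~3.53, the recent "normal"
print(statistics.mean(with_outlier))    # ~5.33, one outlier shifts the average
print(statistics.median(with_outlier))  # 3.5, the median barely moves
```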

Interpreting data: the COVID-19 deaths in the United States roughly match the population of my mid-sized suburb

Understanding big numbers can be difficult. This is particularly true in a large country like the United States – over 330,000,000 residents – with a variety of contexts. Debates over COVID-19 numbers have been sharp as different approaches appeal to different numbers. To some degree, many potential social problems or public issues face this issue: how to use numbers (and other evidence) to convince people that action needs to be taken.

This week, the number of deaths in the United States due to COVID-19 approached the population of my suburban community of just over 53,000 residents. We are a mid-sized suburb; this is the second largest community in our county, the most populous suburban county in the Chicago region outside of Cook County. The community covers just over 11 square miles. Imagining an entire mid-sized suburb of COVID-19 deaths gives one pause. A week or two ago, I had heard the death toll compared to the capacity of a good-sized indoor arena; thinking of an entire sizable community helps make sense of the number of deaths across the country.

Of course, there are other numbers to cite. Our community has relatively few cases – fewer than a hundred as of a few days ago. Considering the Chicago suburbs: “If the Chicago suburbs were a state, it would have the 11th-highest COVID-19 death toll in the nation.” The COVID-19 cases and deaths are scattered throughout the United States, with clear hotspots in some places like New York City and fewer cases in other places. And so on.

Perhaps all of this means that we need medical experts alongside data experts in times like these. We need people well-versed in statistics and their implications to help inform the public and policymakers. Numbers are interpreted and used as part of arguments. Having a handle on the broad range of data, the different ways it can be interpreted (including what comparisons are useful to make), connecting the numbers to particular actions and policies, and communicating all of this clearly is a valuable skill set that can serve communities well.


More on modeling uncertainty and approaching model results

People around the world want answers about the spread of COVID-19. Models offer data-driven certainties, right?

The only problem with this bit of relatively good news? It’s almost certainly wrong. All models are wrong. Some are just less wrong than others — and those are the ones that public health officials rely on…

The latest calculations are based on better data on how the virus acts, more information on how people act and more cities as examples. For example, new data from Italy and Spain suggest social distancing is working even better than expected to stop the spread of the virus…

Squeeze all those thousands of data points into incredibly complex mathematical equations and voila, here’s what’s going to happen next with the pandemic. Except, remember, there’s a huge margin of error: For the prediction of U.S. deaths, the range is larger than the population of Wilmington, Delaware.

“No model is perfect, but most models are somewhat useful,” said John Allen Paulos, a professor of math at Temple University and author of several books about math and everyday life. “But we can’t confuse the model with reality.”…

Because of the large fudge factor, it’s smart not to look at one single number — the minimum number of deaths, or the maximum for that matter — but instead at the range of confidence, where there’s a 95% chance reality will fall, mathematician Paulos said. For the University of Washington model, that’s from 50,000 to 136,000 deaths.

Models depend on the data available, the assumptions made by researchers, the equations utilized, and then there is a social component where people (ranging from academics to residents to leaders to the media) interact with the results of the model.
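As a rough sketch of why those ranges end up so wide, the toy Python example below runs a made-up exponential-growth projection many times with slightly different assumptions and reports the middle 95% of outcomes. It is not any real epidemiological model (and certainly not the University of Washington model); it only illustrates how uncertainty in the inputs propagates into a wide band of outputs:

```python
# Toy illustration: uncertainty in a model's inputs produces a wide range of outputs.
# This is NOT a real epidemiological model; the parameters are invented.
import random
import statistics

random.seed(42)
outcomes = []
for _ in range(10_000):
    growth_rate = random.gauss(0.05, 0.02)      # uncertain daily growth rate
    starting_cases = random.gauss(1_000, 200)   # uncertain current case count
    outcomes.append(starting_cases * (1 + growth_rate) ** 60)  # 60 days out

outcomes.sort()
low = outcomes[int(0.025 * len(outcomes))]
high = outcomes[int(0.975 * len(outcomes))]
print(f"Median projection: {statistics.median(outcomes):,.0f}")
print(f"95% of runs fall between {low:,.0f} and {high:,.0f}")
```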

This reminds me of sociologist Joel Best’s argument regarding how people should view statistics and data. One option is to be cynical about all data. The models are rarely exactly right, so why trust any numbers? Better to go with other kinds of evidence. Another option is to naively accept models and numbers. They have the weight of math, science, and research. They are complicated and should simply be trusted. Best proposes a third option between these two extremes: a critical approach. Armed with some good questions (what data are the researchers working with? what assumptions did they make? what do the statistics/model actually say?), a reader of models and data analysis can start to evaluate the results. Models cannot do everything – but they can do something.

(Also see a post last week about models and what they can offer during a pandemic.)

Models are models, not perfect predictions

One academic summarizes how we should read and interpret COVID-19 models:

Every time the White House releases a COVID-19 model, we will be tempted to drown ourselves in endless discussions about the error bars, the clarity around the parameters, the wide range of outcomes, and the applicability of the underlying data. And the media might be tempted to cover those discussions, as this fits their horse-race, he-said-she-said scripts. Let’s not. We should instead look at the calamitous branches of our decision tree and chop them all off, and then chop them off again.

Sometimes, when we succeed in chopping off the end of the pessimistic tail, it looks like we overreacted. A near miss can make a model look false. But that’s not always what happened. It just means we won. And that’s why we model.

Five quick thoughts in response:

  1. I would be tempted to say that the perilous times of COVID-19 lead more people to see models as certainty, but I have seen this issue plenty of times in more “normal” periods.
  2. It would help if the media had less innumeracy and more knowledge of how science, natural and social, works. I know the media leans towards answers and sure headlines but science is often messier and takes time to reach consensus.
  3. Making models that include social behavior is difficult. This particular phenomenon has both a physical and a social component. Viruses act in certain ways. Humans act in somewhat predictable ways. Both can change. (A minimal sketch of the classic compartmental model used in epidemiology appears after this list.)
  4. Models involve data and assumptions. Sometimes, the model might fit reality. At other times, models do not fit. Either way, researchers are looking to refine their models so that we better understand how the world works. In this case, perhaps models can become better on the fly as more data comes in and/or certain patterns are established.
  5. Predictions or proof can be difficult to come by with models. The language of “proof” is one we often use in regular conversation but is unrealistic in numerous academic settings. Instead, we might talk about higher or lower likelihoods or provide the best possible estimate and the margins of error.
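As referenced in point 3, here is a minimal sketch of the classic SIR (susceptible-infected-recovered) compartmental model from epidemiology, written in Python with invented parameters. Real models are far more elaborate and are re-estimated as data comes in, but even this toy version shows how a behavioral change (lowering the contact parameter) reshapes the projection:

```python
# Minimal discrete-time SIR model. Parameters are invented for illustration;
# real models estimate them from data and revise them constantly.
def sir(population, initial_infected, beta, gamma, days):
    """beta: infectious contacts per person per day (where behavior like
    distancing enters the model); gamma: recovery rate (1 / infectious period in days)."""
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Halving beta (for example, through distancing) dramatically lowers the peak:
for beta in (0.3, 0.15):
    peak_infected = max(i for _, i, _ in sir(1_000_000, 10, beta, 0.1, 400))
    print(f"beta={beta}: peak infected ~{peak_infected:,.0f}")
```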