Summarizing data visualization errors

Check out this quick, helpful overview of common visualization errors. Here are a few highlights:

Everything is relative. You can’t say a town is more dangerous than another because the first one had two robberies and the other only had one. What if the first town has 1,000 times the population of the second? It is often more useful to think in terms of percentages and rates rather than absolutes and totals…

It’s easy to cherry-pick dates and timeframes to fit a specific narrative. So consider history, what usually happens, and proper baselines to compare against…

When you see a three-dimensional chart that is three dimensions for no good reason, question the data, the chart, the maker, and everything based on the chart.

In summary, data visualizations can be very useful for highlighting a particular pattern, but they can also be manipulated to advance an incorrect point. With these examples of misleading visualizations, I always wonder whether the maker intentionally altered the chart to advance a point or simply lacked knowledge about how to do good data analysis. Of course, this issue can arise with any data analysis, since there are right and wrong ways to interpret and present data.
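The first point about rates can be made concrete with a few lines of arithmetic. This is a minimal sketch using made-up towns (the populations and the `rate_per_1000` helper are hypothetical, chosen to match the "1,000 times the population" scenario): the town with more robberies turns out to be far safer once counts are converted into rates.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Convert a raw event count into a rate per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 1000


# Hypothetical towns: Town A has 1,000 times the population of Town B.
town_a = {"robberies": 2, "population": 1_000_000}
town_b = {"robberies": 1, "population": 1_000}

rate_a = rate_per_1000(town_a["robberies"], town_a["population"])
rate_b = rate_per_1000(town_b["robberies"], town_b["population"])

# Town A has more robberies in absolute terms but a far lower rate.
print(f"Town A: {rate_a:.3f} robberies per 1,000 residents")
print(f"Town B: {rate_b:.3f} robberies per 1,000 residents")
```

With these invented numbers, Town B's robbery rate is 500 times Town A's, even though Town A reports twice as many robberies.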

Visualizing immigration to the United States

Here are three interesting ways to visualize immigration to the United States: an animated map and two infographics. Three quick thoughts:

  1. The map really does help illustrate the various stages of immigration. It starts with Western Europe, shifts significantly toward Eastern Europe in the late 1800s, and then expands to Mexico, East Asia, and other parts of the globe in the 1960s.
  2. Unfortunately, arrivals from Asia have to cross the “break” at the edge of the map because the map is centered on the Atlantic. At first, I couldn’t figure out where the dots entering the United States from the left were coming from.
  3. The second infographic provides some proportional context: even with the jump in migrants from Mexico, they represent a smaller proportion of the total U.S. population than the immigrants of the 1800s spikes did.

Presenting big data about Chicago

The Chicago Architecture Foundation has a new exhibit highlighting the use of big data in Chicago:

Architects, planners, engineers and citizens, it contends, are increasingly using massive amounts of data to analyze urban issues and shape innovative designs…

But data, the show argues, is useful as well as ubiquitous. We see some classically gritty Chicago stuff to back this up, though it’s not quite powerful or precise enough to be fully persuasive…

More convincing are the show’s examples of “digital visualization,” which is geekspeak for using digital technology to present and analyze urban planning data.

Take a monumental, crowd-pleasing map of Chicago, 15 feet high and 30 feet wide, which presents the footprints of thousands of buildings, even individual houses, and color-codes them by the era in which they were built. We see the impact of the city’s three great building booms, from Chicago’s earliest days to 1899, from 1900 to 1945, and from 1946 to 1979. The recent surges that filled downtown with new skyscrapers look puny by comparison.

Also worth seeing: Video monitors which display data for Divvy, the city’s bike-sharing program. They offer neat tidbits: Divvy’s most popular station, for example, is at Millennium Park.

Sounds interesting. Big cities are complex social entities that could benefit from large-scale, real-time data collection and analysis. Of course, as Kamin notes at the end, cities still have a human side that cannot be ignored, but getting a handle on what is happening through data could go a long way.

Another dimension to this is how best to present big data. While online maps have grown popular, how can this best be done in person? I look forward to seeing this exhibit, as I already like what the Chicago Architecture Foundation has done with this space. Here is part of the gallery as it looked a few years ago:

[Photo: part of the Chicago Architecture Foundation gallery]

This is a great free place to learn more about Chicago, browse the cool offerings in the gift shop, or sign up for one of the architecture tours that cover all different aspects of the city.

The factors behind the rise of viral maps

Here is a short look at how viral maps (“graphic, easy to read, and they make a quick popular point”) are put together by one creator:

When I need to find a particular data set, it’s often as straightforward as a search for the topic with the word “shapefile” or “gis” attached. There’s so much data just sitting on servers that if you can imagine it, it’s probably out there somewhere (often for free). Sometimes though, finding data requires a deeper search. A lot of government-provided data sits inside un-indexed data portals or clearinghouses. Depending on the quality of the portal, these can be tedious to sort through…

Simplicity and ease-of-use: Interactive maps are great, but I want the maps I make to be straightforward to read and understand. I don’t want viewers to have to figure out how to use the map; they should just be able to look at it and figure out what’s going on.

Projections: Typical web maps are limited to the Web Mercator projection. I don’t have any objection to Mercator in principle (in fact it’s brilliant for what it does), but I can’t in good conscience use it for maps at a continental or global scale. Sticking to static maps allows me to choose more appropriate projections for the data and region I’m depicting.

Uniformity: I want everyone who visits my maps to be presented with the same information. I don’t want some algorithm deciding that one visitor is shown a particular view while another visitor gets a different one.

These principles sound similar to what one would expect for any sort of online chart or infographic. Plenty of data is available online, but it takes some skill to present the data clearly and then market the map to the right audience.
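The creator's objection to Mercator at a continental or global scale largely comes down to area distortion. As a rough sketch (assuming a spherical Earth, which is a simplification, and using a hypothetical helper name), Mercator stretches distances by the secant of the latitude in both directions, so areas are exaggerated by the square of that factor relative to the equator:

```python
import math


def mercator_area_inflation(lat_deg: float) -> float:
    """Return how many times larger Mercator draws an area at this latitude
    than at the equator, assuming a spherical Earth.

    Mercator stretches distances by sec(latitude) both east-west and
    north-south, so areas are inflated by sec(latitude) squared.
    """
    if not -90 < lat_deg < 90:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    return 1 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 30, 45, 60, 75):
    print(f"{lat:>2} degrees: area inflated {mercator_area_inflation(lat):.2f}x")
```

At 60 degrees of latitude, areas come out roughly four times too large, which is why a choropleth of a whole continent on Mercator visually overweights its northern regions.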

Now that I think about it, it is a little surprising that it took this long for viral maps to catch on. First, the Internet makes a lot of geographic data easily accessible. Second, it is a visual medium and maps are essentially graphics (audio is another story). Third, geographic data seems to feed into a lot of hot-button topics of conversation these days, as people of different races (residential segregation), cultural viewpoints (think the American South or the Bible Belt), education levels (think the Creative Class looking for exciting urban neighborhoods), and other groupings tend to live in different places.

I wonder if the real story here isn’t the technology that makes large-scale mapping relatively easy today. GIS software has been around for a while, but it is generally pretty expensive and has a learning curve. Now, numerous websites offer access to data and mapping capability (think the Census or Social Explorer). Shapefiles are used by a variety of local governments and researchers and can be downloaded. There are good freeware GIS programs like GeoDa. You need some bandwidth and computing power to get the data and crunch the numbers. Altogether, the pieces have now come together for more people to access, manipulate, and publish maps in a way that wasn’t possible even just 5 years ago.


Visualizing the migration flows in and out of DuPage County

The US Census recently released data on county-by-county migration flows. The downloadable tables are huge, but here is a look at the flows in and out of DuPage County:


Looks like a lot of movement to (and some from) warmer locales – southern California, Arizona, Florida – and lots of movement in the Midwest in an area roughly bounded by St. Louis, Detroit, and Minneapolis. You can also look at the migrations by education or income level.

Very cool all around. There is a lot of data to crunch here, and these visualizations help make sense of it. At the same time, these aren’t necessarily huge movements of people. Take Harris County, home to Houston (the 4th largest city in the United States): over this five-year span, there was a +88 flow from Harris to DuPage County.

Looking at inequality in NYC by translating wealth differences into building heights

It can be difficult to visualize inequality, but here is an innovative way of doing so: imagining wealth as buildings in New York City.

In his most recent visualization project, the Pittsburgh-based artist and researcher re-imagines what the city’s skyline would look like if building height were a direct reflection of a neighborhood’s net household wealth. “I was inspired to create this project after standing atop Mt. Washington in my hometown of Pittsburgh and looking at the Pittsburgh skyline,” he explains. “I thought to myself, ‘What if you could actually see inequality?’ This relatively even landscape would look much different.”

Lamm, who is responsible for other viral visualizations like Normal Barbie, translated Esri’s map of median household net worth in New York City (based on 2010 Census data) into the bright green 3-D bars you’re looking at. Every $100,000 of net worth in a section on Esri’s map equals one centimeter in height on Lamm’s visualization. So if one section (which appears to consist of multiple blocks) had a net worth of $500,000, Lamm’s rendering would measure 5 cm high. Similarly, if another section had a net worth of $80,000, the green would appear at a much flatter 0.8 cm.
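The scale described above is a simple linear conversion, and that linearity is what keeps the comparison honest: a section with 6.25 times the net worth gets a bar 6.25 times as tall. A minimal sketch, assuming each section's value is a plain dollar figure (the constant and function names are hypothetical, not Lamm's actual workflow):

```python
DOLLARS_PER_CM = 100_000  # stated scale: every $100,000 of net worth = 1 cm of height


def net_worth_to_height_cm(net_worth: float) -> float:
    """Convert a section's median household net worth into bar height in cm."""
    return net_worth / DOLLARS_PER_CM


# The two example sections from the excerpt.
for label, worth in [("Section A", 500_000), ("Section B", 80_000)]:
    print(f"{label}: ${worth:,} -> {net_worth_to_height_cm(worth):.1f} cm")
```

Because the mapping is a single division, the ratio of any two bar heights equals the ratio of the underlying net worths, unlike a 3-D effect or a truncated axis that would distort the comparison.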

Of the visualizations available here, the best is probably the first, which shows much of Manhattan from the northwest looking southeast.

Visualizing wealth rather than income is a strategic choice. Much talk about inequality involves income, but this may be the wrong metric. Income is more about short-term access to money, while wealth may matter more for longer-term outcomes (purchasing a house, etc.), and the wealth differences between groups are quite a bit larger. For example, the differences in wealth between the top 5% and the rest of America are astounding, as are the differences between whites and both blacks and Latinos.

Additionally, singling out New York, particularly Manhattan, is an interesting choice. The differences here are indeed stark, and Manhattan is the seat of the financial sector. But few places in the United States would show this much wealth inequality.

The value of using maps to see the rise and fall of Detroit

Here is a series of maps that show both the growth and decline of Detroit over its history. Looking at these maps, I’m reminded that it is quite difficult to talk about either the rise or decline of a major city just through raw numbers (population gains or losses, economic figures) or photographs. For example, we could talk about the rise of Houston in recent decades and contrast it with the sharp population decrease in Detroit. Moving past statistics, we could include photographs of a city. Detroit has been photographed many times in recent years, often in bleak scenes illustrating economic and social decline.

Somewhere in the middle ground between numbers, photos, and in-depth analysis (of which there does not seem to be much about Detroit recently; the mainstream media has primarily focused on short snippets of information) are maps. A good map has enough information to provide a top-down view of the city and give some indication of its infrastructure. Additionally, it is much easier today to provide multiple layers of mapped information based on Census data and other sources. Growth is relatively easy to see as new streets and points of interest start showing up. Decline, on the other hand, might be harder to show, since the streets may be empty and the points of interest might be decaying. Still, a current map shows the scope of the problem facing Detroit: population and economic decline plus a large expanse of land and structures that is difficult to maintain.

Altogether, I’m advocating for more widespread use of maps in reporting on and discussions about cities, whether they are struggling or thriving. Maps can help us move beyond seeing vacant houses or economic developments and take in the big picture all at once.