Summarizing data visualization errors

This quick overview of visualization errors is worth reading – here are a few highlights:

Everything is relative. You can’t say a town is more dangerous than another because the first one had two robberies and the other only had one. What if the first town has 1,000 times the population of the second? It is often more useful to think in terms of percentages and rates rather than absolutes and totals…

It’s easy to cherrypick dates and timeframes to fit a specific narrative. So consider history, what usually happens, and proper baselines to compare against…

When you see a three-dimensional chart that is three dimensions for no good reason, question the data, the chart, the maker, and everything based on the chart.

In summary: data visualizations can be very useful for highlighting a particular pattern but they can also be altered to advance an incorrect point. I always wonder with these examples of misleading visualizations whether the maker intentionally made the change to advance their point or whether there was a lack of knowledge about how to do good data analysis. Of course, this issue could arise with any data analysis as there are right and wrong ways to interpret and present data.
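The “everything is relative” point above can be made concrete with a little arithmetic. What follows is only a minimal sketch: the town names, robbery counts, and populations are made up for illustration, and `rate_per_100k` is a hypothetical helper, not something from the overview.

```python
def rate_per_100k(incidents, population):
    """Return incidents per 100,000 residents."""
    return incidents / population * 100_000


# Hypothetical towns: every figure here is invented for illustration.
towns = {
    "Town A": {"robberies": 2, "population": 1_000_000},
    "Town B": {"robberies": 1, "population": 1_000},
}

# Convert raw counts into rates so the towns can be compared fairly.
rates = {
    name: rate_per_100k(t["robberies"], t["population"])
    for name, t in towns.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} robberies per 100,000 residents")
```

Town A has twice as many robberies in absolute terms, yet once population is taken into account Town B has the far higher rate.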

“The most misleading charts of 2015, fixed”

Here are improved versions of some charts first put forward by politicians, advocacy groups, and the media in 2015.

I’m not sure exactly how they picked “the most misleading charts” (is there bias in this selection?) but it is interesting that several involve a misleading y-axis. I’m not sure that I would count the last example as a misleading chart since it involves a definition issue before getting to the chart.

And what is the purpose of the original, poorly done graphics? Changing the presentation of the data provides evidence for a particular viewpoint. Change the graphic depiction of the data and another story could be told. Unfortunately, it is actions like these that tend to cast doubt on the use of data for making public arguments – the data is simply too easy to manipulate so why rely on data at all? Of course, that assumes people look closely at the chart and the data source and know what questions to ask…
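One way to put a number on a misleading y-axis is Edward Tufte’s “lie factor”: the size of the effect shown in the graphic divided by the size of the effect in the data. The article does not use this measure. I am bringing it in as an illustration, and the values below (50 and 55, with an axis starting at 45) are hypothetical. The sketch assumes a bar chart where the drawn bar height is the value minus the point where the axis starts.

```python
def relative_change(first, second):
    """Relative change from first to second, e.g. 0.1 for a 10% increase."""
    return (second - first) / first


def lie_factor(first, second, axis_start):
    """Effect shown in the bars divided by the effect in the data.

    Assumes each bar is drawn from axis_start up to its value, so the
    visible height is (value - axis_start).
    """
    shown = relative_change(first - axis_start, second - axis_start)
    actual = relative_change(first, second)
    return shown / actual


# Hypothetical values: a 10% increase from 50 to 55.
truncated = lie_factor(50, 55, axis_start=45)  # y-axis starts at 45
honest = lie_factor(50, 55, axis_start=0)      # y-axis starts at zero

print(f"Truncated axis lie factor: {truncated:.1f}")
print(f"Zero-based axis lie factor: {honest:.1f}")
```

With the axis starting at 45, the second bar is drawn twice as tall as the first even though the data only rose 10 percent. Starting the axis at zero brings the factor back to 1.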

“New Apps Instantly Convert Spreadsheets Into Something Actually Readable”

Several new apps transform spreadsheet data into a chart or graph so that users need to spend little or no time with the raw data:

It’s called Project Elastic, and he unveiled the thing this fall at a conference run by his company, Tableau. The Seattle-based company has been massively successful selling software that helps big businesses “visualize” the massive amount of online data they generate—transform all those words and numbers into charts and graphics their data scientists can more readily digest—but Project Elastic is something different. It’s not meant for big businesses. It’s meant for everyone.

The idea is that, when someone emails a spreadsheet to your iPad, the app will open it up, but not as a series of rows and columns. It will open the thing as a chart or graph, and with a swipe of the finger, you can reformat the data into a new chart or graph. The hope is that this will make it easier for anyone to read a digital spreadsheet, an age-old computer creation that still looks like Greek to so many people. “We think that seeing and understanding your data is a human right,” says Story, the Tableau vice president in charge of the project.

And Story isn’t the only one. A startup called ChartCube has developed a similar tool that can turn raw data into easy-to-understand charts and graphs, and just this week, the new-age publishing outfit Medium released a tool called Charted that can visualize data in similar ways. So many companies aim to democratize access to online data, but for all the different data analysis tools out on the market, this is still the domain of experts, people schooled in the art of data analysis. These projects aim to put the democracy in democratize.

Two quick thoughts:

1. I understand the impulse to create charts and graphs that communicate patterns. Yet such devices are not infallible in themselves. I would suggest we need more education in interpreting and using the information from infographics. Additionally, this might be a temporary solution, but wouldn’t it be better in the long run if more people knew how to read and use a spreadsheet?

2. Interesting quote: “We think that seeing and understanding your data is a human right.” I have a right to data or to the graphing and charting of my data? This adds to a collection of voices arguing for a human right to information and data.

Flawed pie chart with too many categories, unhelpful colors

AllMusic recently ran a poll asking readers about their favorite Beatles album. It is an interesting topic, but the pie chart used to display the results didn’t work out so well:

http://infogr.am/beatles-poll-results?src=web

Two main complaints:

1. There are a lot of categories to represent here: 14 different albums. While it is relatively easy to see some of the larger categories, it gets more difficult to judge the proportions of the smaller categories.

2. Some categories are clearly bigger than others, but the color scheme seems to have little to do with the albums themselves. The palette runs from black to light gray but it does not appear to be in any order. For example, they might have used the same palette in release order, with light gray for Please Please Me and the darkest shade for Past Masters. As it currently stands, the reader has to pick out the category and then try to figure out where it is in the key.

Given this comes from an app intended to help create infographics, this one isn’t so great as it suffers from two issues – lots of categories and a limited color design – that I often warn my statistics students about when using pie charts.
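One common remedy for the too-many-categories problem is to keep the larger categories, fold the small ones into a single “Other” group, and show the result as a sorted bar chart. The sketch below is my own illustration rather than anything AllMusic or the app did. The album labels and vote counts are hypothetical placeholders (not the actual poll results), and `collapse_small` and its 5 percent cutoff are assumptions.

```python
def collapse_small(counts, min_share=0.05, other_label="Other"):
    """Keep categories with at least min_share of the total, sorted from
    largest to smallest, and fold the rest into one trailing category."""
    total = sum(counts.values())
    kept = {k: v for k, v in counts.items() if v / total >= min_share}
    other = total - sum(kept.values())
    ordered = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    if other > 0:
        ordered.append((other_label, other))
    return ordered


# Hypothetical vote counts: placeholders, not the real poll numbers.
votes = {
    "Album A": 400,
    "Album B": 300,
    "Album C": 200,
    "Album D": 40,
    "Album E": 30,
    "Album F": 20,
    "Album G": 10,
}

summary = collapse_small(votes)
total_votes = sum(votes.values())

# A plain-text sorted bar chart: one '#' per 2 percent of the vote.
for label, count in summary:
    share = count / total_votes
    print(f"{label:<8} {'#' * round(share * 50)} {share:.0%}")
```

Sorting by size also removes the need for a color key, since each bar carries its own label.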

Census data visualization: metropolitan population change by natural increase, international migration, and domestic migration

The Census regularly puts together new data visualizations to highlight newly collected data. The most recent visualization looks at population change in metropolitan areas between 2010 and 2011 and breaks down the change by natural increase, international migration, and domestic migration.

Several trends are quickly apparent:

1. Sunbelt growth continues at a higher pace and non-Sunbelt cities tend to lose residents by domestic migration.

2. Population increase by international migration still tends to be larger in New York, Los Angeles, and Miami.

3. There are some differences in natural increases to population. I assume this is basically a measure of birth rates.

However, I have two issues with this visualization. My biggest complaint is that the boxes are not weighted by population. New York has the largest natural increase to the population but it is also the largest metropolitan area by quite a bit. A second issue is that the boxes do not all match the 50,000 or 10,000 population-change sizes shown in the key at the top. So while I can see relative population change, it is hard to know the exact figures.