Misinterpreting a graph of income in the US by misreading the X-axis categories

Some graphs are harder to interpret than others, particularly when the categories along one of the axes are not a consistent width. Here is an example, a misreading of a chart of income in the United States:

“When I was growing up in Canada,” says Jon Evans of Techcrunch, “I was taught that income distribution should and did look like a bell curve, with the middle class being the bulge in the middle. Oh, how naïve my teachers were. This is how income distribution looks in America today.”

[Figure: bar chart of US household income distribution by income bracket]

“That big bulge up above? It’s moving up and to the left. America is well on the way towards having a small, highly skilled and/or highly fortunate elite, with lucrative jobs; a vast underclass with casual, occasional, minimum-wage service work, if they’re lucky; and very little in between.”…

Er, no. Look closely at those last two brackets. Now look at the brackets immediately to the left of them. What do you notice?

Probably, you notice the same thing that immediately struck me: the last two brackets cover a much, much wider income band than the rest of the brackets on the graph.

Each bar on that graph represents a $5,000 income band: Under $5,000, $5,000 to $9,999, and so forth. Except for the last two. The penultimate band is $200,000 to $250,000, which is ten times as wide as the previous band. And the last bar represents all incomes over $250,000, a group that runs from some law associate who pulled down $251,000 last year, through A-Rod’s $27 million annual salary, all the way to some Silicon Valley superstar who just cashed out the company for a one-time windfall of hundreds of millions of dollars. Unsurprisingly, much wider bands have more people in them than they would if you kept on extrapolating out in $5,000 increments…

To put it another way, the apparent clustering of income along the rich right tail of the distribution is just an artifact of the way that the Census presents the data.  If they kept running through $5,000 brackets all the way out to A-Rod, the spreadsheet would be about a mile long, and there would only be a handful of people in each bracket.  So at the high end, where there are few households, they summarize.
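One way to see the effect is to rescale each wide bracket to the count an average $5,000 slice of it would hold. The sketch below is only an illustration: the household counts are invented rather than taken from the Census, and the `per_standard_width` helper is a hypothetical name.

```python
# Illustrative numbers only: these counts are invented, not Census data.
# Each bin is (lower bound, upper bound, households in thousands);
# an upper bound of None marks an open-ended top bracket.
bins = [
    (190_000, 195_000, 400),
    (195_000, 200_000, 350),
    (200_000, 250_000, 2_500),
    (250_000, None, 2_800),
]

STANDARD_WIDTH = 5_000  # width of the ordinary brackets on the chart


def per_standard_width(low, high, count, standard=STANDARD_WIDTH):
    """Scale a bin's count to the average count per `standard`-wide slice."""
    if high is None:
        return None  # an open-ended bin has no finite width to divide by
    return count * standard / (high - low)


heights = [per_standard_width(*b) for b in bins]

for (low, high, count), h in zip(bins, heights):
    label = f"${low:,}+" if high is None else f"${low:,} to ${high:,}"
    scaled = "n/a" if h is None else f"{h:,.0f}"
    print(f"{label:>22}  raw: {count:>5,}  per $5,000: {scaled}")
```

With these made-up numbers, the $200,000 to $250,000 bracket towers over its neighbors in raw counts, but once spread across ten $5,000 slices it sits below them, which is what a long right tail should look like. The open-ended top bracket cannot be rescaled at all without assuming an upper limit.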

The Census likely has good reasons for reporting these higher-income categories in such a way. First, because there are relatively few people in each $5,000 increment at the high end, combining them keeps the graph from getting too wide. Second, I believe the Census top-codes income, meaning that above a certain dollar amount, reported incomes are capped rather than recorded at their actual value. This is done to help protect the identity of respondents who might otherwise be easy to pick out of the data.

Still, this is a classic misinterpretation of a graph. As McArdle notes, this is a long-tail graph with very few people at the top end. The graph tries to alert readers to this by marking some of the notable percentiles: above the $130,000 to $134,999 category, it reads “The top 10 percent reported incomes above $135,000,” and above the top two categories, it reads “approximately 4 percent of households.” Making the right interpretation depends not just on the overall shape of the graph, bell curve or otherwise, but also on looking closely at the axes and categories.
