Interpreting the FBI’s 2009 hate crime report

Hate crime legislation is a topic that seems to rile people up. The Atlantic provides five sources that try to summarize and make sense of the latest annual data released by the FBI:

Agence France-Presse reports that “out of 6,604 hate crimes committed in the United States in 2009, some 4,000 were racially motivated and nearly 1,600 were driven by hatred for a particular religion … Blacks made up around three-quarters of victims of the racially motivated hate crimes and Jews made up the same percentage of victims of anti-religious hate crimes.” The report also notes that “anti-Muslim crimes were a distant second to crimes against Jews, making up just eight percent of the hate crimes driven by religious intolerance.” Finally, the report notes a drop in hate crimes overall: “Some 8,300 people fell victim to hate crimes in 2009, down from 9,700 the previous year.”

This is a reminder that there is a lot of data out there, particularly generated by government agencies, but we need qualified and skilled people to interpret its meaning.

You can find the data on hate crimes at the FBI website of uniform crime reports. Here is the FBI’s summary of the incidents, 6,604 in all.
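To put the quoted figures on a common scale, here is a quick back-of-the-envelope calculation. It is only a sketch: it uses the rounded numbers from the AFP report above ("some 4,000," "nearly 1,600"), so the resulting shares are approximate, and the variable names are my own.

```python
# Rounded figures as quoted in the AFP report above; shares are approximate.
total_incidents = 6604
racially_motivated = 4000   # "some 4,000"
religion_motivated = 1600   # "nearly 1,600"


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


race_share = share(racially_motivated, total_incidents)
religion_share = share(religion_motivated, total_incidents)

# Year-over-year change in the number of victims (9,700 down to 8,300).
victims_2008, victims_2009 = 9700, 8300
victim_change = round(100 * (victims_2009 - victims_2008) / victims_2008, 1)

print(f"Racially motivated: {race_share}% of incidents")
print(f"Religion motivated: {religion_share}% of incidents")
print(f"Change in victims, 2008 to 2009: {victim_change}%")
```

Note that the report mixes two units, incidents (6,604) and victims (8,300), so the shares and the year-over-year change are not measured against the same base.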

Trying to explain American differences in 12 easy categories

I recently flipped through Our Patchwork Nation, a new book that tries to explain differences in America by splitting counties into twelve types: “boom towns, evangelical epicenters, military bastions, service worker centers, campus and careers, immigration nation, minority central, tractor community, Mormon outposts, emptying nests, industrial metropolises and monied burbs.” A review in the Washington Post offers a quick overview of this genre of book:

And every few years there’s another book promising to chart the country’s divisions by splitting it into categories more telling than the 50 states. Former Washington Post writer Joel Garreau offered his “Nine Nations of North America” in 1981; two decades later came Richard Florida with “The Rise of the Creative Class,” followed by Bill Bishop’s “The Big Sort,” which sought to explain why so many of us are clustering in enclaves of the like-minded.

The latest aspiring taxonomists are Dante Chinni, a journalist, and James Gimpel, a University of Maryland government professor, who use socioeconomic data to break the country’s 3,141 counties into 12 categories.

This sort of analysis is now fairly common: there is a lot of publicly available data from the Census Bureau and many more people are now interested in looking at the United States as a whole.

I have two concerns about this approach. The main one is that the types are developed at the county level. This may be a good level for obtaining data (easy to do from the Census Bureau) but it is debatable whether this is a practical level for the lives of Americans. When asked where they live, most people would name a community/city first and then a state or region before getting to a county. County rules and ordinances have limited effect in many places as municipal regulations take precedence.

A second concern is that this type of sorting or clustering tells us where places are now but doesn’t say as much about how they arrived at this point or how they might change in the future. This is a cross-sectional analysis: it tells us what American counties look like right now. This may be useful for looking at recent and upcoming trends but most of these places have deeper histories and characters than just a moniker like “monied burbs.” This would explain some of the Post’s confusion about lumping together “emptying nests” communities in the Midwest and Florida.

Interactive maps of metropolitan America

The Brookings Institution has put together a website with interactive data maps of metropolitan America. The data comes from the 2009 American Community Survey (done by the Census Bureau) and one can look at all sorts of variables across cities or metropolitan areas.

The best feature, in my opinion, is the tab that lets you compare suburbs alone across metropolitan regions. Very quickly, you can find that the suburbs of Syracuse, New York have the highest percentage of non-Hispanic whites (93.1%), the suburbs of San Jose-Sunnyvale-Santa Clara, California, have the highest median household income ($96,478), and the suburbs of Modesto, California have the highest percentage of commuters traveling more than 90 minutes to work (7.2%).

An emerging portrait of emerging adults in the news, part 3

In recent weeks, news reports have covered a number of studies that discuss the beliefs and behaviors of the younger generation, those who are now between high school and age 30 (an age group that could also be labeled “emerging adults”). In a three-part series, I want to highlight three of these studies because they not only suggest what this group is doing but also hint at the consequences. A study in part one showed that there is an association between hyper-texting and hyper social networking use and risky behavior. A study in part two showed that teens and college students today are more tolerant than previous generations but less empathetic.

Another interesting aspect of the lives of emerging adults is living alone. While this is common among the middle-aged, the proportion of emerging adults living alone is growing:

The stats are arresting. In this country, approximately 31 million people live alone, and one-person households make up 28 percent of the total, tying with childless couples as the most common residential type — “more common,’’ Klinenberg pointed out, “than the nuclear family, the multigenerational family, and the roommate or group home.’’

Those who live alone are mostly middle-age, with young adults the fastest-growing segment, and there are more women than men. No longer a transitional stage, living alone is one of the most stable household arrangements. And while one-person households were once scattered in low-density rural settings, they’re now concentrated in cities. “In Manhattan,’’ he said, “more than half of all residences are one-person dwellings.’’

I’ve seen a number of commentators attempt explanations for this: this is part of becoming an adult today, television shows like Friends or How I Met Your Mother glamorized the social life in the city (though these shows tend to show roommates living together), outrageous housing costs push younger people into odd living arrangements.

But couldn’t this trend toward living alone be linked to the two prior studies we looked at? If a lot of social life occurs through texting or through social networking sites and emerging adults are more tolerant but less empathetic, then living alone makes some sense. Emerging adults still have a social life – but this social life may look quite different as friends are found and communicated with through technology or social outings rather than through closer ties (such as living together).

And what if living alone or being alone more is the outcome for younger generations? How might this impact society? Such arrangements may be good for self-actualization (or not) but there will be consequences. What will “community” look like in several decades? If these three studies were all the evidence we had, we might conclude that emerging adults like to be social but also like to keep people at an arm’s length.

It is hard to draw conclusions from three studies that are reported in the news – but here is the emerging portrait: social interaction is changing. It may be easy to dismiss this new interaction as bad or wrong but we need more information and research on this particular topic. We need more measurement of depth or quality of relationships. Out of these three studies, we have two measures of interaction quality: the prevalence of risky behaviors (though this is only an association or correlation) and levels of empathy. We could be asking other questions: how many students in college today arrange for single rooms in dorms or would prefer to live in them? How many students who study abroad are actually able to fully understand and appreciate a new culture versus just being able to see the differences between two cultures?

All of this will be interesting to watch in the coming years as emerging adults obtain the power to shape society’s values regarding interaction and community.

Losing a data source: the white pages of the phone book

A number of phone companies have recently made requests of states that they stop publishing white pages. With this information available online and few people using the thick phone books, it looks like the phone book is on the way out. We might say “good riddance” but then briefly reflect on the usefulness of residential phone listings as data sources:

If the white pages are nearing their end, then Emily Goodmann hopes the directories would be archived for historical, genealogical or sociological purposes.

“The telephone directory stands as the original sort of information network that not only worked as kind of a social network in a sense, but it served as one of the first information resources,” said Goodmann, a doctoral student at Northwestern University who is writing her dissertation on the history of phone books as information technology. “It’s sort of heartbreaking … even though these books are essentially made to be destroyed.”

Particularly in studying communities in the late 1800s and early 1900s, phone listings can be an important source of data. In fact, they may be the only common source that lists a majority of residents.

(Interestingly, the article also notes that the Yellow Pages are doing just fine – and will continue to be printed.)

Comparing stories and statistics

A mathematician thinks about the differences between stories and statistics and the people who prefer one side over another:

Despite the naturalness of these notions, however, there is a tension between stories and statistics, and one under-appreciated contrast between them is simply the mindset with which we approach them. In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled. A drily named distinction from formal statistics is relevant: we’re said to commit a Type I error when we observe something that is not really there and a Type II error when we fail to observe something that is there. There is no way to always avoid both types, and we have different error thresholds in different endeavors, but the type of error people feel more comfortable may be telling. It gives some indication of their intellectual personality type, on which side of the two cultures (or maybe two coutures) divide they’re most comfortable. I’ll close with perhaps the most fundamental tension between stories and statistics. The focus of stories is on individual people rather than averages, on motives rather than movements, on point of view rather than the view from nowhere, context rather than raw data. Moreover, stories are open-ended and metaphorical rather than determinate and literal…

This is a good discussion and one that I think about often while teaching statistics or research methods. Stories are often easy for students to grab onto, particularly if told from an interesting point of view. In the end, these stories (particularly the “classics”) have the ability to illuminate the human condition or interesting concerns but don’t have the same ability to offer more concrete overviews of the typical or common experience. Statistics do offer a different lens for viewing the world, one where individual experiences are muted in favor of data about larger groups. Both can miss important features of the reality around us but offer different angles for tackling similar concerns.

Both have their place and I would suggest both are necessary.
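The point in the quoted passage that there is "no way to always avoid both types" of error can be made concrete with a small calculation. This is a minimal sketch under assumptions of my own, not anything from the essay: a single observation with standard deviation 1, a true mean of either 0 (nothing there) or 1 (something there), and a rule that says we "observe something" whenever the value exceeds a threshold.

```python
from statistics import NormalDist

# Hypothetical setup (an assumption for illustration, not from the essay):
# one observation x with standard deviation 1.
# H0: the true mean is 0 (nothing there); H1: the true mean is 1 (something there).
null = NormalDist(mu=0, sigma=1)
alternative = NormalDist(mu=1, sigma=1)


def error_rates(threshold):
    """Return (type_i, type_ii) error probabilities for a decision threshold."""
    type_i = 1 - null.cdf(threshold)      # seeing an effect that is not there
    type_ii = alternative.cdf(threshold)  # missing an effect that is there
    return type_i, type_ii


# Raising the threshold lowers Type I error but raises Type II error.
for threshold in (0.0, 0.5, 1.0, 1.645, 2.0):
    type_i, type_ii = error_rates(threshold)
    print(f"threshold={threshold:5.3f}  Type I={type_i:.3f}  Type II={type_ii:.3f}")
```

Moving the threshold only trades one kind of error for the other, which is why different endeavors settle on different thresholds.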

Claim: “Facebook knows when you’ll break up”

There is an interesting chart going around that is based on Facebook data and claims to show when people are more prone to break up. Here is a quick description of the chart:

British journalist and graphic designer David McCandless, who specializes in showcasing data in visual ways, compiled the chart. He showed off the graphic at a TED conference last July in Oxford, England.

In the talk, McCandless said he and a colleague scraped 10,000 Facebook status updates for the phrases “breakup” and “broken up.”

They found two big spikes on the calendar for breakups. The first was after Valentine’s Day — that holiday has a way of defining relationships, for better or worse — and in the weeks leading up to spring break. Maybe spring fever makes people restless, or maybe college students just don’t want to be tied down when they’re partying in Cancun.

Potentially interesting findings and it is an interesting way to present this data. But when you consider how the data was collected, perhaps it isn’t so great. A few thoughts on the subject:

1. The best way to figure this out would be to convince Facebook to let you have the data for relationship status changes.

2. Searching for the phrases “breakup” and “broken up” might catch some, or perhaps even many, ended relationships, but not all. Does everyone include these words when talking about ending a relationship?

3. Are 10,000 status updates a representative sample of all Facebook statuses?

4. Is there a lag time involved in reporting these changes? Monday, for example, is the most popular day for announcing break-ups, not necessarily for break-ups occurring on that day. Do people immediately run to Facebook to tell the world that they have ended a relationship?

5. Does everyone initially “register” and then “unregister” a relationship on Facebook anyway?

The more I think about it, the bigger the claim that “Facebook knows when you are going to break up” seems, given that it rests on this data mining exercise.
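The concern in point 2 above is easy to demonstrate. The sketch below searches for the two phrases named in the article; the status updates are hypothetical examples I invented for illustration, and the matching rule is my guess at a simple approach, not McCandless's actual method.

```python
import re

# The two phrases named in the article; the matching rule itself is an assumption.
PATTERN = re.compile(r"\b(breakup|broken up)\b", re.IGNORECASE)

# Hypothetical status updates, invented for illustration only.
statuses = [
    "Worst breakup ever.",                        # caught: a real breakup
    "We have broken up. Please don't ask.",       # caught: a real breakup
    "Sam and I are done. Single again.",          # missed: no matching phrase
    "She broke up with me last night.",           # missed: 'broke up' is not searched
    "Can't stop listening to this breakup song",  # caught, but nobody broke up
]


def mentions_breakup(text):
    """Return True if the text contains either searched phrase."""
    return PATTERN.search(text) is not None


flags = [mentions_breakup(s) for s in statuses]
for status, flagged in zip(statuses, flags):
    label = "MATCH" if flagged else "miss"
    print(f"{label:5}  {status}")
```

Even in this tiny made-up sample, the search both misses real breakups and flags a status that has nothing to do with one, so counts of matching statuses are at best a rough proxy for actual breakups.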

Measuring the economy by looking at midnight Walmart shoppers

There are all sorts of figures and statistics that are used to measure how the economy is doing. This NPR story introduces a new metric: looking at midnight sales at Walmart on the first day of the month.

Wal-Mart noticed that sales were spiking on the first of every month. In a recent conference call with investment analysts, Wal-Mart executive Bill Simon said these midnight shoppers provide a snapshot of the American economy today.

“And if you really think about it,” Simon said, “the only reason somebody gets out and buys baby formula is they need it and they’ve been waiting for it. Otherwise, we’re open 24 hours, come at 5 a.m., come at 7 a.m., come at 10 a.m. But if you’re there at midnight you’re there for a reason.”

And so Wal-Mart has changed its stocking pattern. It brings out larger packs of items in the beginning of the month, and smaller sizes toward the end. It makes sure shelves have plenty of diapers and formula.

This is a creative data source – but we would need more information before making broad conclusions about the American economy. Do other stores experience similar spikes? How big of a spike is this? What Walmart locations have seen the biggest jumps?

It strikes me that Walmart probably possesses a treasure trove of data that would be very interesting to look at.

Quick Review: Stat-Spotting

Sociologist Joel Best has recently done well for himself by publishing several books about the misuse of statistics. This is an important topic: many people are not used to thinking statistically and have difficulty correctly interpreting statistics even though they are commonly used in media stories. Best’s most recent book on this subject, published in 2008, is Stat-Spotting: A Field Guide to Identifying Dubious Data. A few thoughts on this text:

1. One of Best’s strong points is that his recommendations are often based in common sense. If a figure strikes you as strange, it probably is. He has tips about keeping common statistical figures in mind to help make sense of certain statistics. Overall, he suggests a healthy skepticism towards statistics: think about how the statistic was developed and who is saying it.

2. When the subtitle of the book says “field guide,” it means a shorter text that is to the point. Best quickly moves through different problems with statistical data. If you are looking for more thorough explanations, you should read Best’s 2001 book Damned Lies and Statistics. (A cynical reader might suggest this book was simply a way to make more money off topics Best has already explored elsewhere.)

3. I think this text is most useful for finding brief examples of how to analyze and interpret data. There are numerous examples in here that could start off a statistics lesson or could further illustrate a point. The examples cover a variety of topics and sources.

This is a quick read that could be very useful as a simple guide to combating innumeracy.