Many Americans not optimistic that their children’s lives will be better

A recent Bloomberg poll asked Americans whether they felt the future would be better for their children. The results:

What optimism there is about the immediate future doesn’t carry over to the longer term. Pluralities of those polled say they’re not hopeful they will have enough money in retirement and expect they will have to keep working to make up the difference. More than 50 percent aren’t confident or are just somewhat confident their children will have better lives than they have.

This belief has been an important part of the American Dream for decades. American parents seem willing to sacrifice much to help ensure this for their children. Americans are usually quite optimistic about the future and tend to believe American ingenuity and progress will lead the way.

A question I would like to ask on these surveys: would it be okay for your children to have the same quality of life as you have experienced? If not, why not?

(A note about the reporting: the story cites many numerical statistics from the survey but provides little context. The author tries to link the statistics to what is going on in the country and adds a few quotes, but this doesn’t add much. Additionally, the numbers could be broken down further: do they differ by gender, race, region, political party, etc.?)

A word cloud as an accurate information graphic

There are many ways to visually present data or statistics. One common problem arises when parts of a graph or image are not displayed in the correct proportions. Does a word cloud run into this difficulty?

Gallup has put together a word cloud of Americans’ perceptions of the federal government. Some phrases, such as “too big” and “corrupt,” appear much larger than others. Some words, such as “good” and “terrible,” are turned on their sides.

Overall, I would say the word cloud is probably not the best choice in this situation. It is hard to judge the most popular responses and the relative proportions of each response. While one can quickly pick up that the majority of responses were negative, it is not a very precise graphic.
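
By contrast, the same responses read much more precisely as a sorted bar chart, where lengths can be judged against a common baseline. Here is a minimal sketch in Python, using made-up counts rather than Gallup’s actual data:

```python
import matplotlib.pyplot as plt

# Hypothetical response counts, for illustration only (not Gallup's data).
responses = {"too big": 180, "corrupt": 120, "inefficient": 90,
             "necessary": 40, "good": 25, "terrible": 20}

# Sort ascending so the most common response ends up at the top of the chart.
labels, counts = zip(*sorted(responses.items(), key=lambda kv: kv[1]))
plt.barh(labels, counts)
plt.xlabel("Number of mentions")
plt.title("Perceptions of the federal government (hypothetical data)")
plt.tight_layout()
plt.show()
```

Unlike font size in a word cloud, bar length along a shared axis lets a reader directly compare any two responses.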

Reviewing “American Grace”: it is readable!

The book American Grace: How Religion Divides and Unites Us was released this past week. In addition to being co-authored by Robert Putnam (author of the well-known Bowling Alone), the study has been hailed by several sources as a (and perhaps the) comprehensive look at religion in American society.

But a feature of a positive review written by a historian in the San Francisco Chronicle struck me as intriguing:

Among the great virtues of this volume is its combination of two features that are all too rarely found in close proximity. One is a commitment to the most rigorous standards of contemporary social science, bolstered by statistical sophistication. Do you like multiple regression analysis? You’ll find lots of it here. The other feature is a commitment to get their message across to educated readers who are put off by the excessive jargon and abstraction of most sociological studies. Only such a combination could make a 673-page tome worth the attention “American Grace” deserves.

Reading between the lines, here is what is being said: sociologists are rarely able to combine statistical evidence (regression analysis of survey results is the gold standard for studies like this that claim to be comprehensive looks at American society) with winsome writing. Essentially, the book is “readable.”

A few thoughts come to mind:

1. What exactly about it makes it “readable” or “understandable”?

2. When reading a book that uses regression analysis, how much should the “typical educated reader” know about this kind of analysis? (A minimal sketch of such an analysis follows this list.) This might say more about general statistical knowledge, even among the educated, than it does about the book.

3. This is a valid concern for a book that hopes to be read by many people – writers should always consider their audience. However, it still strikes me as a lower-level priority: isn’t the argument of the book much more important than how it was written? The style of writing can detract from the argument but what we should grapple with are Putnam and Campbell’s conclusions.
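
On point 2: here is a minimal, hypothetical sketch of the kind of multiple regression analysis the review has in mind. The variables, effect sizes, and data below are invented for illustration; nothing here comes from American Grace:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Invented survey-style data: does religious attendance predict a
# "neighborliness" score once age and education are held constant?
n = 2000
df = pd.DataFrame({
    "attendance": rng.integers(0, 52, n),   # services attended per year
    "age": rng.integers(18, 90, n),
    "education": rng.integers(8, 21, n),    # years of schooling
})
# Build the outcome from known effects plus noise so the fit is checkable.
df["neighborliness"] = (0.05 * df["attendance"] + 0.02 * df["age"]
                        + 0.10 * df["education"] + rng.normal(0, 2, n))

model = smf.ols("neighborliness ~ attendance + age + education", data=df).fit()
print(model.params.round(3))
# Reading a coefficient: each additional service attended per year is
# associated with roughly 0.05 more points, holding age and education fixed.
```

Even this toy example shows the kind of reasoning (interpreting a coefficient “holding other variables constant”) that the review assumes educated readers can follow.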

Making the case for reputational rankings

A statistician argues that the National Research Council’s study of doctoral programs released earlier this week should have included reputational rankings:

Mr. Stigler says that it was a mistake for the NRC to so thoroughly abandon the reputational measures it used in its previous doctoral studies, in 1982 and 1995. Reputational surveys are widely criticized, he says, but they do provide a check on certain kinds of quantitative measures. When the new NRC counts faculty publication rates, it does not offer any information about whether scholars in the field believe those publications are any good. (That’s especially true in humanities fields, where the NRC report does not include citation counts.)

“Everybody involved in this was trying hard, and with good intentions and high integrity,” Mr. Stigler says. “But once they decided to rule out reputation, they cut off what I consider to be the most useful measure from all past surveys.”

In an e-mail message to The Chronicle this week, Mr. Ostriker declined to reply to Mr. Stigler’s specific statistical criticisms. But he pointed out that the National Academies explicitly instructed his committee not to use reputational measures.

I was curious about this when I looked at the list of sociology doctoral programs. Perhaps several of the schools that ranked lower than I expected, such as the University of California, Berkeley, did so because of this.

Stigler defends reputational measures, but I’ve seen others argue that they prevent “true” rankings within fields because certain schools retain a reputation even without the output (research, good graduate students, etc.) to back it up. This particular discussion is part of a larger debate over whether reputational rankings should be used at all.

“Most romantic ‘L’ station” analysis with faulty generalization

Among other features, Craigslist offers a “missed connections” page where people can try to identify and track down someone they encountered in public. Based on this data, Craigslist recently released a list of the “most romantic” spots in Chicago’s CTA system:

Turns out it’s Belmont. The stop on the CTA’s Red, Brown and Purple lines won the title of Chicago’s most romantic ‘L’ station in a report the Web site released Thursday. The crown for most romantic train line went to the Red Line.

The site did a four-week study this summer of more than 250 missed connections postings (read, “potential hookups”) in Chicago and ranked stations based on a scale called the Train Romance Index Score Total — or TRIST for short.

The TRIST is calculated by dividing the number of missed connections that mention a CTA station or line by the number of riders a year who use that station or line. Then, that number is multiplied by 10 to get a whole number and rounded to two decimal places.

The romantic train line is defined as having the best odds of a passenger spotting another rider across a crowded train or platform and then posting a missed connections listing to get in touch.
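
The quoted description translates into a one-line calculation. A sketch follows; the per-station inputs and the larger scale factor are assumptions for illustration (with a factor of only 10, as quoted, any realistic annual ridership would round the score to 0.00, so the factor actually used was presumably larger):

```python
def trist(mentions, annual_riders, scale=10):
    """Train Romance Index Score Total, per the quoted description:
    missed-connection mentions divided by annual ridership, scaled,
    then rounded to two decimal places."""
    return round(mentions / annual_riders * scale, 2)

# Hypothetical station figures (not actual Craigslist or CTA numbers),
# with an assumed scale factor large enough to yield nonzero scores.
print(trist(mentions=25, annual_riders=4_500_000, scale=100_000))  # 0.56
print(trist(mentions=10, annual_riders=600_000, scale=100_000))    # 1.67
```

Note the design choice embedded in the formula: dividing by ridership normalizes for traffic, so the “most romantic” station is not simply the busiest one.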

The data are limited (a pretty small sample), but the generalization is the biggest issue: does this really reveal the most romantic spot on the CTA rail system? It is probably much more indicative of who uses Craigslist (young North Siders?).

My guess is that this was simply meant to be fun and promote Craigslist. But sometimes statistics and arguments like this take on a life of their own…

Sorting the good from the bad statistics about Evangelicals

Sociologist Bradley Wright talks with Christianity Today about his latest book: Christians Are Hate-Filled Hypocrites…and Other Lies You’ve Been Told: A Sociologist Shatters Myths From the Secular and Christian Media. Here is CT’s quick summary of the argument:

Young people are not abandoning church. Evangelical beliefs and practices get stronger with more education. Prayer, Bible reading, and evangelism are up. Perceptions about evangelicals have improved dramatically. The data are clear on these matters, says University of Connecticut sociologist Bradley Wright, but evangelicals still want to believe the worst statistics about themselves.

One question to ask, then, is why Evangelicals buy into these negative statistics. The subculture argument, applied to evangelicals, might suggest that these numbers help keep people fired up by reminding them that the group could lose its distinctiveness if drastic action is not taken.

Wright suggests his goal is to encourage Evangelicals:

This is not a call for complacency but for encouragement. Why not say, “We’re reading our Scriptures more than most other religious traditions; let’s do even better”? Instead, what we hear is, “Christianity’s going to fail. You’re all a bunch of failures. But if you buy my book, listen to my sermon, or go to my conference, I’ll solve everything.” These fear messages demoralize people, hinder the message of the church, and hide real problems.

I would like to see exactly what statistics he looks at and debunks. Wright is not the first to suggest Evangelicals have some issues with statistics.

The online “mega-reviewers”

One of the innovations of online stores is that users can rate what they like and other users can then base decisions on, or comment about, those ratings. A site like Amazon.com is amazing in this regard; within a few minutes, a reader can get a much better idea about a product.

But statistics from Netflix, another site that allows user reviews, indicate that many users don’t rate anything while there is a small percentage of people who might be called “mega-reviewers”:

About a tenth of one percent (0.07%) of Netflix users — more than 10,000 people —  have rated more than 20,000 items. And a full one percent, or nearly 150,000 Netflixers, have rated more than 5,000 movies. By contrast, only 60 percent of Netflix users rate any movies at all, and the typical person only gives out 200 starred grades.

This rating pattern might fit a Poisson or, more plausibly, a negative binomial distribution, where many people rate no movies or very few while a smaller percentage rate a great deal. (A useful statistic for pinning down the shape of the curve: 40 percent rate nothing, but among the 60 percent who rate any movies at all, what is the median?)
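
A minimal simulation sketch of that intuition, with parameters invented purely to mimic the quoted pattern; none of these numbers come from Netflix:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed parameters: 60% of users rate at least one title, and raters'
# counts follow a negative binomial with mean ~300 and heavy dispersion.
n_users = 100_000
mean_ratings, dispersion = 300, 0.5
p = dispersion / (dispersion + mean_ratings)   # scipy's (n, p) parameterization

is_rater = rng.random(n_users) < 0.60
counts = stats.nbinom.rvs(dispersion, p, size=n_users, random_state=1)
counts = np.where(is_rater, counts + 1, 0)     # the other 40% rate nothing

print("share rating nothing:", round((counts == 0).mean(), 2))
print("median among raters:", int(np.median(counts[counts > 0])))
print("share rating more than 5,000:", (counts > 5000).mean())
```

The pile-up at zero is why a plain Poisson fits such data poorly; a zero-inflated or negative binomial model can capture both the non-raters and the long mega-reviewer tail.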

The Atlantic talks to two of the mega-reviewers, who seem to be motivated by seeing what the system will recommend once it has all of their input. Interestingly, they say Netflix still recommends movies that they end up not liking.

Fighting innumeracy and “proofiness”

A new book by journalist Charles Seife examines how figures and statistics are poorly used in public debates. I like his idea of “proofiness,” which seems similar to the concept of “truthiness.” Here are some of the types of bad statistics he points out:

Falsifying numbers is the crudest form of proofiness. Seife lays out a rogues’ gallery of more subtle deceptions. “Potemkin numbers” are phony statistics based on erroneous or nonexistent calculations. Justice Antonin Scalia’s assertion that only 0.027 percent of convicted felons are wrongly imprisoned was a Potemkin number derived from a prosecutor’s back-of-the-envelope estimate; more careful studies suggest the rate might be between 3 and 5 percent.

“Disestimation” involves ascribing too much meaning to a measurement, relative to the uncertainties and errors inherent in it. In the most provocative and detailed part of the book, Seife analyzes the recounting process in the astonishingly close 2008 Minnesota Senate race between Norm Coleman and Al Franken. The winner, he claims, should have been decided by a coin flip; anything else is disestimation, considering that the observed errors in counting the votes were always much larger than the number of votes (roughly 200 to 300) separating the two candidates.

“Comparing apples and oranges” is another perennial favorite. The conservative Blue Dog Democrats indulged in it when they accused the Bush administration of borrowing more money from foreign governments in four years than had all the previous administrations in our nation’s history, combined. True enough, but only if one conveniently forgets to correct for inflation.
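
Seife’s recount claim can be made concrete with a small simulation. All of the numbers below are assumptions for illustration (the ballot total, the “true” margin, and the misread rate are not from his book):

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed inputs: ~2.9 million ballots, a "true" margin of 300 votes, and
# a 1% chance that any given ballot is credited to the wrong candidate in
# a single count (each such misread swings the margin by 2 votes).
n_ballots, true_margin, p_misread = 2_900_000, 300, 0.01

margins = []
for _ in range(1_000):  # simulate 1,000 independent counts of the same race
    n_misreads = rng.binomial(n_ballots, p_misread)
    helps = rng.binomial(n_misreads, 0.5)       # misreads favoring the leader
    swing = 2 * (helps - (n_misreads - helps))  # net change in the margin
    margins.append(true_margin + swing)

margins = np.array(margins)
print("std. dev. of the counted margin:", int(margins.std()))
print("share of counts naming the trailing candidate:", (margins < 0).mean())
```

Under these assumptions the counting noise is on the same order as the margin itself, which is the heart of the disestimation charge: reporting a 300-vote result as exact ascribes more precision to the count than the process can deliver.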

Books like these are needed in our society, as politicians often debate through numbers. Without a proper understanding of who is using these numbers, where they come from, and what they mean, the public will have difficulty understanding what is going on. (And that may be exactly what some politicians intend.)

(Based on this review, his arguments and concepts seem similar to those of sociologist Joel Best.)

Quick Review: Stat-Spotting

Sociologist Joel Best has recently done well for himself by publishing several books about the misuse of statistics. This is an important topic: many people are not used to thinking statistically and have difficulty correctly interpreting statistics even though they are commonly used in media stories. Best’s most recent book on this subject, published in 2008, is Stat-Spotting: A Field Guide to Identifying Dubious Data. A few thoughts on this text:

1. One of Best’s strong points is that his recommendations are often based in common sense. If a figure strikes you as strange, it probably is. He has tips about keeping common statistical benchmarks in mind to help make sense of new figures (see the quick sketch after this list). Overall, he suggests a healthy skepticism toward statistics: think about how the statistic was developed and who is saying it.

2. When the subtitle of the book says “field guide,” it means a shorter text that is to the point. Best quickly moves through different problems with statistical data. If you are looking for more thorough explanations, you should read Best’s 2001 book Damned Lies and Statistics. (A cynical reader might suggest this book was simply a way to make more money off topics Best has already explored elsewhere.)

3. I think this text is most useful for finding brief examples of how to analyze and interpret data. There are numerous examples in here that could start off a statistics lesson or could further illustrate a point. The examples cover a variety of topics and sources.
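
As an example of the benchmark habit from point 1: Best’s best-known case (the opening example of Damned Lies and Statistics) is a claim that a grim figure had “doubled every year since 1950.” A quick check against a population benchmark shows why that cannot be true:

```python
# Benchmark check in the spirit of Best's advice. The claim: a figure
# "doubled every year since 1950." Even starting from a single case in
# 1950, by 1995 annual doubling implies:
cases_1950 = 1
years = 1995 - 1950
implied_cases = cases_1950 * 2 ** years        # 2^45, about 35 trillion

us_population_1995 = 263_000_000               # rough benchmark figure
print(f"implied cases in 1995: {implied_cases:,}")
print(f"that is {implied_cases / us_population_1995:,.0f} times the US population")
```

Keeping a handful of such benchmarks in mind (population, births, deaths) makes this kind of sanity check nearly instant.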

This is a quick read that could be very useful as a simple guide to combating innumeracy.

Trying to figure out why crime rates are down

Crime rates are down but experts are having difficulty figuring out exactly why:

There are no neat answers. Among the theories: As overall economic activity slows, more people who otherwise would be at work are unemployed and at home, and when they do travel they are not as likely to carry items of value, so burglaries and street robberies decline.

In the 1970s and early 1980s, when the economy went south crime rates went up. Inflation was high then, low now. Is that the difference? For the experts, it’s back to the drawing board.

A couple of thoughts:

1. In a large system like American society, it can be very difficult to isolate the individual factors, or even small groups of factors, behind the downward trend in crime. Some might take this as evidence that social scientists can’t figure anything out about society. I would suggest it simply illustrates how complex social life can be.

2. Perhaps, as with the economy, politicians will get credit when crime goes down and take the blame when it goes up, even if their policies had little known effect on these changes.

3. Do the American people perceive that crime has gone down? While the statistics say it has, do people feel safer? This is an issue of how crime is portrayed and whether individuals accept these societal-level figures (if they ever see them) over anecdotal evidence.