A company offers to replicate research study findings

A company formed in 2011 is offering a new way to validate the findings of research studies:

A year-old Palo Alto, California, company, Science Exchange, announced on Tuesday its “Reproducibility Initiative,” aimed at improving the trustworthiness of published papers. Scientists who want to validate their findings will be able to apply to the initiative, which will choose a lab to redo the study and determine whether the results match.

The project sprang from the growing realization that the scientific literature – from social psychology to basic cancer biology – is riddled with false findings and erroneous conclusions, raising questions about whether such studies can be trusted. Not only are erroneous studies a waste of money, often taxpayers’, but they also can cause companies to misspend time and resources as they try to invent drugs based on false discoveries.

This addresses a larger concern about how many research studies arrived at their results by chance alone:

Typically, scientists must show that results have only a 5 percent chance of having occurred randomly. By that measure, one in 20 studies will make a claim about reality that actually occurred by chance alone, said John Ioannidis of Stanford University, who has long criticized the profusion of false results.

With some 1.5 million scientific studies published each year, by chance alone some 75,000 are probably wrong.
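The arithmetic behind that 75,000 figure is straightforward (and assumes, simplistically, that every study is testing a true null hypothesis and that the 5 percent threshold is the only source of error):

```python
# Back-of-the-envelope arithmetic behind the "75,000 wrong studies" figure:
# if each study has a 5% chance of reporting a false-positive result,
# then out of 1.5 million studies per year we expect about 5% to be
# chance findings.
alpha = 0.05                 # conventional significance threshold
studies_per_year = 1_500_000
expected_false_positives = alpha * studies_per_year
print(expected_false_positives)  # 75000.0
```

Ioannidis's broader argument is that the true error rate is often higher than this, since the 5 percent figure only covers one kind of mistake.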

I’m intrigued by the idea of having an independent company assess research results. This could work in conjunction with other methods of verifying research results:

1. The original researchers could run multiple studies. This works better with smaller studies; it becomes difficult when the N is larger and more resources are needed.

2. Researchers could also make their data available as they publish their paper. This would allow other researchers to take a look and see if things were done correctly and if the results could be replicated.

3. The larger scientific community should endeavor to replicate studies. This is the way science is supposed to work: if someone finds something new, other researchers should adopt a similar protocol and test it with similar and new populations. Unfortunately, replicating studies is not seen as being very glamorous and it tends not to receive the same kind of press attention.

The primary focus of this article seems to be on medical research. Perhaps this is because it can affect the lives of many and involves big money. But it would be interesting to apply this to more social science studies as well.

Century 21 says winning NFL teams boost housing prices

A new study from Century 21 suggests housing values rise when NFL teams win:

The question was this: What is the impact on a city when the hometown team does well or doesn’t do well? Century 21 looked at teams’ successes, population growth from census numbers, home value appreciation and attendance rates. And the correlation between on-the-field success and real estate prices was evident: Four of the five cities with teams that went from a losing record in 2010 to a winning record in 2011 saw average home sales prices increase between 2010 and 2011.

After winning the Super Bowl, Green Bay, Wis., saw a population growth of 1.7 percent in 2011, compared with runner-up Pittsburgh’s 0.6 percent growth.

Going from a record of 10-6 in 2010 to 2-14 in 2011, Indianapolis, the home of the Colts, saw a 19.8 percent decrease in home sales.

Eight of the nine cities with a team that had attendance rates of 100 percent or more in 2011 saw average home sales prices rise that year.

Here is the original Century 21 blog post with this information.

The NFL is a powerful entity but does it have this much power? Is this due to a small sample size (this article mentions only one year of data)? Are there other factors behind this correlation? If I had to guess at what is going on here, I suspect this is too small of a sample and that 2011 prices in certain cities happened to coincide with NFL results. Why not look at the housing crisis years and see the relationship between records and housing values?
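To gauge the small-sample worry: suppose, hypothetically, that a city's average home price is equally likely to rise or fall in a given year regardless of its team's record. How often would at least four of five such cities show an increase by chance alone? A quick binomial calculation:

```python
from math import comb

# Hypothetical: if each city's average home price were equally likely to
# rise or fall (p = 0.5), independent of its team's record, how often
# would at least 4 of the 5 cities show an increase purely by chance?
n, p = 5, 0.5
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4, n + 1))
print(round(prob, 4))  # 0.1875
```

Nearly one time in five, under that assumption. A single season of data simply cannot separate a real effect from a coincidence of this size.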

I’m generally skeptical of sports fans and others who claim sports are important for the civic pride of a community or that new stadiums need to be funded by taxpayers because the loss of a team will hurt the local economy. However, this could be pure genius from Century 21. What better way to boost business than to hook your services to the popular NFL? Hey, there was even a Century 21 2012 Super Bowl ad!

Using the newer measure of population-weighted density

Richard Florida writes about how the Census Bureau is using a new measure of population density:

A new report from the U.S. Census Bureau helps to fill the gap, providing detailed estimates of different types of density for America’s metros. This includes new data on “population-weighted density” as well as of density at various distances from the city center. Population-weighted density, which essentially measures the actual concentration of people within a metro, is an important improvement on the standard measure of density. For this reason, I like to think of it as a measure of concentrated density. The Census calculates population-weighted density based on the average densities of the separate census tracts that make up a metro.

The differences in the two density measures are striking. The overall density across all 366 U.S. metro areas is 283 people per square mile. Concentrated or population-weighted density for all metros is over 20 times higher, at 6,321 people per square mile.

This Census report is not the first to use population-weighted density. A 2001 study by Gary Barnes of the University of Minnesota developed such a measure to examine sprawl and commuting patterns. In 2008, Jordan Rappaport of the Kansas City Fed published an intriguing study in the Journal of Urban Economics (non-gated version here), which looked at the relationship between density (including population-weighted density) and the productivity of regions. Christopher Bradford, who blogs at his Austin Contrarian, has also advocated for using population-weighted density to better understand urban development…

New York and Los Angeles are good examples of the differences between these two density measures. While they are close in the average density — 2,826 for New York versus 2,646 for L.A. — the New York metro has much higher levels of concentrated or population-weighted density, 31,251 versus 12,114 people per square mile. San Francisco, which has lower average density than L.A. (1,755 people per square mile), tops L.A. on population-weighted density with 12,145 people per square mile.

It sounds like the new measure averages the densities of a metro’s Census tracts, weighted by how many people live in each one. This limits the effect of sprawl: less dense tracts, of which there are necessarily more in burgeoning metropolitan regions, contain fewer people and therefore count for less in the average. In other words, the effects of sprawl are less pronounced in this newer measure.
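A minimal sketch of the difference between the two measures, using two made-up tracts (the numbers are hypothetical, not Census figures):

```python
# Conventional density vs. population-weighted density, illustrated with
# hypothetical Census tracts: (population, land area in square miles).
tracts = [
    (50_000, 5),    # dense urban tract: 10,000 people/sq mi
    (10_000, 50),   # sprawling suburban tract: 200 people/sq mi
]

total_pop = sum(pop for pop, _ in tracts)
total_area = sum(area for _, area in tracts)

# Standard density: total population over total land area.
standard = total_pop / total_area

# Population-weighted density: average of tract densities, weighted by
# the share of the metro's population living in each tract.
weighted = sum((pop / total_pop) * (pop / area) for pop, area in tracts)

print(round(standard), round(weighted))  # 1091 8367
```

Most residents live in the dense tract, so the weighted figure sits near that tract’s density; the standard figure is dragged down by the large, sparsely populated tract, which is exactly how sprawl depresses conventional density numbers.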

This reminds me of an interesting density fact: if you use the basic measure of density (total population divided by total land area), the Los Angeles urbanized area is actually denser than the New York urbanized area. But, of course, New York is much denser at its core while LA is better known for its sprawl.

Politicians trying to woo the ambiguously defined middle class

Amidst an election cycle where all sides want to woo the middle class, several researchers suggest that providing an exact definition of the middle class is difficult:

“You can’t define middle class, but you can ask people, ‘Do you still feel middle class?’ And more and more people don’t,” said Tim Smeeding, director of the Institute for Research on Poverty at the University of Wisconsin…

“The whole attraction of middle class … is it doesn’t mean anything,” said Dennis Gilbert, a sociology professor at Hamilton College who studies class issues. “Middle class means anybody who might vote for you.”…

Still, experts say the term middle class has a cultural connotation that goes beyond the number on your paycheck or tax stub.

Kevin Leicht, director of the Iowa Social Science Research Center at the University of Iowa, said many Americans think of a middle-class life as being one in which you have a stable job, own your own home and occasionally buy something substantial like a new car. You also either went to college or have the aspiration of sending your children to college.

I would disagree with Gilbert and agree with Leicht and Smeeding. When asked, Americans do tend to feel they are middle class, the recent economic crisis notwithstanding. The middle class in America is more of an idea than a clearly-defined category that people move in and out of. Cultural categories can be powerful, perhaps even more so than economic realities.

Recently, the Brookings Institution defined six likely life stages a middle-class person goes through and in 2010, a government task force tied being middle class to six outcomes. It is not impossible to set such criteria for measurement purposes but they do not match up with everyone who would call themselves middle class.

Speaking of politicians looking for middle-class votes, I haven’t seen journalists or scholars discussing how this wooing developed in American political history. How long has this wooing been taking place? Is this primarily a post-World War II phenomenon or does it have a longer history? I wonder if the middle class only matters here because it is in this period of history that politicians think there are a large number of voters to be swayed in this category…

Argument: still need thinking even with big data

Justin Fox argues that the rise of big data doesn’t mean we can abandon thinking about data and relationships between variables:

Big data, it has been said, is making science obsolete. No longer do we need theories of genetics or linguistics or sociology, Wired editor Chris Anderson wrote in a manifesto four years ago: “With enough data, the numbers speak for themselves.”…

There are echoes here of a centuries-old debate, unleashed in the 1600s by protoscientist Sir Francis Bacon, over whether deduction from first principles or induction from observed reality is the best way to get at truth. In the 1930s, philosopher Karl Popper proposed a synthesis, in which the only scientific approach was to formulate hypotheses (using deduction, induction, or both) that were falsifiable. That is, they generated predictions that — if they failed to pan out — disproved the hypothesis.

Actual scientific practice is more complicated than that. But the element of hypothesis/prediction remains important, not just to science but to the pursuit of knowledge in general. We humans are quite capable of coming up with stories to explain just about anything after the fact. It’s only by trying to come up with our stories beforehand, then testing them, that we can reliably learn the lessons of our experiences — and our data. In the big-data era, those hypotheses can often be bare-bones and fleeting, but they’re still always there, whether we acknowledge them or not.

“The numbers have no way of speaking for themselves,” political forecaster Nate Silver writes, in response to Chris Anderson, near the beginning of his wonderful new doorstopper of a book, The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t. “We speak for them.”

These days, finding and examining data is much easier than before, but it is still necessary to interpret what the numbers mean. Observing a relationship between variables doesn’t necessarily tell us something valuable; we also want to know why the variables are related, and this is where hypotheses come in. Careful hypothesis testing lets us rule out spurious associations (other variables that may be producing the observed relationship), look for the influence of one variable on another while controlling for other factors (the essence of regression), or examine more complex models in which we can see how a variety of variables affect each other at the same time.

Also, at the opposite end of the scientific process from the hypotheses, utilizing findings when creating and implementing policies will also require thinking. Once we have established that relationships likely exist, it takes even more work to respond to this in useful and effective ways.

Political operative discusses which polls he thought were reliable, unreliable while working for Edwards 2008 campaign

Amidst discussions of whether current polls are accurately weighting their samples for Democrats and Republicans, a former political operative for Al Gore and John Edwards talks about how the Edwards campaign used polls:

However, under cross-examination by lead prosecutor David Harbach, Hickman acknowledged sending a series of emails in November and December, and even into January, endorsing or promoting polls that made Edwards look good. Asked about what appeared to be a New York Times/CBS poll released in mid-November showing an effective “three-way tie” in Iowa with Hillary Clinton at 25 percent, Edwards at 23 percent and Obama at 22 percent, Hickman acknowledged he circulated it but insisted he didn’t think it was correct.

“The business I’m in is a business any fool can get into, and a lot can happen. I’m sure there was a poll like that,” the folksy Hickman told jurors when first asked about a poll showing the race tied. “I kept up with every poll that was done, including our own, and there may have been a few that showed them a tie, but… that’s not really what my analysis is. Campaigns are about trajectory, and… there could have been a point at which it was a tie in the sense that we were coming down, and Obama was going up, and Clinton was going up.”

Hickman also indicated that senior campaign staffers knew many of the polls were poorly done and of little value. “We didn’t take these dog and cat and baby-sitter polls seriously,” he said.

Hickman acknowledged that on January 2, 2008, a day before the Iowa caucuses, he sent out a summary of nine post-Christmas Iowa polls showing Edwards in contention in the Hawkeye State. However, he testified two-thirds of them were from firms he considered “ones we typically would not put a lot of credence in.” Hickman put Mason-Dixon, Strategic Vision, Insider Advantage, Zogby and Research 2000 in the “less reputable” group. He also told the court that ARG polls “have a miserable track record.”

Hickman said he considered the Des Moines Register polls, CNN and Los Angeles Times polls more accurate.

This seems like typical politics: an operative is supposed to spin the best news they can about their candidate, even if they don’t think this is the whole story. However, it is fascinating to see his opinion of different polling organizations. I wish he had gone on to describe why some of these polls were better than others: did they have better samples, more reliable or predictive results, did they line up with other reputable polls? At the same time, I think the Drudge Report’s headline for this story, “Under oath, Edwards pollster admits polls were ‘propaganda,'” is a bit misleading. Hickman wasn’t disparaging all polls; he was admitting to using some polls that he thought were inaccurate to tell a particular political story.

If we got a bunch of current political operatives in a room, here are questions we could ask that would be revealing:

1. Are there certain polls that you all consider to be reliable? (I hope the answer is yes. But I would also guess that each political party thinks certain polls tend to lean in their direction.)

2. What information do you all work with regularly that gives you a better picture of what is going on beyond the polls? In other words, the American public doesn’t get much of an inside view while the campaign is happening, beyond a stream of polls reported by the media, but the campaigns themselves have more information that matters. How much should the public pay attention to these polls, and can they pick up clues about what is really going on elsewhere? (The media seem to like polls but there are other ways to get information.)

3. In the long run, who is helped or harmed by having a lot of polling organizations? Hickman suggests some polls aren’t that worthwhile so if this is the case, should they not be reported to the American public? (Americans can look at a variety of polls; should there be that many to choose from?)

Unfortunately, this story feeds a growing mistrust of polls. Generally, it is not good for social science if 42% of Americans think polls are biased toward one candidate or another. On one hand, these 42% may simply not like what the polls are reporting, have little idea how polls work, and simply want their candidate to win (and won’t like the polls until that happens). On the other hand, perceptions matter, and judgments about polls should be made on scientific grounds, not on ideological or partisan preferences. And surely this has to play into the finding that only 9% of Americans are willing to respond to telephone surveys.

Pew Research: the response rate for a typical phone survey is now 9% and response rates are down across the board

Earlier this year, Pew Research described a growing problem for pollsters: over 90% of the public does not want to participate in telephone surveys.

It has become increasingly difficult to contact potential respondents and to persuade them to participate. The percentage of households in a sample that are successfully interviewed – the response rate – has fallen dramatically. At Pew Research, the response rate of a typical telephone survey was 36% in 1997 and is just 9% today.

The general decline in response rates is evident across nearly all types of surveys, in the United States and abroad. At the same time, greater effort and expense are required to achieve even the diminished response rates of today. These challenges have led many to question whether surveys are still providing accurate and unbiased information. Although response rates have decreased in landline surveys, the inclusion of cell phones – necessitated by the rapid rise of households with cell phones but no landline – has further contributed to the overall decline in response rates for telephone surveys.

A new study by the Pew Research Center for the People & the Press finds that, despite declining response rates, telephone surveys that include landlines and cell phones and are weighted to match the demographic composition of the population continue to provide accurate data on most political, social and economic measures. This comports with the consistent record of accuracy achieved by major polls when it comes to estimating election outcomes, among other things.

This is not to say that declining response rates are without consequence. One significant area of potential non-response bias identified in the study is that survey participants tend to be significantly more engaged in civic activity than those who do not participate, confirming what previous research has shown. People who volunteer are more likely to agree to take part in surveys than those who do not do these things. This has serious implications for a survey’s ability to accurately gauge behaviors related to volunteerism and civic activity. For example, telephone surveys may overestimate such behaviors as church attendance, contacting elected officials, or attending campaign events.

Read on for more comparisons between those who do tend to participate in telephone surveys and those who do not.

This has been a growing problem for years now: more people don’t want to be contacted and it is more difficult to reach cell phone users. One way this might be combated is to offer participants small incentives. This is already done with some online panels and is commonly used in mail surveys. These incentives wouldn’t be large enough to sway opinion or to attract a sample consisting only of people who want the incentive, but they would be enough to raise response rates. They could be thought of as just enough to acknowledge and thank people for their time. I don’t know what the profit margins of firms like Gallup or Pew are, but I imagine they could offer these small incentives quite easily.

This does suggest that the science of weighting is increasingly important. Having government benchmarks is essential, hence the need for updated Census figures. However, it is not inconceivable that the Census could be scaled back: this is often a conservative proposal, based either on the money spent on the Census Bureau or on the “invasive” questions asked. And it may make the Census even more political, as years of polling might depend on getting the figures “right,” depending on what side of the political aisle one is on.
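Here is a minimal sketch of that kind of demographic weighting, with made-up benchmark and sample numbers (the 30/70 split and the volunteering rates are hypothetical, not Pew or Census figures):

```python
# A toy version of demographic weighting. Suppose the achieved sample
# over-represents college graduates relative to a hypothetical census
# benchmark; each respondent's group gets a weight of
# (benchmark share) / (sample share).
census_share = {"college": 0.30, "no_college": 0.70}  # assumed benchmarks
sample = ["college"] * 60 + ["no_college"] * 40       # skewed 100-person sample

counts = {g: sample.count(g) for g in census_share}
weights = {g: census_share[g] / (counts[g] / len(sample)) for g in census_share}

# Suppose 80% of the college group but only 40% of the rest volunteer:
rates = {"college": 0.8, "no_college": 0.4}
unweighted_rate = sum(counts[g] * rates[g] for g in rates) / len(sample)
weighted_rate = (sum(counts[g] * weights[g] * rates[g] for g in rates)
                 / sum(counts[g] * weights[g] for g in rates))
print(round(unweighted_rate, 2), round(weighted_rate, 2))  # 0.64 0.52
```

Down-weighting the over-represented group pulls the estimate back toward what the full population would report, but only for imbalances on variables the pollster actually measures and benchmarks; the civic-engagement bias Pew describes is hard to correct precisely because the engaged and disengaged look demographically similar.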

Brookings: who reaches the middle class is affected by race, family’s social class, gender

A new report from the Brookings Institution examines who makes it to the middle class through achieving a number of benchmarks. A summary of the findings:

The study breaks life down into stages (for instance, adolescence) and gives benchmarks for each of those stages (in that case, graduation from high school with a grade-point average above 2.5, no criminal convictions and no involvement in a teenage pregnancy).

They then studied children over time, analyzing whether they met those benchmarks and projecting whether they would make it to the middle class — defined as the top three quintiles of income — by age 40.

Unsurprisingly, the researchers found that success seems to beget success — meeting each benchmark makes one more likely to meet the next. Moreover, the effect accumulates. A child who meets all the criteria from birth to adulthood has an 81 percent chance of being middle class. A child who meets none has only a 24 percent chance…

Race matters as well. About two in five black adolescents met the benchmark of graduating from high school with a decent grade point average, no children and no criminal record by the age of 19. About two in three white adolescents did.

And from the introduction of the Brookings report:

The reality is that economic success in America is not purely meritocratic. We don’t have as much equality of opportunity as we’d like to believe, and we have less mobility than some other developed countries. Although cross-national comparisons are not always reliable, the available data suggest that the U.S. compares unfavorably to Canada, the Nordic countries, and some other advanced countries. A recent study shows the U.S. ranking 27th out of 31 developed countries in measures of equal opportunity.

People do move up and down the ladder, both over their careers and between generations, but it helps if you have the right parents. Children born into middle-income families have a roughly equal chance of moving up or down once they become adults, but those born into rich or poor families have a high probability of remaining rich or poor as adults. The chance that a child born into a family in the top income quintile will end up in one of the top three quintiles by the time they are in their forties is 82 percent, while the chance for a child born into a family in the bottom quintile is only 30 percent. In short, a rich child in the U.S. is more than twice as likely as a poor child to end up in the middle class or above.

This shouldn’t be too surprising: despite the American cultural emphasis on working hard and getting ahead (a story told by both political parties at their 2012 conventions), certain traits increase the likelihood of achieving a middle-class life. Hard work only goes so far; other social factors such as family background, race, and gender make a difference.

I am intrigued by the middle-class life stages as defined by the report’s Social Genome Model (p. 3-4 of the report):

1. Family Formation. Born at normal birth weight to a non-poor, married mother with at least a high school diploma.

2. Early childhood. Acceptable pre-reading and math skills AND behavior generally school-appropriate.

3. Middle childhood. Basic reading and math skills AND Social-emotional skills.

4. Adolescence. Graduates from high school w/GPA >= 2.5 AND Has not been convicted of a crime nor become a parent.

5. Transition to adulthood. Lives independently AND Receives a college degree or has a family income >= 250% of the poverty level.

6. Adulthood. Reaches middle class (family income at least 300% of the poverty level).

Why exactly these stages?

Improving the word cloud: NYT adds rates of word usage and comparisons between groups

I’m generally not a big fan of word clouds but one of my students recently pointed out to me an example from the New York Times that makes some improvements: it looks at the rates of word usage at both the Republican and Democratic National Conventions. (Click through to see the interactive graphic.) Here is how I think this improves on a typical word cloud:

1. It doesn’t display word frequency but rather the rate of the word usage. Thus, we get an idea of how often the words were used in comparison to all the words that were said. Frequencies by themselves don’t tell you much but this helps put them into a context. (A note: I would like the graphic to include the total word usage for each convention so we have a quick idea of how many words were spoken).

2. The display also makes a comparison between the two political parties so we can see the relative word usage across two groups. This could run into the same problem as frequencies – just because one group uses the term more doesn’t necessarily mean they think it is more important – but we can start getting some clues into the differences in how Republicans and Democrats made a case for their party.

Overall, this is an improvement over the typical word cloud (make your own at wordle.net) and helps us start analyzing the tens of thousands of words spoken at the conventions. Of course, we would need a more complete analysis, probably including multiple coders, to really get at what was conveyed through the words (and that doesn’t even get at the visuals, body language, presentation).
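To illustrate why rates beat raw frequencies, consider a toy comparison (the texts below are repeated stand-ins, not convention transcripts): two speeches can use a word the same number of times yet at very different rates.

```python
from collections import Counter

# Rates per 10,000 words, rather than raw counts, let us compare word
# usage across texts of different lengths.
def rates_per_10k(text):
    words = text.lower().split()
    counts = Counter(words)
    return {w: 10_000 * c / len(words) for w, c in counts.items()}

rep = "jobs jobs economy freedom " * 250       # 1,000 words
dem = "jobs economy economy fairness " * 500   # 2,000 words

rep_rates, dem_rates = rates_per_10k(rep), rates_per_10k(dem)
print(rep_rates["jobs"], dem_rates["jobs"])    # 5000.0 2500.0
```

Here both texts contain “jobs” exactly 500 times, so a frequency-based word cloud would render the word identically for each; the rate reveals that the first text leans on it twice as heavily.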

A Costa Rican explains why the country’s #1 ranking in the Happiness Index is due to its culture

The Happy Planet Index puts Costa Rica at number one in the world. A Costa Rican first describes what makes up the index and then how Costa Rican culture led to the top ranking:

Have you ever heard of the Happy Planet Index? As a Costa Rican, I hear about it quite a lot. Both the HPI, a project of the New Economics Foundation, and the lesser-known World Database of Happiness, assembled by a Dutch sociologist, put Costa Rica at the top of the rankings. This officially makes Costa Rica the most content country on the planet. (For once, we’re first in the world at something other than potholes per capita.)

The HPI is calculated from a combination of three factors: life expectancy, self-reported well-being, and ecological footprint. Thus, according to its own website, the HPI measures “how many long and happy lives [countries] produce per unit of environmental input.” That sounds like a mouthful at first, but once you think it through for a bit the concept seems to make sense. Traditional measures of wellbeing, such as GDP per capita, simply measure output. They don’t take into account environmental devastation brought about by industrialization or unhappiness stemming from social or economic inequality. The HPI, on the other hand, rewards countries with healthy, satisfied citizens for living within their ecological means. Thus, the HPI tells developing countries they shouldn’t aspire to the living standards of the United States or France, but rather to the smile production of Costa Rica…

My point here is that, in Costa Rica at least, happiness seems to stem partly from culture. It’s not at all controversial from an economic viewpoint to suggest a link between happiness and culture, and this is somewhat validated by the fact that five of the top ten countries in the latest HPI ranking are located in Central America, a relatively small and homogeneous region. One of those, El Salvador, has the highest murder rate in the world, and another, Nicaragua, displays levels of poverty one would expect from a war-ravaged Sub-Saharan nation. Living in either one of those (and I have for a time, in both) actually sounds like a pretty grim prospect to me, yet the HPI would have us believe that these countries are worth emulating.

Thus, we approach the core problem with the Happy Planet Index: Happiness and wellbeing are inextricably linked, but they cannot be reduced to the same thing. If Costa Rica got its act together and built better infrastructure (even at the expense of causing a little bit of damage to the environment) our wellbeing would be much higher—we would no longer have to endure endless traffic jams brought about by rock slides or sinkholes, for instance. Yet—here’s the key—our happiness wouldn’t change that much, because it’s largely a consequence of who we are as a people. Improved infrastructure is precisely the sort of advancement that shows up in measures like GDP per capita, and which the HPI ignores completely—forms of progress that undoubtedly change us for the better, though we remain as content as ever.
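The calculation described in the excerpt can be sketched roughly as happy life years per unit of ecological footprint. The real HPI applies additional scaling adjustments that the sketch below omits, and the input numbers are illustrative, not official statistics:

```python
# Simplified version of the Happy Planet Index calculation: happy life
# years (self-reported well-being x life expectancy) per unit of
# ecological footprint. The actual HPI adds scaling constants omitted
# here; the inputs below are made up for illustration.
def hpi(well_being, life_expectancy, footprint):
    return well_being * life_expectancy / footprint

# A modest-footprint country can outscore a richer, higher-footprint one
# even with slightly lower well-being and life expectancy:
print(round(hpi(7.0, 72, 2.5), 1))  # 201.6
print(round(hpi(7.5, 79, 7.2), 1))  # 82.3
```

Because the footprint sits in the denominator, the index rewards efficiency in producing well-being, which is exactly why the author argues it conflates happiness with living within ecological means.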

I’ve written about measuring happiness before (see here and here) but I don’t remember seeing this argument about the Happy Planet Index: it is more dependent on culture than on measures of material conditions. If you carry this argument to its conclusion, then great changes for the better or worse in Costa Rica wouldn’t affect its people much.

I suspect it doesn’t exactly work this way. There are probably some thresholds that would affect happiness in Costa Rica and a lot of other countries. These would be similar to findings in the US that above a certain point, having more income doesn’t really change people’s happiness or well-being. There is an interplay between culture and material conditions; Marx may have suggested that culture is derived from those who control the means of production but others, including Weber, would argue that there is more of a back and forth. If the conditions changed a lot, the culture would have to respond and might change quite a bit as well.