Argument that obesity and McMansions are linked

One “muckraker” tries to suggest that bigger houses – such as McMansions – make it easier for people to be obese:

No, the truth is that like cars, McMansion houses, food portions and soft drink sizes, Americans are getting bigger every day–and because it is happening everywhere, few notice. Worse, the harder we try to lose poundage with low calorie foods, fitness centers and personal trainers, the bigger we are becoming. Even people in non-industrialized countries are packing on the pounds as Big Food peddles its high calorie, addictive processed food in “new markets.”

This is a correlation-without-causation argument. And you do not have to go to McMansions to make the same claim: the average size of new homes has increased from roughly 1,000 square feet to 2,500 square feet over sixty years. But how might we really show that having some bigger items in our lives leads to having other bigger items in our lives? Would the reverse also be true: that if we had increasingly smaller items in our lives, we would desire smallness overall? If these are all linked, perhaps we could tie this to the big American frontier or the large American ideals at the founding of the country.
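To see why two things that both grow over time will look linked, here is a minimal sketch with simulated data. The home size series follows the rough 1,000 to 2,500 square feet trend mentioned above; the obesity series, the noise levels, and every name in the code are made-up assumptions for illustration, not real figures. The two series share nothing except an upward trend, yet their levels correlate strongly, while their year-to-year changes do not.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def diffs(xs):
    """Year-to-year changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(0)
years = range(60)

# Hypothetical series: each has its own trend plus independent noise.
home_size = [1000 + 25 * t + rng.gauss(0, 60) for t in years]   # square feet
obesity_rate = [13 + 0.4 * t + rng.gauss(0, 1.5) for t in years]  # percent, invented

r_levels = pearson(home_size, obesity_rate)
r_changes = pearson(diffs(home_size), diffs(obesity_rate))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Comparing changes rather than levels is only one crude way to strip out a shared trend, but it shows how little a high correlation between two rising series tells us on its own.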

Perhaps there are other arguments to be made here. Do McMansions offer more space for people to spread out? Or, could heavier people be more likely to purchase McMansions (and is this related more to their stage in life)?

The perils of analyzing big real estate data

Two leaders of Zillow recently wrote Zillow Talk: The New Rules of Real Estate, which is a sort of Freakonomics look at all the real estate data they have. While it is an interesting book, it also illustrates the difficulties of analyzing big data:

1. The key to the book is all the data Zillow has harnessed to track real estate prices and make predictions on current and future prices. They don’t say much about their models. This could be for two good reasons: this is aimed at a mass market and the models are their trade secrets. Yet, I wanted to hear more about all the fascinating data – at least in an appendix?

2. Problems of aggregation: the data is usually analyzed at a metro area or national level. There are hints at smaller markets (a chapter on NYC, for example, and another looking at some unusual markets like Las Vegas) but there are not separate chapters on cheaper/starter homes or luxury homes. An unanswered question: is real estate more similar within markets or across them? Put another way, is the Chicago market so distinctive that its homes resemble each other at every price point, or are cheaper homes in the Chicago region more like cheaper homes in Atlanta or Los Angeles than like expensive homes in Chicago?

3. Most provocative argument: in Chapter 24, the authors suggest that pushing homeownership for lower-income Americans is a bad idea as it can often trap them in properties that don’t appreciate. This was a big problem in the 2000s: Presidents Clinton and Bush pushed homeownership but after housing values dropped in the late 2000s, poorer neighborhoods were hit hard, leaving many homeowners in default or seriously underwater. Unfortunately, unless demand picks up in these neighborhoods (and gentrification is pretty rare), these homes are not good investments.

4. The individual chapters often discuss effects that may be statistically significant but are not substantively large. For example, there is a section on male vs. female real estate agents. The differences between genders are small: at most, a few percentage points difference in selling price as well as slight variations in speed of sale. (Women are better in both categories: higher prices, faster sales.)

5. The authors are pretty good at repeatedly pointing out that correlation does not mean causation. Yet, they don’t catch all of these moments and at other times present patterns on axes scaled in a way that distorts the comparison. For example, here is a chart from page 202:

[Chart: Zillow Talk, p. 202]

These two things may be correlated (as one goes up so does the other and vice versa) but why fix the axes so you are comparing half percentages to five percentage increments?

6. Continuing #4, I suppose a buyer and seller would want to use all the tricks they can, but the tips here mean that those in the real estate market are supposed to string together all of these small effects to maximize what they get. On the final page, they write: “These are small actions that add up to a big difference.” Maybe. With margins of error on the effects, some buyers and sellers aren’t going to get the effects outlined here: some will benefit more but some will benefit less.

7. The moral of the whole story? Use data to your advantage even as it is not a guarantee:

In the new realm of real estate, everyone faces a rather stark choice. The operative question now is: Do you wield the power of data to your advantage? Or do you ignore the data, to your peril?

The same is true of the housing market writ large. Certainly, many macro-level dynamics are out of any one person’s control. And yet, we’re better equipped than ever before to choose wisely in the present – to make the kinds of measured judgments that can prevent another coast-to-coast bubble and calamitous burst. (p.252)

In the end, this book is aimed at the mass market where a buyer or seller could hope to string together a number of these small advantages. Yet, there are no guarantees and the effects are often small. Having more data may be good for markets and may make participants feel more knowledgeable (or perhaps more overwhelmed) but not everyone can take advantage of this information.
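The point about stringing together small advantages, each with its own margin of error, can be sketched with a quick simulation. The numbers here are invented for illustration and do not come from the book: each of five tips is assumed to add one percentage point to the sale price on average, with a standard deviation of two points, and the effects are assumed to be independent and additive.

```python
import random

def simulate_total_effects(n_sellers, n_tips, mean_effect, sd_effect, seed=0):
    """Return each simulated seller's summed effect, in percentage points.

    Each tip's effect is drawn independently from a normal distribution,
    which is a simplifying assumption, not a finding from the book.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sellers):
        total = sum(rng.gauss(mean_effect, sd_effect) for _ in range(n_tips))
        totals.append(total)
    return totals

# Hypothetical inputs: 5 tips, each worth +1 point on average, give or take 2.
totals = simulate_total_effects(
    n_sellers=10_000, n_tips=5, mean_effect=1.0, sd_effect=2.0
)

mean_total = sum(totals) / len(totals)
share_no_gain = sum(t <= 0 for t in totals) / len(totals)

print(f"average combined effect: {mean_total:.2f} points")
print(f"share of sellers with no gain at all: {share_no_gain:.1%}")
```

Under these assumptions the average seller does come out ahead, but a noticeable minority gain nothing or lose ground, which is the gap between "works on average" and "works for you."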

University press releases exaggerate scientific findings

A new study suggests exaggerations about scientific findings – for example, suggesting causation when a study only found correlation – start at the level of university press releases.

Yesterday Sumner and colleagues published some important research in the journal BMJ that found that a majority of exaggeration in health stories was traced not to the news outlet, but to the press release—the statement issued by the university’s publicity department…

The goal of a press release around a scientific study is to draw attention from the media, and that attention is supposed to be good for the university, and for the scientists who did the work. Ideally the endpoint of that press release would be the simple spread of seeds of knowledge and wisdom; but it’s about attention and prestige and, thereby, money. Major universities employ publicists who work full time to make scientific studies sound engaging and amazing. Those publicists email the press releases to people like me, asking me to cover the story because “my readers” will “love it.” And I want to write about health research and help people experience “love” for things. I do!

Across 668 news stories about health science, the Cardiff researchers compared the original academic papers to their news reports. They counted exaggeration and distortion as any instance of implying causation when there was only correlation, implying meaning to humans when the study was only in animals, or giving direct advice about health behavior that was not present in the study. They found evidence of exaggeration in 58 to 86 percent of stories when the press release contained similar exaggeration. When the press release was staid and made no such errors, the rates of exaggeration in the news stories dropped to between 10 and 18 percent…

Sumner and colleagues say they would not shift liability to press officers, but rather to academics. “Most press releases issued by universities are drafted in dialogue between scientists and press officers and are not released without the approval of scientists,” the researchers write, “and thus most of the responsibility for exaggeration must lie with the scientific authors.”

Scientific studies are often complex and probabilistic. It is difficult to model and predict complex natural and social phenomena, and scientific studies often give our best estimate or interpretation of the data. But science tends to accumulate findings and knowledge steadily rather than advancing through single studies that definitively prove things. This can mean that individual studies contribute to the larger whole but often don’t set the agenda or have a radically new finding.

Yet, translating that understanding into something fit for public consumption is difficult. Academics are often criticized for dense and jargon-filled language so pieces for the general public have to be written differently. Academics want their findings to matter and colleges and universities like good publicity as well. Presenting limited or weaker findings doesn’t get as much attention.

All that said, there is an opportunity here to improve the reporting of scientific findings.

Is more Internet use correlated to a decline in religious affiliation?

A new study suggests using the Internet more is correlated with lower levels of religious affiliation:

Downey analyzed data from the General Social Survey, a well-respected annual research survey carried out by the University of Chicago, to make his findings.

Downey says the single biggest cause of religious affiliation is upbringing: those who are raised in religious households are much more likely to remain in their family’s religion as adults…

By far the largest factor, says Downey, is Internet use.

In the 1980s, Internet use was virtually non-existent, but in 2010, 53 per cent of people spent two hours online a week and 25 per cent spent more than seven hours…

Downey says that his research has controlled for ‘most of the obvious candidates, including income, education, socioeconomic status, and rural/urban environments’ to discount a third factor, one that is responsible both for the rise of Internet use and the drop in religiosity.

Since the full story is behind a subscriber wall, two speculations about the methodology of this study:

1. This sounds like a regression and/or ANOVA analysis based on R-squared changes. In other words, when one explanatory factor is in the model, how much more of the variation in the dependent variable (religiosity) is explained? You can then add or subtract different factors singly or in combination to see how that percent of variation explained changes.

2. Looking at religious affiliation is just one way to measure religiosity. Affiliation is based on self-identification (do you consider yourself a Catholic, mainline Protestant, conservative Protestant, etc.) or what religious congregation you regularly attend or interact with. But, levels of religious affiliation have been falling in recent years even as not all measures of religiosity are falling. Research about the rise of the “religious nones” shows a number of these people still are spiritual or perform religious practices.
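The R-squared comparison speculated about in point 1 above can be sketched as follows. This is not Downey's model: the data are simulated, the coefficients are invented, the outcome is a made-up continuous "religiosity" score (a real analysis of a yes/no affiliation would more likely use logistic regression), and the helper names `solve` and `r_squared` are my own. The idea is simply to fit nested regressions and watch how much the explained variation rises as each factor is added.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(predictors, y):
    """R-squared of an OLS fit with an intercept; predictors is a list of columns."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Simulated respondents; every number below is an assumption for illustration.
rng = random.Random(1)
n = 2000
upbringing = [float(rng.random() < 0.5) for _ in range(n)]   # raised religious?
education = [rng.gauss(13, 3) for _ in range(n)]             # years of schooling
internet = [max(0.0, rng.gauss(5, 3)) for _ in range(n)]     # hours online per week
religiosity = [
    2.0 * u - 0.1 * e - 0.15 * h + rng.gauss(0, 1)
    for u, e, h in zip(upbringing, education, internet)
]

r2_base = r_squared([upbringing], religiosity)
r2_controls = r_squared([upbringing, education], religiosity)
r2_full = r_squared([upbringing, education, internet], religiosity)

print(f"upbringing only:        R^2 = {r2_base:.3f}")
print(f"+ education:            R^2 = {r2_controls:.3f}")
print(f"+ internet use:         R^2 = {r2_full:.3f}")
print(f"added by internet use:        {r2_full - r2_controls:.3f}")
```

Note that in-sample R-squared never goes down when a predictor is added, so the size of the increase matters more than its direction, and the increase credited to a factor depends on the order in which factors enter the model.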

If there is a strong causal relationship between increased Internet use and less religiosity, why might this be the case? A few ideas:

1. The Internet opens people up to a whole realm of information beyond themselves. Traditionally, people would look to those around them, whether individuals or institutions, within relatively close proximity. The Internet breaks a lot of these social boundaries and allows people to search for information way beyond themselves.

2. The Internet offers social interactions in a way that religion used to. Instead of going to a religious congregation to meet people, the Internet offers the possibilities of finding like-minded people in all sorts of areas from hobbies and interests, people in the same career field, dating websites, and people you want to sell goods to. In other words, some of the social aspects of religion can now be replicated online.

3. The Internet in its medium and content tends to be individualistic. Anyone with an Internet connection can do all sorts of things without relying on others (outside of having a service provider). This simply feeds into individualistic attitudes that already existed in the United States.

It sounds like there is a lot more here for researchers to explore and unpack.

Sociologists argue it is difficult to find causal data for how inequality leads to different outcomes

Two sociologists tackle the question of how exactly inequality is related to a variety of social outcomes and argue it is difficult to find causal, and not correlative, data:

For all the brain power thrown at the problem since then, however, specific evidence about inequality’s effects has been hard to find. Mr. Jencks said he could already picture the book’s reviews, “Professor Doesn’t Know What He Is Talking About.”…

One problem with these analyses is that they are based on correlations between levels of inequality and variables like life expectancy or the odds of poor children climbing the income ladder. But such correlations can’t prove inequality causes other social ills. They can’t disentangle inequality from the myriad things pushing American society this way and that.

Life expectancy in the United States might lag that of other countries because the United States still does not have universal health care. Scandinavia may enjoy higher upward mobility than the United States because governments in Sweden, Denmark and other Scandinavian countries invest a lot in early childhood education and the United States does not.

Lane Kenworthy, a sociologist at the University of Arizona, is all too aware of these limitations. He was to be Mr. Jencks’s co-author on the book about inequality’s consequences. Now he is going it alone, hoping to publish “Should We Worry About Inequality?” next year.

“People that worry about inequality for normative reasons have been very quick to jump on plausible hypothesis and a little bit of evidence to make sweeping conclusions about its consequences,” Professor Kenworthy told me.

It sounds like these sociologists are asking for some more methodological rigor in studying how inequality affects social life. Finding direct relationships between social forces and outcomes can be difficult but I look forward to seeing more work on the subject.

Read more in this follow-up interview with Lane Kenworthy.

Correlation between migration patterns and state freedom in the United States?

A new report suggests there is a correlation between how free a state is and migration to it, with the freer states tending to be more conservative:

It found that the freest states tended to be conservative “red” states, while the least free were liberal “blue” states.

The freest state overall, the researchers concluded, was North Dakota, followed by South Dakota, Tennessee, New Hampshire and Oklahoma. The least free state by far was New York, followed by California, New Jersey, Hawaii and Rhode Island.

The study also compared its measures of economic and personal freedom to population shifts and income growth, and found that freer states tend to do better on both scores than those less free.

For example, it found a strong correlation between a state’s freedom ranking and migration, which means that Americans are gravitating toward states that have less-intrusive governments.

This might be part of an explanation for migration. But the website itself makes it difficult to find the correlation – go to the FAQs and then you can click through to a 234 page PDF file. And then I can’t find exact correlations. Here is what the regression results suggest (page 105 of the PDF):

The estimates from equation 2 imply that a half-unit change in fiscal policy score, for instance from Michigan to New Hampshire (2011 values), is associated with an increase in net interstate migration of about 2 percent of 2000 population; a half-unit change in regulatory policy score, for instance from New Jersey to Virginia (2011 values), is associated with an increase in net interstate migration of about 4.2 percent of 2000 population; and a quarter-unit change in personal freedom score, for instance from Alabama to Maine (2011 values), is associated with an increase in net interstate migration of about 2.5 percent of 2000 population. If we can interpret these relationships as causal, then to policy makers interested in attracting new residents and businesses we would recommend measures to increase freedom and reduce cost of living.

I would want to see some other variables tested to rule out other competing factors.
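Since the quoted estimates use different step sizes (half a unit for two scores, a quarter unit for the third), a little arithmetic puts them on a per-unit footing. This assumes the relationships are linear over a full unit, which the quoted passage does not say, and it does not tell us whether a "unit" means the same thing on each of the three scores.

```python
# (change in score, associated change in net migration as % of 2000 population),
# taken from the passage quoted above.
estimates = {
    "fiscal policy": (0.5, 2.0),
    "regulatory policy": (0.5, 4.2),
    "personal freedom": (0.25, 2.5),
}

# Implied slope per full unit of score, assuming a linear relationship.
per_unit = {name: pct / delta for name, (delta, pct) in estimates.items()}

for name, slope in per_unit.items():
    print(f"{name}: about {slope:.1f} percent of 2000 population per unit")
```

Read this way, the personal freedom estimate is the steepest of the three, even though the quoted figure for it (2.5 percent) is smaller than the one for regulatory policy.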

Correlations that get at why big cities lean toward Democrats

Richard Florida discusses several reasons, based on correlations, why big cities now so clearly lean toward the Democratic party:

Density played a key role in the metro vote. (To capture it we use a measure of population-based density, which accounts for the concentration of people in metro). The average Obama metro was more than twice as dense as the average Romney metro, 412 versus 193 people per square mile. With a correlation of .50, density was an even bigger factor than population (where the correlation is .34). The reverse pattern holds for the share of Romney votes; the negative correlation for density (-.51) was significantly higher than that for population (-.33)…

The chart below plots the relationship between a metro’s share of college grads and its share of Obama votes. The line slopes steeply upward showing how the share of Obama votes increase alongside metro density. The share of college grads in a metro is positively correlated with the share of Obama votes (.42) and negatively with the share of Romney votes (-.44)…

The chart above shows the relationship between the share of the creative class and the share of Obama votes across metro areas. The line slopes steeply upward, indicating a considerable positive relationship. The share of creative class workers is positively correlated with the share of Obama votes (.40) and negatively with the share of Romney votes (-.41)…

Republicans may still be the party of the rich, but most of the country’s more-affluent metros lined up squarely in the Obama camp. The correlation between the average wages and salaries of metros and the share of Obama votes is positive (.50) and it is negative for Romney votes (-.51). This makes sense too, as larger metros have greater concentrations of knowledge-based talent and industries and are wealthier to begin with. (The associations we find are even more substantial for metros with more than one million people, with the correlations increasing to .71 for Obama and -.72 for Romney.) This follows the “Red State, Blue State, Rich State, Poor State” pattern identified by Andrew Gelman of Columbia University, who infamously found that while rich voters continue to trend Republican, rich states trend Democratic.

Florida argues this is evidence of class-based differences in American life, specifically, differences between the creative class and those in knowledge industries compared to the rest of the United States.

However, this raises a few questions:

1. The analysis here seems to be done across metropolitan areas while some of these voting patterns break down as we compare cities versus suburbs. For example, there are those who suggest it is really cities and inner-ring suburbs that vote Democratic while farther-flung suburbs and exurbs vote Republican. See earlier posts about the analysis of Joel Kotkin (here and here).

2. Making claims with correlations is tricky. Florida acknowledges this before he rolls out the analysis: “As usual, I point out that correlation points to associations between variables only, not causation.” But then why stop the analysis at correlations here? Looking at the relationships between just two variables at a time ignores the complex relationships between factors like race, class, location, jobs, and more. Why not quickly run some regressions?

3. If this analysis is correct (and we need more in-depth analysis to check), why are Republicans so bad at appealing to the creative class?
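To illustrate the worry in question 2 about looking at only two variables at a time, here is a small sketch with simulated metros. It is not Florida's data: I assume, purely for illustration, a world where density drives both the share of college grads and the Democratic vote share, and where college share has no effect of its own. The two-variable correlation still comes out near the values Florida reports, but the partial correlation, which holds density fixed, falls to roughly zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Simulated, standardized metro-level variables (all values hypothetical).
rng = random.Random(42)
n_metros = 300
density = [rng.gauss(0, 1) for _ in range(n_metros)]
college = [0.7 * d + rng.gauss(0, 0.7) for d in density]     # depends on density only
dem_share = [0.7 * d + rng.gauss(0, 0.7) for d in density]   # depends on density only

r_college_vote = pearson(college, dem_share)
r_college_density = pearson(college, density)
r_vote_density = pearson(dem_share, density)
r_partial = partial_corr(r_college_vote, r_college_density, r_vote_density)

print(f"college vs. vote share:                 {r_college_vote:.2f}")
print(f"college vs. vote share, given density:  {r_partial:.2f}")
```

The real relationships among density, education, class, and voting are surely messier than this toy setup, which is exactly why a regression with several factors at once would say more than a list of pairwise correlations.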