The methodology of quantifying the cost of sprawl

A new analysis says sprawl costs over $107 billion each year – and here is how its author arrived at that figure:

To get to those rather staggering numbers, Hertz developed a unique methodology: He took the average commute length, in miles, for America’s 50 largest metros (as determined by the Brookings Institution), and looked at how much shorter those commutes would be if each metro were more compact. He did this by setting different commute benchmarks for clusters of comparably populated metros: six miles for areas with populations of 2.5 million or below, and 7.5 miles for those with more than 2.5 million people. These benchmarks were just below the commute length of the metro with the shortest average commute length in each category, but still within 0.5 miles of the overall category’s actual average.

He multiplied the difference between the benchmark and each metro’s average commute length by an estimated cost-per-mile for a mid-sized sedan, then doubled that number to represent a daily roundtrip “sprawl tax” per worker, and then multiplied that by the number of workers within a metro region to get the area’s daily “sprawl tax.” After multiplying that by the annual number of workdays and summing across the metros, he had a rough estimate of how much sprawl costs American commuters every year.

Then Hertz calculated the time lost by all this excessive commuting, “applying average travel speed for each metropolitan area to its benchmark commute distance, as opposed to its actual commute distance,” he explains in a blog post…
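Putting the two steps together, here is a minimal sketch of the arithmetic in Python. The benchmark rule follows the article; the cost-per-mile figure, workday count, and the sample metro’s numbers are placeholders for illustration, not Hertz’s actual inputs.

```python
# Sketch of the "sprawl tax" arithmetic described above. The benchmark rule
# comes from the article; COST_PER_MILE, WORKDAYS_PER_YEAR, and the sample
# metro below are assumed values for illustration only.

COST_PER_MILE = 0.57       # assumed per-mile cost for a mid-sized sedan (USD)
WORKDAYS_PER_YEAR = 250    # assumed number of workdays per year

def benchmark_commute(population):
    """One-way benchmark commute (miles) by population cluster."""
    return 6.0 if population <= 2_500_000 else 7.5

def annual_sprawl_tax(population, avg_commute_miles, workers, avg_speed_mph):
    """Return (annual dollars, annual hours) lost to excess commuting."""
    excess = max(avg_commute_miles - benchmark_commute(population), 0.0)
    daily_cost = 2 * excess * COST_PER_MILE * workers   # roundtrip, all workers
    daily_hours = 2 * excess / avg_speed_mph * workers  # time lost at avg speed
    return daily_cost * WORKDAYS_PER_YEAR, daily_hours * WORKDAYS_PER_YEAR

# Hypothetical metro: 3M people, 9.5-mile average commute, 1.4M workers, 30 mph.
dollars, hours = annual_sprawl_tax(3_000_000, 9.5, 1_400_000, 30.0)
print(f"annual sprawl tax: ${dollars:,.0f}; hours lost: {hours:,.0f}")
```

Summing that dollar figure across the 50 metros would yield the national estimate.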

Hertz’s methodology may not be perfect. It might have served his analysis to have grouped these metros into narrower buckets, or by average commute distance rather than population. While it’s true that large cities tend to have longer commutes, there are exceptions. New Orleans and Louisville are non-dense, fairly sprawling cities, but their highways are built up enough that commute distances are fairly short. To really accurately assess the “sprawl tax” in cities like those, you’d have to include the other costs of spread-out development mentioned previously—the health impacts, the pollution, the car crashes, and so on. Hertz only addresses commute lengths and time.

In other words, a number of important conceptual decisions had to be made in order to arrive at this final figure. What might be more important in this situation is to know how different the final figure would be if certain calculations along the way were changed. Is it a relatively small shift, or does this new methodology lead to figures much different from those of other studies? If they really are different, that doesn’t necessarily mean they are wrong, but it might suggest the methodology deserves more scrutiny.

Another thought: it is difficult to put the $107 billion into context. Really big numbers are hard to grasp. How does it compare to other activities? How much do Americans lose by watching TV? Or by using their smartphones? Or by eating meals? The number sounds impressive and is likely intended to spur efforts to reduce sprawl, but the figure doesn’t interpret itself.

11 recommendations from social scientists to journalists reporting scientific findings

Twenty social scientists were asked to give advice to journalists covering scientific research; here are a few of the recommendations.

1) Journalists often want clear answers to life and social problems. Individual studies rarely deliver that…

3) Journalists are obsessed with what’s new. But it’s better to focus on what’s old…

6) There’s a difference between real-world significance and statistical significance

10) Always direct readers back to the original research

And yes, not confusing correlation and causation is on the list. This would indeed be a good list for journalists and the media to keep in mind; the typical social science study produces pretty modest findings. Occasionally a study truly challenges existing theories and findings, or such shifts happen over a short period of time or across a few studies.

At the same time, this would be a good list for the general public as well, or for students starting a social science statistics or research methods course. For example, students sometimes equate using statistics or numbers with “proof,” but that is not really what social science studies provide. Instead, studies tend to provide probabilities – people are more or less likely to exhibit a future behavior or hold an attitude (and this is covered specifically in #5 in the list). Or, we may have to explain in class how studies add up over time and lead to a consensus within a discipline rather than having a single study provide all the evidence (#s 1, 2, 3 on the list).

“Pollsters defend craft amid string of high-profile misses”

Researchers and polling organizations continue to defend their efforts:

Pollsters widely acknowledge the challenges and limitations taxing their craft. The universality of cellphones, the prevalence of the Internet and a growing reluctance among voters to respond to questions are “huge issues” confronting the field, said Ashley Koning, assistant director at Rutgers University’s Eagleton Center for Public Interest Polling…

“Not every poll,” Koning added, “is a poll worth reading.”

Scott Keeter, director of survey research at the Pew Research Center, agreed. Placing too much trust in early surveys, when few voters are paying close attention and the candidate pools are their largest, “is asking more of a poll than what it can really do.”…

Kathryn Bowman, a public opinion specialist at the American Enterprise Institute, also downplayed the importance of early primary polls, saying they have “very little predictive value at this stage of the campaign.” Still, she said, the blame is widespread, lamenting the rise of pollsters who prioritize close races to gain coverage, journalists too eager to cover those results and news consumers who flock to those types of stories.

Given the reliance on data in today’s world, particularly in political campaigns, polls are unlikely to go away. But there will likely be changes in the future that might include:

  1. More consumers of polls – the media and potential voters – learning what exactly polls are and are not saying. Since the media seems to love polls and horse races, I’m not sure much will change in that realm. But we need greater numeracy among Americans to sort through all of these numbers.
  2. Continued efforts to improve methodology as it becomes harder to reach people, obtain representative samples, and predict who will vote.
  3. A consolidation of efforts by researchers and polling organizations as (a) some are knocked out by a string of bad results or high-profile wrong predictions and (b) groups try to pool their resources (money, knowledge, data) to improve their accuracy. Or, perhaps (c) polling will just become a partisan effort as more objective observers realize their efforts won’t be used correctly (see #1 above).

Can religion not be fully studied with surveys or do we not use survey results well?

In a new book (which I have not read), sociologist Robert Wuthnow critiques the use of survey data to explain American religion:

Bad stats are easy targets, though. Setting these aside, it’s much more difficult to wage a sustained critique of polling. Enter Robert Wuthnow, a Princeton professor whose new book, Inventing American Religion, takes on the entire industry with the kind of telegraphed crankiness only academics can achieve. He argues that even gold-standard contemporary polling relies on flawed methodologies and biased questions. Polls about religion claim to show what Americans believe as a society, but actually, Wuthnow says, they say very little…

Even polling that wasn’t bought by evangelical Christians tended to focus on white, evangelical Protestants, Wuthnow writes. This trend continues today, especially in poll questions that treat the public practice of religion as separate from private belief. As the University of North Carolina professor Molly Worthen wrote in a 2012 column for The New York Times, “The very idea that it is possible to cordon off personal religious beliefs from a secular town square depends on Protestant assumptions about what counts as ‘religion,’ even if we now mask these sectarian foundations with labels like ‘Judeo-Christian.’”…

These standards are largely what Wuthnow’s book is concerned with: specifically, declining rates of responses to almost all polls; the short amount of time pollsters spend administering questionnaires; the racial and denominational biases embedded in the way most religion polls are framed; and the inundation of polls and polling information in public life. To him, there’s a lot more depth to be drawn from qualitative interviews than quantitative studies. “Talking to people at length in their own words, we learn that [religion] is quite personal and quite variable and rooted in the narratives of personal experience,” he said in an interview…

In interviews, people rarely frame their own religious experiences in terms of statistics and how they compare to trends around the country, Wuthnow said. They speak “more about the demarcations in their own personal biographies. It was something they were raised with, or something that affected who they married, or something that’s affecting how they’re raising their children.”

I suspect such critiques could be leveled at much of survey research: the questions can be simplistic, the askers of the questions can have a variety of motives and skills in developing useful survey questions, and the data gets bandied about in the media and public. Can surveys alone adequately address race, cultural values, political views and behaviors, and more? That said, I’m sure there are specific issues with surveys regarding religion that should be addressed.

I wonder, though, if another important issue here is whether the public and the media know what to do with survey results. This book review suggests people take survey findings as gospel. They don’t know about the nuances of surveys or how to look at multiple survey questions or surveys that get at similar topics. Media reports on this data are often simplistic and lead with a “shocking” piece of information or some important trend (even if the data suggests continuity). While more social science projects on religion could benefit from mixed methods or by incorporating data from the other side (whether quantitative or qualitative), the public knows even less about these options or how to compare data. In other words, surveys always have issues, but people are generally innumerate in knowing what to do with the findings.

The biggest time-use diary archive in the world

Numerous scholars are making use of the 850,000+ person-days recorded in diaries and held in a UK archive:

Today, these files are part of the biggest collection of time-use diaries in the world, kept by the Centre for Time Use Research at the University of Oxford, UK. The centre’s holdings have been gathered from nearly 30 countries, span more than 50 years and cover some 850,000 person-days in total. They offer the most detailed portrait ever created of when people work, sleep, play and socialize — and of how those patterns have changed over time. “It certainly is unique,” says Ignace Glorieux, a sociologist at the Dutch-speaking Free University of Brussels. “It started quite modest, and now it’s a huge archive.”

The collection is helping to solve a slew of scientific and societal puzzles — not least, a paradox about modern life. There is a widespread perception in Western countries that life today is much busier than it once was, thanks to the unending demands of work, family, chores, smartphones and e-mails. But the diaries tell a different story: “We do not get indicators at all that people are more frantic,” says John Robinson, a sociologist who works with time-use diaries at the University of Maryland, College Park. In fact, when paid and unpaid work are totted up, the average number of hours worked every week has not changed much since the 1980s in most countries of the developed world…

But certain groups have experienced a different trend. According to analyses by Gershuny, Sullivan and other time-use researchers, two demographic groups are, in fact, working harder. One consists of employed, single parents, who put in exceptionally long hours compared to the average; the other comprises well-educated professionals, particularly those who also have small children. People in this latter group find themselves pushed to work hard and under societal pressure to spend quality time with their kids. “The combination of those pressures has meant that there is this group for which time pressure is particularly pertinent,” Sullivan says.

Some researchers are also testing new ways to record people’s activities so they can compare the results to the diaries:

In her preliminary analyses, Harms has found that gadget diaries and paper diaries show the same sequence of events, but that the gadgets reveal details that paper diaries missed. Most researchers in the field agree that the future lies in collecting data through phones and other devices. “Maybe this will bring a new boost to time-use research,” Glorieux says. He anticipates a situation in which reams of diary data — such as location, heart rate, calories burned and even ambient noise — are collected through phones and linked-up gadgets.

Much social science research is focused on particular events or aspects of people’s lives – not just a cross-section of time but also specific information measured in variables that we think might be related to other variables or that we think are worth measuring. In contrast, time-use diaries and other methods can help get at the mundane, everyday activity and interactions that make up a majority of our lives. Much of adult life is spent in necessary activities: making and eating food, resting and sleeping, cleaning, more passive leisure activities, caring for children. We also spend a decent amount of time alone or in our own heads. These activities are occasionally punctuated by big events – something exciting happens at work or home, lively social interaction occurs, an important thought is had, etc. – to which we tend to pay more attention both in our own minds and in our data collection. Our methods should probably more closely match this regular activity, and time-use diaries represent one way of doing this.

Competing population projections for Chicago

I highlighted one recent prediction that Chicago would soon trail Houston in population. Yet, another projection has Chicago gaining people and holding off Houston for longer. Which is right?

Data released by the Illinois Department of Health in February show that the population for Chicago, about 2.7 million in 2010, could decrease by 3 percent to 2.5 million by 2025. Meanwhile, Houston’s population could reach 2.54 million to 2.7 million in 2025, according to the Reuters report. But a recent population estimate by the Census Bureau shows an increase in population, rather than a decrease.

Census estimates released in June show that the population of Chicago increased by 1 percent from 2010 to 2014. So why is one projection showing a decrease, but another an increase?

Both data sets are based on estimates and assumptions, says Rob Paral, a Chicago-based demographer. Unlike the 2000 or 2010 census, where all residents answer a questionnaire, any interim projections or estimates must use sampling or a formula based on past population statistics to calculate population…

“Trend data do not support any increase in the projections for Chicago in the next 10 years,” said Bill Dart, the deputy director of policy, planning and statistics at the health department. Dart explained that the estimates from the census use a different formula than the health department. And factors such as births, deaths, migration, economic boons or natural disasters can disrupt projections.

Two groups dealing with population data have come to opposite conclusions. Two ways we might approach this:

  1. The differences are due to slightly different data, whether in the variables used or the projection models. We could have a debate about which model or variables are better for predicting population. Have these same kinds of variables and models proven themselves in other cities? (Alternately, are there factors that both models leave out?)
  2. Perhaps the two predictions aren’t that different: one is suggesting a slight decline and one predicts a slight increase. Could both predictions be within the margin of error? We might be really worried if one saw a huge drop-off coming and the other disagreed, but both projections here are not too different from no change at all. Sure, the media might be able to say the predictions disagree, but statistically there is not much difference (see the sketch below).
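
To make the second point concrete, here is a quick back-of-the-envelope comparison, taking the stated changes (roughly -3 percent and +1 percent) at face value rather than recomputing from any underlying model:

```python
# Back-of-the-envelope comparison of the two Chicago trajectories, using the
# percentage changes quoted above.
idph_change = -0.03     # IDPH projection: ~3% decline by 2025
census_change = +0.01   # Census estimate: ~1% increase, 2010-2014

gap = census_change - idph_change
print(f"gap between the two trajectories: {gap:.0%}")  # 4 percentage points
# A roughly 4-point gap spread over a decade is small relative to typical
# projection error, so "decline" vs. "growth" may well be within the noise.
```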

The answer will come in time. Even so, projections like these carry weight: they provide grist for the media, give politicians something to grab onto, and may influence the actions of some (is Chicago or Houston a city on the rise?).

The perils of analyzing big real estate data

Two leaders of Zillow recently wrote Zillow Talk: The New Rules of Real Estate, which is a sort of Freakonomics look at all the real estate data they have. While it is an interesting book, it also illustrates the difficulties of analyzing big data:

1. The key to the book is all the data Zillow has harnessed to track real estate prices and make predictions on current and future prices. They don’t say much about their models. This could be for two good reasons: this is aimed at a mass market and the models are their trade secrets. Yet, I wanted to hear more about all the fascinating data – at least in an appendix?

2. Problems of aggregation: the data is usually analyzed at a metro area or national level. There are hints at smaller markets – a chapter on NYC, for example, and another looking at some unusual markets like Las Vegas – but there are no chapters on cheaper/starter homes or luxury homes. An unanswered question: is real estate more similar within or across markets? Put another way, are the features of the Chicago market so unique and patterned, or are cheaper homes in the Chicago region more like similar homes in Atlanta or Los Angeles than like more expensive homes in their own market?

3. Most provocative argument: in Chapter 24, the authors suggest that pushing homeownership for lower-income Americans is a bad idea as it can often trap them in properties that don’t appreciate. This was a big problem in the 2000s: Presidents Clinton and Bush pushed homeownership, but after housing values dropped in the late 2000s, poorer neighborhoods were hit hard, leaving many homeowners in default or seriously underwater. Unfortunately, unless demand picks up in these neighborhoods (and gentrification is pretty rare), these homes are not good investments.

4. The individual chapters often discuss small effects that may be significant but don’t have large substantive effects. For example, there is a section on male vs. female real estate agents. The effects for each gender are small: at most, a few percentage points difference in selling price as well as slight variations in speed of sale. (Women are better in both categories: higher prices, faster sales.)

5. The authors are pretty good at repeatedly pointing out that correlation does not mean causation. Yet, they don’t catch all of these moments and at other times present patterns with axes that distort the comparison. For example, here is a chart from page 202:

[Chart from page 202 of Zillow Talk]

These two things may be correlated (as one goes up so does the other, and vice versa), but why set the axes so that one series is measured in half-percentage-point increments while the other moves in five-point increments?
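
A small matplotlib sketch (with invented series, not Zillow’s data) shows how mismatched twin axes can make a half-point wiggle look like a five-point swing:

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented series for illustration; one moves ~0.5 points, the other ~5 points.
years = np.arange(2000, 2011)
small_series = 5 + 0.25 * np.sin(years)    # varies within half a point
large_series = 50 + 2.5 * np.sin(years)    # varies within five points

fig, left = plt.subplots()
right = left.twinx()
left.plot(years, small_series, color="tab:blue")
right.plot(years, large_series, color="tab:red")

# Setting limits so each curve fills the same visual height makes a
# 0.5-point move look exactly as dramatic as a 5-point move.
left.set_ylim(4.5, 5.5)    # half-percentage-point scale
right.set_ylim(45, 55)     # five-point scale
plt.show()
```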

6. Continuing #4, I suppose a buyer and seller would want to use all the tricks they can, but the tips here mean that those in the real estate market are supposed to string together all of these small effects to maximize what they get. On the final page, they write: “These are small actions that add up to a big difference.” Maybe. With margins of error on the effects, some buyers and sellers aren’t going to get the effects outlined here: some will benefit more but some will benefit less.
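
As a rough illustration of that point (with invented effect sizes), compounding several small, noisy lifts produces a modest average gain and a wide range of individual outcomes:

```python
import random

random.seed(1)
effects = [0.010, 0.015, 0.005, 0.020]   # hypothetical average lift per tip
noise = 0.01                              # assumed spread around each effect

def total_lift():
    """Multiply the noisy per-tip lifts together and return the net gain."""
    lift = 1.0
    for e in effects:
        lift *= 1 + random.gauss(e, noise)
    return lift - 1

results = sorted(total_lift() for _ in range(10_000))
print(f"median lift: {results[5_000]:+.1%}")
print(f"middle 80% of sellers: {results[1_000]:+.1%} to {results[9_000]:+.1%}")
# The tips add up on average, but individual sellers can land well below
# (or above) the advertised combined gain.
```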

7. The moral of the whole story? Use data to your advantage even as it is not a guarantee:

In the new realm of real estate, everyone faces a rather stark choice. The operative question now is: Do you wield the power of data to your advantage? Or do you ignore the data, to your peril?

The same is true of the housing market writ large. Certainly, many macro-level dynamics are out of any one person’s control. And yet, we’re better equipped than ever before to choose wisely in the present – to make the kinds of measured judgments that can prevent another coast-to-coast bubble and calamitous burst. (p.252)

In the end, this book is aimed at the mass market where a buyer or seller could hope to string together a number of these small advantages. Yet, there are no guarantees and the effects are often small. Having more data may be good for markets and may make participants feel more knowledgeable (or perhaps more overwhelmed) but not everyone can take advantage of this information.

“So what are the rules of ethnography, and who enforces them?”

A journalist looking into the Goffman affair discusses the ethics of ethnography:

To find out, I called several sociologists and anthropologists who had either done ethnographic research of their own or had thought about the methodology from an outside perspective. Ethnography, they explained, is a way of doing research on groups of people that typically involves an extended immersion in their world. If you’re an ethnographer, they said, standard operating procedure requires you to take whatever steps you need to in order to conceal the identities of everyone in your sample population. Unless you formally agree to fulfill this obligation, I was told, your research proposal will likely be blocked by the institutional review board at your university…

The frustration is not merely a matter of academics resenting oversight out of principle. Many researchers think the uncompromising demand for total privacy has a detrimental effect on the quality of scholarship that comes out of the social sciences—in part because anonymization makes it impossible to fact-check the work…

According to Goffman, her book is no less true than Leovy’s or LeBlanc’s. That’s because, as she sees it, what sociologists set out to capture in their research isn’t truths about specific individuals but general truths that tell us how the world works. In her view, On the Run is a true account because the general picture it paints of what it’s like to live in a poor, overpoliced community in America is accurate.

“Sociology is trying to document and make sense of the major changes afoot in society—that’s long been the goal,” Goffman told me. Her job, she said, as a sociologist who is interested in the conditions of life in poor black urban America, is to identify “things that recur”—to observe systemic realities that are replicated in similar neighborhoods all over the country. “If something only happens once, [sociologists are] less interested in it than if it repeats,” she wrote to me in an email. “Or we’re interested in that one time thing because of what it reveals about what usually happens.” This philosophy goes back to the so-called Chicago school of sociology, Goffman added, which represented an attempt by observers of human behavior to make their work into a science “by finding general patterns in social life, principles that hold across many cases or across time.”…

Goffman herself is the first to admit that she wasn’t treating her “study subjects” as a mere sample population—she was getting to know them as human beings and rendering the conditions of their lives from up close. Her book makes for great reading precisely because it is concerned with specifics—it is vivid, tense, and evocative. At times, it reads less like an academic study of an urban environment and more like a memoir, a personal account of six years living under extraordinary circumstances. Memoirists often take certain liberties in reconstructing their lives, relying on memory more than field notes and privileging compelling narrative over strict adherence to the facts. Indeed, in a memoir I’m publishing next month, there are several moments I chose to present out of order in order to achieve a less convoluted timeline, a fact I flag for the reader in a disclaimer at the front of the book.

Not surprisingly, there is disagreement within the discipline of sociology as well as across disciplines about how ethnography could and should work. It is a research method that requires so much time and personal effort that it can be easy to tie to a particular researcher and their laudable steps or mistakes. This might miss the forest for the trees; I’ve thought for a while that we need more discussion across ethnographies rather than treating each one as the singular work on its subject. In other words, does Goffman’s data line up with what others have found in studying race, poor neighborhoods, and the criminal justice system? And if there are no comparisons to make with Goffman’s work, why aren’t more researchers wrestling with the same topic?

Additionally, this particular discussion highlights longstanding tensions in sociology: qualitative vs. quantitative data (with one often assumed to be more “fact”); “facts” versus “interpretation”; writing academic texts versus books for more general audiences; emphasizing individual stories (which often appeals to the public) versus the big picture; dealing with outside regulations such as IRBs that may or may not be accustomed to dealing with ethnographic methods in sociology; and how to best do research to help disadvantaged communities. Some might see these tensions as more evidence that sociology (and other social sciences) simply can’t tell us much of anything. I would suggest the opposite: the realities of the social world are so complex that these tensions are necessary in gathering and interpreting comprehensive data.

Lancet editor suggests “much of the scientific literature, perhaps half, may be simply untrue”

The editor of The Lancet quickly summarizes several major issues regarding scientific studies:

The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. Or they retrofit hypotheses to fit their data. Journal editors deserve their fair share of criticism too. We aid and abet the worst behaviours. Our acquiescence to the impact factor fuels an unhealthy competition to win a place in a select few journals. Our love of “significance” pollutes the literature with many a statistical fairy-tale. We reject important confirmations. Journals are not the only miscreants. Universities are in a perpetual struggle for money and talent, endpoints that foster reductive metrics, such as high-impact publication. National assessment procedures, such as the Research Excellence Framework, incentivise bad practices. And individual scientists, including their most senior leaders, do little to alter a research culture that occasionally veers close to misconduct.

He goes on to suggest some solutions such as different incentives, data review before publication, and a higher bar for statistical significance. Are there also some basic questions here about methodology, such as whether randomized controlled experiments are the best way to go, particularly if the N is small? Dr. John Ioannidis has argued for more rigorous methods in medical research, suggesting trials need to compare a new treatment to an existing treatment rather than a new option to a placebo. Perhaps we also need more metastudies that look across various studies to summarize findings rather than relying on a single study or a small group of studies to validate a finding.
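
For the metastudy idea, a minimal fixed-effect meta-analysis (inverse-variance weighting) looks something like the sketch below; the five effect sizes and standard errors are made up for illustration:

```python
import math

# Hypothetical study estimates and their standard errors.
effects = [0.30, 0.12, 0.45, 0.05, 0.22]
std_errs = [0.15, 0.10, 0.25, 0.08, 0.12]

# Weight each study by the inverse of its variance: precise studies count more.
weights = [1 / se**2 for se in std_errs]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")
# The flashy outlier (0.45) gets pulled toward the precision-weighted
# average once the more precise studies are taken into account.
```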

At the least, this is a public relations issue for the natural and social sciences. The public tends to trust science but an increasing number of studies that are later retracted amongst breathless pronouncements of new findings will not go over well. Beyond the optics, this gets at a basic question for scientists: are we/they truly interested in finding reality? What is this scientific work intended to do anyway?

Nate Silver: “The World May Have A Polling Problem”

In looking at the disparities between polls and recent election results in the United States and UK, Nate Silver suggests the polling industry may be in some trouble:

Consider what are probably the four highest-profile elections of the past year, at least from the standpoint of the U.S. and U.K. media:

  • The final polls showed a close result in the Scottish independence referendum, with the “no” side projected to win by just 2 to 3 percentage points. In fact, “no” won by almost 11 percentage points.
  • Although polls correctly implied that Republicans were favored to win the Senate in the 2014 U.S. midterms, they nevertheless significantly underestimated the GOP’s performance. Republicans’ margins over Democrats were about 4 points better than the polls in the average Senate race.
  • Pre-election polls badly underestimated Likud’s performance in the Israeli legislative elections earlier this year, projecting the party to about 22 seats in the Knesset when it in fact won 30. (Exit polls on election night weren’t very good either.)

At least the polls got the 2012 U.S. presidential election right? Well, sort of. They correctly predicted President Obama to be re-elected. But Obama beat the final polling averages by about 3 points nationwide. Had the error run in the other direction, Mitt Romney would have won the popular vote and perhaps the Electoral College.

Perhaps it’s just been a run of bad luck. But there are lots of reasons to worry about the state of the polling industry. Voters are becoming harder to contact, especially on landline telephones. Online polls have become commonplace, but some eschew probability sampling, historically the bedrock of polling methodology. And in the U.S., some pollsters have been caught withholding results when they differ from other surveys, “herding” toward a false consensus about a race instead of behaving independently. There may be more difficult times ahead for the polling industry.
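The “herding” claim is checkable in principle: if polls of the same race behave independently, their spread should be at least as large as their sampling error implies. A sketch, with invented poll numbers:

```python
import math
import statistics

# Invented late polls of one race: (candidate share, sample size).
polls = [(0.52, 900), (0.53, 1100), (0.52, 800), (0.53, 1000)]

observed_sd = statistics.stdev(share for share, _ in polls)
expected_sd = statistics.mean(
    math.sqrt(share * (1 - share) / n) for share, n in polls
)
print(f"observed spread: {observed_sd:.3f}, "
      f"expected from sampling alone: {expected_sd:.3f}")
if observed_sd < expected_sd:
    print("tighter clustering than chance allows - consistent with herding")
```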

It sounds like there are multiple areas for improvement:

1. Methodology. How can polls reach the average citizen two decades into the 21st century? How can they collect representative samples?

2. Behavior across the pollsters, the media, and political operatives. How are these polls reported? Is the media more interested in political horse races than accurate poll results? Who can be viewed as an objective polling organization? Who can be viewed as an objective source for reporting and interpreting polling figures?

3. A decision for academics as well as pollsters: how accurate should polls be (what are the upper bounds for margins of error)? Should there be penalties for work that doesn’t accurately reflect public opinion?
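
On the third point, the textbook 95 percent margin of error for a simple random sample puts a floor under how precise any honest poll can be; a quick sketch:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 1000, 2000):
    print(f"n={n}: +/- {margin_of_error(n):.1%}")
# n=400: +/- 4.9%; n=1000: +/- 3.1%; n=2000: +/- 2.2%
# Real polls add design effects and nonresponse, so true uncertainty is larger.
```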