The county with the worst roads for traveling in the Chicago region

The Chicago Metropolitan Agency for Planning has a new data tool online that provides insights into the commuting experiences of Chicago-area residents:

CMAP planners say it’s time to “get people excited about data.” The hope is CMAP’s constituents — Cook, DuPage, Kane, Kendall, Lake, McHenry, and Will counties — will use the facts to understand why certain projects deserve prioritization and funding. To access the data, go to http://www.cmap.illinois.gov/mobility/explore…

To that end, a section on ride quality includes detailed maps measuring pavement conditions on both expressways and major roads. A snapshot of counties’ ride quality on major roads puts Cook County at a 47 percent rating, compared with 72 percent in DuPage, 80 percent in Kane, 83 percent in Lake, and 90 percent in McHenry.

Other data available includes stats on bridges in need of repair, pavement quality, the number of passengers boarding at Metra and CTA stops and the worst railway crossings for delays in the region — FYI, it’s on Chicago’s South Side at Morgan Street and Pershing Road with 3,194 vehicles delayed a day.

Taken cumulatively, the website sends a message that the region’s infrastructure needs more capital to avoid gridlock, stagnant transit and deteriorating roads. The warning is timely, with a new governor in Springfield and a push for state and federal multiyear capital programs.

Two things strike me as interesting:

1. I always like the idea of putting more data into people’s hands. Commuting is a common experience and one that people would probably want to see improved. However, without data that moves beyond individual and/or anecdotal evidence, it is hard to have conversations about the bigger picture in the region.

2. It is one thing for people to like data; it is another to translate that data access into collective action. Assuming that some people go to this site, will they then take an interest in infrastructure projects? Will they contact political officials? Will they vote differently? How exactly CMAP goes about putting this data into action is worth paying attention to.

The difficulties in finding out the most popular street name in the United States

FiveThirtyEight tries to find the most common street name in the US, which leads to comparing Census information from 1993 with a Reddit user’s work:

The chart on Reddit that sparked your question looks very different from the 1993 list of most common street names from the Census Bureau.

Why, for example, are there 3,238 extra Main streets in that chart compared with the census records in 1993? To find out, I got in touch with “darinhq,” whose name is Darin Hawley when he’s not producing charts on Reddit. After speaking to him, I think there are three explanations for the difference between his chart and the official data.

First, some new streets may have been built over the past 20 years (Hawley used 2013 census data to make his chart). Second, some streets may have changed their names: If a little town grows, it might change the name of its principal street from Tumbleweed Lane to Main Street.

Third, I don’t know how the Census Bureau produced its 1993 list (I asked, and a spokesperson told me the researcher who made it can’t recall his methodology), so Hawley might have simply used a different methodology to produce his chart. Because I wasn’t able to find any data on the frequency that American streets are renamed or the rate at which new streets are being built, I’m going to stake my money on this third explanation. Hawley told me that he counted “Main St N” and “N Main St” as two separate streets in his data. If the Census Bureau counted them as just one street, that could account for the difference.

That’s not the only executive decision Hawley made when he was summarizing this data. He set a minimum of how far away one Elm Street in Maine had to be from another Elm Street in Maine to qualify as two separate streets. That’s a problem because streets can break and resume in unexpected ways.
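To see how much these judgment calls matter, here is a rough sketch of a street-counting routine. The street records, the name-normalization step, and the one-mile cutoff are all illustrative assumptions on my part, not Hawley’s or the Census Bureau’s actual methodology; the point is only that two plausible rules produce two different counts.

```python
# Hypothetical street-counting sketch: how normalization and distance rules
# change the count of "distinct" streets. All data and cutoffs are made up.
from math import radians, sin, cos, asin, sqrt

def miles_apart(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(h))

# Hypothetical street segments: (name, state, (lat, lon))
segments = [
    ("N Main St", "ME", (44.10, -70.21)),
    ("Main St N", "ME", (44.11, -70.22)),  # one road under one rule, two roads under another
    ("Elm St", "ME", (43.66, -70.26)),
    ("Elm St", "ME", (43.67, -70.27)),     # two nearby Elm Street segments
]

MIN_MILES = 1.0  # how far apart same-named streets must be to count as two streets

def count_streets(segments, merge_prefixes=False):
    counted = []  # (normalized name, state, location) already counted once
    for name, state, loc in segments:
        key = " ".join(sorted(name.lower().split())) if merge_prefixes else name.lower()
        duplicate = any(
            k == key and s == state and miles_apart(l, loc) < MIN_MILES
            for k, s, l in counted
        )
        if not duplicate:
            counted.append((key, state, loc))
    return len(counted)

print(count_streets(segments))                       # 3: "Main St N" and "N Main St" stay separate
print(count_streets(segments, merge_prefixes=True))  # 2: prefix variants merged
```

Change either rule and the totals move, which is exactly why Hawley’s chart and the 1993 list can diverge.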

In other words, getting an answer requires making some judgment calls with the available data. While this is the sort of question that exemplifies the intriguing things we can all learn from the Internet, it also likely isn’t important enough to spend a lot of time on. As an urban sociologist, I find the question interesting, but what would I learn from the frequencies of street names? What hypothesis could I test? At best it roughly tells us what names Americans give to roads, and what we value may be reflected in those names. For example, the Census data suggests that numbered streets and references to nature dominate the top 20. Does this mean we prefer order (a pragmatic approach) and idyllic yet vague nature terms (park, view, lake, tree names) over other things? Yet the list has limitations: these communities and roads were built at different times, roads can be renamed, and we have to make judgment calls about what counts as separate streets.

Two other thoughts:

1. The Census researcher who did this back in the early 1990s can’t remember the methodology. Why wasn’t it part of the report?

2. Is this something that would be best left up to marketers (who might find some advertising value in this) or GIS firms (who have access to comprehensive map data)?

University press releases exaggerate scientific findings

A new study suggests exaggerations about scientific findings – for example, suggesting causation when a study only found correlation – start at the level of university press releases.

Yesterday Sumner and colleagues published some important research in the journal BMJ that found that a majority of exaggeration in health stories was traced not to the news outlet, but to the press release—the statement issued by the university’s publicity department…

The goal of a press release around a scientific study is to draw attention from the media, and that attention is supposed to be good for the university, and for the scientists who did the work. Ideally the endpoint of that press release would be the simple spread of seeds of knowledge and wisdom; but it’s about attention and prestige and, thereby, money. Major universities employ publicists who work full time to make scientific studies sound engaging and amazing. Those publicists email the press releases to people like me, asking me to cover the story because “my readers” will “love it.” And I want to write about health research and help people experience “love” for things. I do!

Across 668 news stories about health science, the Cardiff researchers compared the original academic papers to their news reports. They counted exaggeration and distortion as any instance of implying causation when there was only correlation, implying meaning to humans when the study was only in animals, or giving direct advice about health behavior that was not present in the study. They found evidence of exaggeration in 58 to 86 percent of stories when the press release contained similar exaggeration. When the press release was staid and made no such errors, the rates of exaggeration in the news stories dropped to between 10 and 18 percent…

Sumner and colleagues say they would not shift liability to press officers, but rather to academics. “Most press releases issued by universities are drafted in dialogue between scientists and press officers and are not released without the approval of scientists,” the researchers write, “and thus most of the responsibility for exaggeration must lie with the scientific authors.”

Scientific studies are often complex and probabilistic. It is difficult to model and predict complex natural and social phenomena, and scientific studies often give our best estimate or interpretation of the data. Science tends to accumulate findings and knowledge steadily rather than operating on a model where every single study definitively proves something. This means individual studies contribute to the larger whole but often don’t set the agenda or deliver a radically new finding.

Yet translating that understanding into something fit for public consumption is difficult. Academics are often criticized for dense, jargon-filled language, so pieces for the general public have to be written differently. Academics want their findings to matter, and colleges and universities like good publicity as well. Presenting limited or weaker findings doesn’t get as much attention.

All that said, there is an opportunity here to improve the reporting of scientific findings.

Adding a chart to scientific findings makes them more persuasive

A new research study suggests charts of data are more persuasive than text alone:

Then for a randomly selected subsample, the researchers supplemented the description of the drug trial with a simple chart. But here’s the kicker: That chart contained no new information; it simply repeated the information in the original vignette, with a tall bar illustrating that 87 percent of the control group had the illness, and a shorter bar showing that that number fell to 47 percent for those who took the drug.

But taking the same information and also showing it as a chart made it enormously more persuasive, raising the proportion who believed in the efficacy of the drug to 97 percent from 68 percent. If the researchers are correct, the following chart should persuade you of their finding.

What makes simple charts so persuasive? It isn’t because they make the information more memorable — 30 minutes after reading about the drug trials, those who saw the charts were not much more likely to recall the results than those who had just read the description. Rather, the researchers conjecture, charts offer the veneer of science. And indeed, the tendency to find the charts more persuasive was strongest among those who agreed with the statement “I believe in science.”

Charts = science? If a veneer of science is the answer, why do charts carry that scientific authority? Is it because scientists are the ones who use charts, or because people extend more trust when charts are involved?

I wonder if there are other explanations:

1. Seeing a clear difference in bars (87% vs. 47%) makes a stronger impression than simply reading about the difference. A 40 percentage point gap is abstract in text but striking in an image (see the sketch after this list).

2. More people accept the power of visual data today compared to written text. Think of all those Internet infographics with interesting information.
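Here is the sketch mentioned above: a minimal matplotlib reproduction of the kind of two-bar chart the study describes. Only the two percentages come from the excerpt; the group labels and styling are my own guesses.

```python
# A minimal two-bar chart like the one described in the study:
# 87 percent of the control group had the illness vs. 47 percent of those who took the drug.
import matplotlib.pyplot as plt

groups = ["Control group", "Took the drug"]  # hypothetical labels
percent_ill = [87, 47]                       # the two figures from the excerpt

fig, ax = plt.subplots()
ax.bar(groups, percent_ill)
ax.set_ylabel("Percent who had the illness")
ax.set_ylim(0, 100)
ax.set_title("The same numbers, shown as bars")
plt.show()
```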

“A Behind-the-Scenes Look at How Infographics Are Made”

A new book examines how designers make infographics:

A new book from graphic guru and School of Visual Arts professor Steven Heller and designer Rick Landers looks at the process of more than 200 designers, from first sketch to final product. The Infographic Designers Sketchbook is almost exactly what it sounds like. The 350-page tome is essentially a deep dive into the minds of data designers. Heller and Landers have chosen more than 50 designers and asked them to fork over their earliest sketches to give us insights into how they turn a complex set of data into coherent, visually stunning data visualizations. “You see a lot more unbridled, unfettered work when you’re looking at a sketchbook,” says Heller. “You might be looking at a lot of junk, but even that junk tells you something about the artist who is doing it.”

Heller says there are a few through-lines to all good infographics, the first being clarity. The purpose of a data visualization has always been to communicate complex information in a readily digestible way. “You can’t throw curves,” he says. “If you’re going to do something that is complex, like the breakdown of an atomic particle, for example, you have to make it clear.” Clarity is key even in seemingly simple infographics, like Caroline + Young’s Mem:o, an app that visualizes personal data for things like sleep and fitness. The data viz tool uses simple shapes to communicate the various sets of data. This is no coincidence, says Heller, adding that our eyes tend to respond to simple geometric forms. “If you start using parallelograms or shapes like that, it may get a little difficult,” he says. “But circles, squares, and rectangles, those are all forms we adjust our eyes to very quickly.”…

It’s fascinating to go behind the scenes of a designer’s work process, in the way it’s fascinating to flip through another person’s journal or leaf through the papers on their desk. If nothing else, the book is a testament to the sketching process. It shows how designers, and even non-designers, can use a pen and paper to sort through some hairy, complex ideas.

The post has some interesting examples you can look at. This hints at the larger process of interpreting data. If someone just handed you a spreadsheet or a few tables of data, there is no automatic path from that to the “right” interpretation, whether written or graphical. It takes time and skill to present data in an engaging and informative way.

US unemployment figures distorted by non-response among repeat survey takers

Two new studies suggest unemployment figures are pushed downward by the data collection process:

The first report, published by the National Bureau of Economic Research, found that the unemployment number released by the government suffers from a problem faced by other pollsters: Lack of response. This problem dates back to a 1994 redesign of the survey when it went from paper-based to computer-based, although neither the researchers nor anyone else has been able to offer a reason why the redesign has affected the numbers.

What the researchers found was that, for whatever reason, unemployed workers, who are surveyed multiple times, are most likely to respond to the survey when they are first given it and to ignore the survey later on.

The report notes, “It is possible that unemployed respondents who have already been interviewed are more likely to change their responses to the labor force question, for example, if they want to minimize the length of the interview (now that they know the interview questions) or because they don’t want to admit that they are still unemployed.”

This ends up inaccurately weighting the later responses and skewing the unemployment rate downward. It also seems to have increased the number of people who once would have been designated as officially unemployed but today are labeled as out of the labor force, which means they are neither working nor looking for work.
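To make the mechanism concrete, here is a toy example with made-up numbers. Nothing below reflects the actual Current Population Survey design or weighting; it only shows how differential non-response in later waves pulls the measured rate below the true one.

```python
# Toy illustration: if unemployed respondents drop out of later survey waves
# at higher rates than employed respondents, the rate computed from respondents
# understates true unemployment. All numbers are hypothetical.
labor_force = 1000
unemployed = 60                      # true unemployment rate: 6.0%
employed = labor_force - unemployed

# Hypothetical response rates in a later interview wave
response_rate_employed = 0.90
response_rate_unemployed = 0.60      # unemployed less likely to respond again

responding_employed = employed * response_rate_employed
responding_unemployed = unemployed * response_rate_unemployed

true_rate = unemployed / labor_force
measured_rate = responding_unemployed / (responding_employed + responding_unemployed)

print(f"true rate: {true_rate:.1%}")          # 6.0%
print(f"measured rate: {measured_rate:.1%}")  # roughly 4.1%
```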

And the second study suggests some of this data could be collected via Twitter by looking for key phrases.

This generally highlights the issue of survey fatigue, where respondents become less likely to respond to and completely fill out a survey. This hampers important data collection efforts across a wide range of fields. Given the importance of the unemployment figures for American politics and economic life, this is a data problem worth solving.

A side thought: instead of searching Twitter for key words, why not deliver survey instruments like this through Twitter or smartphones? The surveys would have to be relatively short but they could have the advantage of seeming less time-consuming and could get better data.

Summarizing a year of your life in an infographic report

One designer has put together another yearly report on his own life that is a series of infographics:

For nearly a decade, designer Nicholas Felton has tracked his interests, locations, and the myriad beginnings and ends that make up a life in a series of sumptuously designed “annual reports.” The upcoming edition, looking back at 2013, uses 94,824 data points: 44,041 texts, 31,769 emails, 12,464 interpersonal conversations, 4,511 Facebook status updates, 1,719 articles of snail mail, and assorted notes to tell the tale of a year that started with his departure from Facebook and ended with the release of his app, called Reporter…

New types of data forced Felton to experiment with novel visualizations. One of Felton’s favorite graphics from this report is a “topic graph” that plots the use and frequency of specific phrases over time. It started as a tangled mess of curves, but by parsing his conversation data using the Natural Language Toolkit and reducing the topics to flat lines, a coherent picture of his year emerges a few words at a time.

After nine years of fastidious reporting, Felton has an unparalleled perspective on his changing tastes, diets, and interests. Despite a trove of historical data, Felton has found few forward-looking applications for the data. “The purpose of these reports has always been exploration rather than optimization,” he says. “Think of them more as data travelogues than report cards.”…

Felton says it’s relatively easy for companies to make sense of physical data, but properly quantifying other tasks like email is much harder. Email can be a productivity tool or a way to avoid the real work at hand, making proper quantification fuzzy. “The next great apps in this space will embrace the grayness of personal data,” says Felton. “They will correlate more dimensions and recognize that life is not merely a continuum of exercising versus not exercising.”

A fascinating project; you can see images from the report at the link.
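As a rough illustration of the “topic graph” idea, here is a minimal sketch that counts how often chosen phrases appear in conversation logs by month, using the Natural Language Toolkit the article mentions. The sample conversations, topic words, and monthly grouping are hypothetical stand-ins, not Felton’s actual pipeline.

```python
# Minimal "topic graph" sketch: frequency of chosen words per month.
# The conversation data and topic list are made up; only the use of NLTK
# for tokenization follows the article.
from collections import Counter, defaultdict
from datetime import date

import nltk

nltk.download("punkt", quiet=True)  # tokenizer models

conversations = [  # hypothetical (date, text) records
    (date(2013, 1, 5), "Thinking about leaving Facebook and starting the app"),
    (date(2013, 6, 12), "The app needs a better reporting screen"),
    (date(2013, 11, 30), "Reporter app ships next month"),
]

topics = ["facebook", "app"]
monthly_counts = defaultdict(Counter)  # month -> topic -> count

for day, text in conversations:
    month = day.strftime("%Y-%m")
    words = [w.lower() for w in nltk.word_tokenize(text)]
    for topic in topics:
        monthly_counts[month][topic] += words.count(topic)

for month in sorted(monthly_counts):
    print(month, dict(monthly_counts[month]))
```

Plotting those monthly counts, one line per topic, is the basic idea behind the graph the excerpt describes.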

I like the conclusion: even all of this data about a single year of life requires a level of interpretation that involves skill and nuance. Quantification of some tasks or information could be quite helpful – like health data – but even that requires useful interpretation because numbers don’t speak for themselves. Even infographics need to address this issue: do they help viewers make sense of a year or do they simply operate as flashy graphics?

Americans under 35 have lowest recorded homeownership rate; what does it mean?

The latest Census data shows Americans under 35 now have the lowest recorded homeownership rate for that age group:

In the second quarter of 2014, the rate of homeownership among householders who are under 35 dropped to the lowest number ever reported since the Census Bureau first started recording quarterly homeownership rates 21 years ago.

In a news release published this week, the Census Bureau said that the homeownership rate among householders under 35 was 35.9 percent in the second quarter of 2014. That number was not only lower than any quarterly rate going back to the fourth quarter of 1993 (the first quarterly rate reported) but was also lower than any of the annual homeownership rates for under 35s that the Census Bureau has published since 1982.

However, a Census Bureau official also said that the 35.9 percent homeownership rate for under 35s for the second quarter was not statistically different from the rate for the first quarter of this year (36.2 percent) or the fourth quarter of 2013 (36.8 percent).

These figures on their own could support a number of different arguments about the fate of homeownership in the United States. On one side, those promoting more urban lifestyles could say millennials aren’t buying homes because they are moving to cities and looking to rent in order to have more flexibility and take advantage of the urban lifestyle. On the other side, others might note that this data comes five-plus years after the housing bubble burst and that millennials will show more interest in homeownership when the economy picks up. Yet to make such claims with this data alone would be irresponsible. We need a lot more data than this to support either argument and to know whether younger Americans do or do not want to own homes in numbers similar to past generations.

See the full Census report regarding 2Q homeownership rates here.

Changing the measurement of poverty leads to 400 million more in poverty around the world

Researchers took a new look at global poverty, developed more specific measures, and found a lot more people living in poverty:

So OPHI reconsidered poverty from a new angle: a measure of what the authors term generally as “deprivations.” They relied on three datasets that do more than capture income: the Demographic and Health Survey, the Multiple Indicators Cluster Survey, and the World Health Survey, each of which measures quality of life indicators. Poverty wasn’t just a vague number anymore, but a snapshot of on-the-ground conditions people were facing.

OPHI then created the new index (the MPI) that collected ten needs beyond “the basics” in three broader categories: nutrition and child mortality under Health; years of schooling and school attendance under Education; and cooking fuel, sanitation, water, electricity, floor, and assets under Living Conditions. If a person is deprived of a third or more of the indicators, he or she would be considered poor under the MPI. And degrees of poverty were measured, too: Did your home lack a roof or did you have no home at all?

Perhaps the MPI’s greatest feature is that it can locate poverty. Where the HPI would just tell you where a country stood in comparison to others, the MPI maps poverty at a more granular level. With poverty mapped in greater detail, aid workers and policy makers have the opportunity to be more targeted in their work.

So what did we find out about poverty now that we can measure it better? Sadly, the world is more impoverished than we previously thought. The HPI has put this figure at 1.2 billion people. But under the MPI’s measurements, it’s 1.6 billion people. More than half of the impoverished population in developing countries lives in South Asia, and another 29 percent in Sub-Saharan Africa. Seventy-one percent of MPI’s poor live in what is considered middle income countries—countries where development and modernization in the face of globalization is in full swing, but some are left behind. Niger is home to the highest concentration of multidimensionally poor, with nearly 90 percent of its population lacking in MPI’s socioeconomic indicators. Most of the poor live in rural areas.
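The counting rule in that excerpt is simple enough to sketch. The ten indicators and the one-third cutoff come from the quote; the equal weighting and the sample household below are my own simplifications, not the official MPI methodology.

```python
# Sketch of the deprivation-count rule described in the excerpt.
# Equal weighting and the sample household are illustrative assumptions.
INDICATORS = [
    "nutrition", "child_mortality",              # Health
    "years_of_schooling", "school_attendance",   # Education
    "cooking_fuel", "sanitation", "water",
    "electricity", "floor", "assets",            # Living Conditions
]

def is_mpi_poor(deprivations: set) -> bool:
    """Count someone as poor if deprived in a third or more of the indicators."""
    share = len(deprivations & set(INDICATORS)) / len(INDICATORS)
    return share >= 1 / 3

# Hypothetical household deprived in four of the ten indicators
household = {"cooking_fuel", "sanitation", "water", "floor"}
print(is_mpi_poor(household))  # True: 4/10 is at least a third
```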

This reminds me of Bill Gates’ suggestion a few years ago that one of the best ways to help address global issues is to set goals and collect better data. Based on this, the world could use more people who can work at collecting and analyzing data. If poverty is at least somewhat relative (beyond the basic needs of absolute poverty) and multidimensional, then defining it is an important ongoing task.

Hard to measure school shootings

It is difficult to decide how to measure school shootings and gun violence:

What constitutes a school shooting?

That five-word question has no simple answer, a fact underscored by the backlash to an advocacy group’s recent list of school shootings. The list, maintained by Everytown, a group that backs policies to limit gun violence, was updated last week to reflect what it identified as the 74 school shootings since the massacre in Newtown, Conn., which sparked a national debate over gun control.

Multiple news outlets, including this one, reported on Everytown’s data, prompting a backlash over the broad methodology used. As we wrote in our original post, the group considered any instance of a firearm discharging on school property as a shooting — thus casting a broad net that includes homicides, suicides, accidental discharges and, in a handful of cases, shootings that had no relation to the schools themselves and occurred with no students apparently present.

None of the incidents rise to the level of the massacre that left 27 victims, mostly children, dead in suburban Connecticut roughly 18 months ago, but multiple reviews of the list show how difficult quantifying gun violence can be. Researcher Charles C. Johnson posted a flurry of tweets taking issue with incidents on Everytown’s list. A Hartford Courant review found 52 incidents involving at least one student on a school campus. (We found the same, when considering students or staff.) CNN identified 15 shootings that were similar to the violence in Newtown — in which a minor or adult was actively shooting inside or near a school — while Politifact identified 10.

Clearly, there’s no clean-cut way to quantify gun violence in the nation’s schools, but in the interest of transparency, we’re throwing open our review of the list, based on multiple news reports per incident. For each, we’ve summarized the incident and included casualty data where available.

This is a good example of the problems of conceptualization and operationalization. The idea of a “school shooting” seems obvious until you start looking at a variety of incidents and have to decide whether they hang together as one definable phenomenon. It is interesting here that the Washington Post then goes on to provide more information about each case but doesn’t come down on any side.
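To show how much the definition drives the number, here is a small sketch of competing operationalizations applied to the same incidents. The incident records and field names are hypothetical; the criteria loosely follow the excerpt (any discharge on school property, a student present, an active shooter).

```python
# Different operationalizations of "school shooting" produce different counts
# from the same data. Incident records and fields are hypothetical.
incidents = [
    {"on_school_property": True, "students_present": True, "active_shooter": True},
    {"on_school_property": True, "students_present": True, "active_shooter": False},
    {"on_school_property": True, "students_present": False, "active_shooter": False},
]

definitions = {
    "any discharge on school property": lambda i: i["on_school_property"],
    "at least one student present": lambda i: i["on_school_property"] and i["students_present"],
    "active shooter at a school": lambda i: i["on_school_property"] and i["active_shooter"],
}

for name, rule in definitions.items():
    print(name, sum(rule(i) for i in incidents))  # 3, 2, and 1 under the three rules
```

The same logic, applied to hundreds of news reports, is how one list arrives at 74 incidents while other reviews arrive at 52, 15, or 10.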

So how might this problem be solved? In the academic or scientific world, scholars would debate it through publications, conferences, and public discussions until some consensus (or at least some agreement about the contours of the argument) emerges. That takes time, a lot of thinking, and data analysis. It runs counter to more media- or politics-driven approaches that want quick, sound-bite answers to complex social problems.