Will Nate Silver ruin his brand with NCAA predictions?

Statistical guru Nate Silver, known for his 2012 election predictions, has been branching out into other areas recently on the New York Times site. Check out his 2013 NCAA predictions. Or look at his 2013 Oscar predictions.

While Silver has a background in sports statistics, I wonder if these forays into new areas with the imprimatur of the New York Times will eventually backfire. In many ways, these new areas have less data than presidential elections and thus, Silver has to step further out on a limb. For example, look at these predictions for the 2013 NCAA bracket:

The top pick for 2013, Louisville, has only a 22.7% chance of winning. If Silver goes with this pick of Louisville, and he does, then by his own figures he will be wrong 77.3% of the time. These are not good odds.

I’m not sure Silver can really win much by predicting the NCAA champion or the Oscars because the odds of making a wrong prediction are higher. What happens if he is wrong a number of times in a row? Will people still listen to him in the same way? What happens when the 2016 presidential election comes along? Of course, Silver could continue to develop better models and make more accurate picks but even this takes attention away from his political predictions.
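To make the "wrong a number of times in a row" worry concrete, here is a minimal sketch. The numbers are purely illustrative: it assumes each year's favorite has the same 22.7% chance Silver gives Louisville, and that tournaments are independent, neither of which is guaranteed.

```python
# Illustrative only, not Silver's model: if the favorite has a 22.7% chance
# of winning each year, the chance of picking the champion correctly at
# least once over n independent years is 1 - (1 - p)**n.
p = 0.227

for n in (1, 3, 5):
    at_least_once = 1 - (1 - p) ** n
    print(f"{n} year(s): P(at least one correct pick) = {at_least_once:.3f}")
```

Under these assumptions, even a perfectly calibrated forecaster who always picks the favorite misses the champion in most individual years; it takes about five tournaments before the odds of at least one correct pick pass 70%.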

Planning for the 7 billion person city

Two architects recently won an award for planning for a city that would include all the residents of the world:

This is the premise behind an ambitious research project, called “The City of 7 Billion,” for which the two recently won the $100,000 Latrobe Prize from the American Institute of Architects College of Fellows. With the geo-spatial model Mendis and Hsiang are creating – think a super-enhanced, zoomable Google Earth, Hsiang says – they’re hoping to study the impact of population growth and resource consumption at the scale of the whole world.

Every corner of the planet, they argue, is “urban” in some sense, touched by farming that feeds cities, pollution that comes out of them, industrialization that has made urban centers what they are today. So why not think of the world as a single urban entity?…

Now she and Mendis will be trying to do something similar – sew together disparate data sets, turn them into spatial models, then make those models accessible to the public – with a vastly more complex scenario. They want to connect not just land use with population density, but also income data, carbon dioxide levels, and geographical terrain. Their model of the whole world as one continuous urban terrain could then be used as a predictive tool for planning development into the future.

Hsiang and Mendis are hoping to communicate data and ideas that the political and scientific communities have had a hard time conveying to the public. This may sound like an odd job for architects – visualizing worldwide data about air quality – but Hsiang and Mendis argue that architects are precisely the professionals to do this…

More often, however, they have not been working at the same scale as policy-makers and scientists. “For too long, the architecture profession has been complicit in focusing on buildings and the scale of buildings,” Mendis says. “And I think that’s been detrimental to us.” The City of 7 Billion is an attempt to change that, to involve architects in big-picture questions more often debated by economists and geographers and social scientists.

This sounds like an interesting project on multiple levels:

1. Trying to imagine what a megacity of this size would look like. We are a long way from a megalopolis of this size, yet there are parts of the world that might benefit from such thinking.

2. Putting together data in new ways. This is stretching some of the boundaries of data visualization by putting it in 3-D form.

3. Helping architects get involved in larger conversations about cities.

It will be worth watching where this goes.

Pew reminds us that Twitter users are not representative of the US population

In looking at this story, I was led to a recent Pew study that compared the political leanings of Twitter to the political opinions of the general US population. One takeaway: the two populations are not the same.

The lack of consistent correspondence between Twitter reaction and public opinion is partly a reflection of the fact that those who get news on Twitter – and particularly those who tweet news – are very different demographically from the public.

The overall reach of Twitter is modest. In the Pew Research Center’s 2012 biennial news consumption survey, just 13% of adults said they ever use Twitter or read Twitter messages; only 3% said they regularly or sometimes tweet or retweet news or news headlines on Twitter.

Twitter users are not representative of the public. Most notably, Twitter users are considerably younger than the general public and more likely to be Democrats or lean toward the Democratic Party. In the 2012 news consumption survey, half (50%) of adults who said they posted news on Twitter were younger than 30, compared with 23% of all adults. And 57% of those who posted news on Twitter were either Democrats or leaned Democratic, compared with 46% of the general public. (Another recent Pew Research Center survey provides even more detail on who uses Twitter and other social media.)

In another respect, the Twitter audience also is broader than the sample of a traditional national survey. People under the age of 18 can participate in Twitter conversations, while national surveys are limited to adults 18 and older. Similarly, Twitter conversations also may include those living outside the United States.

Perhaps most important, the Twitter users who choose to share their views on events vary with the topics in the news. Those who tweeted about the California same-sex marriage ruling were likely not the same group as those who tweeted about Obama’s inaugural or Romney’s selection of Paul Ryan.

This leads me to three thoughts:

1. What does this mean for the archiving of Twitter being undertaken by the Library of Congress? While it is still an interesting data source, Twitter provides a very small slice of U.S. opinion.

2. This is emblematic of larger issues with relying on new technologies to do research: who uses newer technologies is not the same as the U.S. population. This can be corrected for, as a recent article titled “A More Perfect Poll” suggests, and technologies can eventually filter throughout the whole U.S. population. In the meantime, researchers need to be careful about what they conclude.

3. So…what do we do about a comparison of a non-representative sample to a population? Pew seems to admit this:

While this provides an interesting look into how communities of interest respond to different circumstances, it does not reliably correlate with the overall reaction of adults nationwide.

This is an odd way to conclude a statistical report.
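On the second point, the standard correction is post-stratification: reweight the sample so that demographic groups count in proportion to their population shares. Here is a minimal sketch; the 50%/23% under-30 shares come from the Pew figures quoted above, but the within-group opinion rates are hypothetical.

```python
# Post-stratification sketch. Age shares are from the Pew figures quoted
# above (50% of Twitter news posters vs. 23% of all adults are under 30);
# the within-group support rates below are hypothetical.
sample_share = {"under_30": 0.50, "30_plus": 0.50}
population_share = {"under_30": 0.23, "30_plus": 0.77}
group_opinion = {"under_30": 0.65, "30_plus": 0.45}  # hypothetical support rates

raw_estimate = sum(sample_share[g] * group_opinion[g] for g in sample_share)
weighted_estimate = sum(population_share[g] * group_opinion[g] for g in population_share)

print(f"naive Twitter-sample estimate: {raw_estimate:.3f}")
print(f"population-weighted estimate:  {weighted_estimate:.3f}")
```

Note the limit of this fix: weighting can only correct for characteristics you measure (age, party), not for the self-selection of who chooses to tweet about a given topic in the first place, which is exactly the problem Pew flags.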

Why Public Policy Polling (PPP) should not conduct “goofy polls”

Here is an explanation why the polling firm Public Policy Polling (PPP) conducts “goofy polls”:

But over the past year, PPP has been regularly releasing goofy, sometimes pointless polls about every other month. In early January, one such survey showed that Congress was less popular than traffic jams, France and used-car salesmen. According to their food-centric surveys released this week, Americans clearly prefer Ronald McDonald over Burger King for President; Democrats are more likely to get their chicken at KFC than Chick-fil-A, and Republicans are more apt to order pancakes than waffles. “We’re obviously doing a lot of polling on the key 2014 races,” says Jensen. “That kind of polling is important. We also like to do some fun polls.”

PPP, which has a left-leaning reputation, releases fun polls in part because they’re entertaining but mostly in an attempt to set themselves apart as an approachable polling company. Questions for polls are sometimes crowd-sourced via Twitter. The outfit does informal on-site surveys about what state they should survey next. And when the results of offbeat polls come out, the tidbits have potential to go viral. “We’re not trying to be the next Gallup or trying to be the next Pew,” Jensen says. “We’re really following a completely different model where we’re known for being willing to poll on stuff other people aren’t willing to poll on.” Like whether Republicans are willing to eat sushi (a solid 64% are certainly not).

Which means polls about “Mexican food favorability” are a publicity stunt on some level. Jensen says PPP, which has about 150 clients, gets more business from silly surveys and the ethos it implies than they do cold-calling. One such client was outspoken liberal Bill Maher, who hired PPP to poll for numbers he could use on his HBO show Real Time. That survey, released during the 2012 Republican primaries, found that Republicans were more likely to vote for a gay candidate than an atheist candidate—and that conservative virgins preferred Mitt Romney, while Republicans with 20 or more sexual partners strongly favored Ron Paul.

Jensen argues that the offbeat polls do provide some useful information. One query from the food survey, for instance, asks respondents whether they consider themselves obese: about 20% of men and women said yes, well under the actual American obesity rate of 35.7%.  Information like that could give health crusaders some fodder for, say, crafting public education PSAs. Still, the vast majority of people are only going to use these polls to procrastinate at work: goodness knows it’s hard to resist a “scientific” analysis of partisans’ favorite pizza toppings (Republicans like olives twice as much!).

Here is my problem with this strategy: it is short-sighted and privileges PPP. While polling firms do need to market themselves, as there are a number of organizations that conduct national polls, this strategy can harm the whole field. When the average American sees the results of “goofy polls,” is it likely to improve their view of polling in general? I argue there is already enough suspicion in America about polls and their validity without throwing in polls that don’t tell us as much. This suspicion contributes to lower response rates across the board, a problem for all survey researchers.

In the end, the scientific nature of polling takes a hit when any firm is willing to reduce polling to marketing.

How the Library of Congress will archive and make available all tweets

The Library of Congress announced a few years ago they will archive all tweets. Here is how they plan to store the data and make it available:

Osterberg says the costs associated with the project, in terms of developing the infrastructure to house the tweets, is in the low tens of thousands of dollars. The tweets were offered as a free gift from Twitter, and are being transferred to the Library through a separate company, Gnip, at no cost. Each day tweets are automatically pulled in from Gnip, organized chronologically and scanned to ensure they’re not corrupted. Then the data are stored on two separate tapes which are housed in different parts of the Library for security reasons.

The Library has mostly figured out how to make the archive organized, but usability remains a challenge. A simple query of just the 2006-2010 tweets currently takes about 24 hours. Increasing search speeds to a reasonable level would require purchasing hundreds of servers, which the Library says is financially unfeasible right now. There’s no timetable for when the tweets might become accessible to researchers…

While you can’t yet make a trip to Washington D.C. and have casual perusal of all the world’s tweets, the technology to do exactly that is readily available—for a cost. Gnip, the organization feeding the tweets to the Library, is a social media data company that has exclusive access to the Twitter “firehose,” the never-ending, comprehensive stream of all of our tweets. Companies such as IBM pay for Gnip’s services, which also include access to posts from other social networks like Facebook and Tumblr. The company also works with academics and public policy experts, the type of people likely to make use of a free, government-sponsored Twitter archive when it comes to fruition…

All the researchers agree that Twitter is a powerful tool for sociological study. Soon, if the Library of Congress can make its database fully functional, it’ll also be an easily accessible one. And one day, long after we’ve all sent our final snarky tweet, our messages will live on.

And what will people of the future think when they read all these tweets?

While this could be a really interesting data source (notwithstanding all of the sample selection issues), I find it odd there is no timetable for when it might be more easily searchable. What is the point of collecting all of this information if it can’t be put to use?

Nielsen changes the definition of watching TV to include streaming

When people start watching TV in new ways, companies have to adjust and collect better data:

The decisions made by the [What Nielsen Measures] committee are not binding but a source at one of the big four networks was ecstatic at the prospect of expanded measurement tools. The networks for years have complained that total viewing of their shows isn’t being captured by traditional ratings measurements. This is a move to correct that.

By September 2013, when the next TV season begins, Nielsen expects to have in place new hardware and software tools in the nearly 23,000 TV homes it samples. Those measurement systems will capture viewership not just from the 75 percent of homes that rely on cable, satellite and over the air broadcasts but also viewing via devices that deliver video from streaming services such as Netflix and Amazon, from so-called over-the-top services and from TV enabled game systems like the X-Box and PlayStation.

While some use of iPads and other tablets that receive broadband in the home will be included in the first phase of measurement improvements, a second phase is envisioned to include such devices in a more comprehensive fashion. The second phase is envisioned to roll out on a slower timetable, according to sources, with the overall goal of capturing video viewing of any kind from any source.

Nielsen is said to have an internal goal of being able to measure video viewing on an iPad by the end of this year, a process in which the company will work closely with its clients.

This is a good example of how operationalization and measurement are not just for scientists. Here, possibly millions of dollars are at stake in advertising. It would be interesting to hear the advertisers’ side of the story; higher numbers could mean they pay more but it would also mean that they can reach bigger audiences.

So can we assume that better measurement means we will find that Americans watch more TV than we currently think?

Argument: statistics can help us understand and enjoy baseball

An editor and writer for Baseball Prospectus argues that we need science and statistics to understand baseball:

Fight it if you like, but baseball has become too complicated to solve without science. Every rotation of every pitch is measured now. Every inch that a baseball travels is measured now. Teams that used to get mocked for using spreadsheets now rely on databases packed with precise location and movement of every player on every play — and those teams are the norm, not the film-inspiring exceptions. This is exciting and it’s terrifying…

I’m not a mathematician and I’m not a scientist. I’m a guy who tries to understand baseball with common sense. In this era, that means embracing advanced metrics that I don’t really understand. That should make me a little uncomfortable, and it does. WAR is a crisscrossed mess of routes leading toward something that, basically, I have to take on faith…

Yet baseball’s front offices, the people in charge of $100 million payrolls and all your hope for the 2013 season, side overwhelmingly with data. For team executives, the basic framework of WAR — measuring players’ total performance against a consistent baseline — is commonplace, used by nearly every front office, according to insiders. The writers who helped guide the creation of WAR over the decades — including Bill James, Sean Smith and Keith Woolner — work for teams now. As James told me, the war over WAR has ceased where it matters. “There’s a practical necessity for measurements like that in a front office that make it irrelevant whether you like them or you don’t.”

Whether you do is up to you and ultimately matters only to you. In the larger perspective, the debate is over, and data won. So fight it if you’d like. But at a certain point, the question in any debate against science is: What are you really fighting and why?

As someone who likes data, I would say statistics is just another tool that can help us understand baseball better. It doesn’t have to be an either/or argument, baseball with advanced statistics versus baseball without advanced statistics. Baseball with advanced statistics is more complete and gets at some of the underlying mechanics of the game rather than the visual cues or the culturally accepted statistics.

While this story is specifically about baseball, I think it also mirrors larger conversations in American society about the use of statistics. Why interrupt people’s common-sense understandings of the world with abstract data? Aren’t these new statistics difficult to understand, and can’t they also be manipulated? Some of this is true: looking at data can involve seeing things in new ways, and there are disagreements about how to define concepts as well as how to collect and interpret data. But, in the end, these statistics can help us better understand the world.

Evaluating the charts and graphics in President Obama’s “enhanced experience” version of the State of the Union

In addition to the speech, President Obama’s State of the Union involved an “enhanced experience” with plenty of charts and graphics. Here are some thoughts about how well this data and information was presented:

But sometimes, even accuracy can be misleading, especially when it comes to graphics and charts. On Tuesday night, President Obama gave his State of the Union address and the White House launched an “enhanced” experience, a multimedia display with video, 107 slides and 27 charts…

Overall, Few said Obama’s team created well-designed charts that presented information “simply, clearly and honestly.”

On a chart about natural gas wells:

“This graph depicting growth in natural gas wells suffers from a problem related to the quantitative scale, specifically the fact that it does not begin at zero. Although it is not always necessary to begin the scale of a line graph at zero, in this case because the graph was shown to the general public, narrowing the scale to begin at 400,000 probably exaggerated people’s perception of the degree in change.”

On a chart about “energy-related CO2 emissions”:

We found that the data behind this chart match up with what the U.S. Energy Information Administration reports in its table of U.S. Macroeconomic Indicators and CO2 Emissions. But the y-axis is too compressed and as a result the chart exaggerates the trend a bit.

On a chart about American troop levels in Afghanistan:

Annotating discrete data points as this chart does can be helpful to tease out the story in a bunch of numbers, but that’s not a replacement for properly labeled axes. And this chart has none.

It seems like the data was correct but it was often put into a compressed context – not surprisingly, the years Obama has been in office or just a few years beforehand. This is a basic thing to keep in mind with charts and graphs: the range on the axes matters, and manipulating it can change people’s perceptions of whether there have been sharp changes or not.
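Few’s point about the truncated scale can be put in numbers. The well counts below are hypothetical, chosen only to illustrate the effect; the 400,000 baseline is the one from the critique above.

```python
# Hypothetical well counts; only the 400,000 axis baseline is taken from
# the critique above. Bar/line heights are read from the axis baseline,
# so truncating the baseline inflates the apparent change.
start, end = 450_000, 500_000

actual_growth = (end - start) / start                         # true relative change
height_ratio_full = end / start                               # baseline at 0
height_ratio_truncated = (end - 400_000) / (start - 400_000)  # baseline at 400,000

print(f"true growth: {actual_growth:.1%}")
print(f"apparent height ratio, baseline 0: {height_ratio_full:.2f}x")
print(f"apparent height ratio, baseline 400,000: {height_ratio_truncated:.2f}x")
```

In this sketch an 11% real increase looks like a doubling once the baseline starts at 400,000, which is exactly the kind of exaggerated perception Few describes.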

Misinterpreting a graph of income in the US by misreading the X-axis categories

Some graphs can be more difficult to interpret, particularly if the categories along one of the axes are not a consistent width. Here is an example: misreading a chart of income in the United States:

“When I was growing up in Canada,” says Jon Evans of Techcrunch, “I was taught that income distribution should and did look like a bell curve, with the middle class being the bulge in the middle. Oh, how naïve my teachers were. This is how income distribution looks in America today.”

[Chart: distribution of U.S. household income by income bracket]

“That big bulge up above? It’s moving up and to the left. America is well on the way towards having a small, highly skilled and/or highly fortunate elite, with lucrative jobs; a vast underclass with casual, occasional, minimum-wage service work, if they’re lucky; and very little in between.”…

Er, no. Look closely at those last two brackets. Now look at the brackets immediately to the right of them. What do you notice?

Probably, you notice the same thing that immediately struck me: the last two brackets cover a much, much wider income band than the rest of the brackets on the graph.

Each bar on that graph represents a $5,000 income band: Under $5,000, $5000 to $9,999, and so forth.  Except for the last two.  The penultimate band is $200,000 to $250,000, which is ten times as wide as the previous band.  And the last bar represents all incomes over $250,000–a group that runs from some law associate who pulled down $251,000 last year, through A-Rod’s $27 million annual salary, all the way to some Silicon Valley superstar who just cashed out the company for a one time windfall of hundreds of millions of dollars.  Unsurprisingly, much wider bands have more people in them than they would if you kept on extrapolating out in $5,000 increments…

To put it another way, the apparent clustering of income along the rich right tail of the distribution is just an artifact of the way that the Census presents the data.  If they kept running through $5,000 brackets all the way out to A-Rod, the spreadsheet would be about a mile long, and there would only be a handful of people in each bracket.  So at the high end, where there are few households, they summarize.

The Census likely has good reasons for reporting these higher-income categories in such a way. First, because there are relatively fewer people in each $5,000 increment, they are trying to not make the graph too wide. Second, I believe the Census topcodes income, meaning that above a certain dollar point, incomes don’t get any higher. This is done to help protect the identity of these respondents who might be easy to pick out of the data otherwise.

But, this is a classic misinterpretation of a graph. As McArdle notes, this is a long-tail graph with very few people at the top end. The graph tries to alert readers to this by also marking some of the notable percentiles; above the $130,000 to $134,999 category, it reads “The top 10 percent reported incomes above $135,000” and above the top two categories, it reads, “approximately 4 percent of households.” Making the right interpretation depends not just on the relative shape of the graph, bell curve or otherwise, but on looking closely at the axes and categories.
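The fix for unequal bracket widths is to plot a density (households per $5,000 of income) rather than raw counts. A minimal sketch, with hypothetical household counts; only the bracket widths reflect the Census categories discussed above:

```python
# Hypothetical counts; only the bracket widths mirror the Census categories
# discussed above. Dividing each count by the number of $5,000 bands the
# bracket spans makes brackets of different widths comparable.
brackets = [
    # (label, bracket width in dollars, households) -- illustrative numbers
    ("$195,000 to $199,999", 5_000, 400_000),
    ("$200,000 to $249,999", 50_000, 2_000_000),  # ten times as wide
]

densities = {}
for label, width, count in brackets:
    densities[label] = count / (width / 5_000)  # households per $5,000 band

for label, per_band in densities.items():
    print(f"{label}: {per_band:,.0f} households per $5,000 band")
```

In this sketch the wider bracket holds five times as many households in total, yet per $5,000 band it is actually thinner than the bracket before it, so the apparent “bulge” at the right edge disappears once the widths are accounted for.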

Trying to ensure more accountability in US News & World Report college ranking data

The US News & World Report college rankings are big business but also a big headache in data collection. The company is looking into ways to ensure more trustworthy data:

A new report from The Washington Post‘s Nick Anderson explores the increasingly common problem, in which universities submit inflated standardized test scores and class rankings for members of their incoming classes to U.S. News, which doesn’t independently verify the information. Tulane University, Bucknell University, Claremont McKenna College, Emory University, and George Washington University have all been implicated in the past year alone. And those are just the schools that got caught:

A survey of 576 college admissions officers conducted by Gallup last summer for the online news outlet Inside Higher Ed found that 91 percent believe other colleges had falsely reported standardized test scores and other admissions data. A few said their own college had done so.

For such a trusted report, the U.S. News rankings don’t have many safeguards ensuring that their data is accurate. Schools self-report these statistics on the honor system, essentially. U.S. News editor Brian Kelly told Inside Higher Ed’s Scott Jaschik, “The integrity of data is important to everybody … I find it incredible to contemplate that institutions based on ethical behavior would be doing this.” But plenty of institutions are doing this, as we noted back in November 2012 when GWU was unranked after being caught submitting juiced stats. 

At this point, U.S. News shouldn’t be surprised by acknowledgments like those from Tulane and Bucknell. It turns out that if you let schools misreport the numbers — especially in a field of fierce academic competition and increasing budgetary hardship — they’ll take you up on the offer. Kelly could’ve learned that by reading U.S. News’ own blog, Morse Code. Written by data researcher Bob Morse, the blog has devoted almost half of its recent posts to fraud. To keep schools more honest, the magazine is considering requiring university officials outside of enrollment offices to sign a statement vouching for submitted numbers. But still, no third-party accountability would be in place, and many higher ed experts are already saying that the credibility of the U.S. News college rankings is shot.

Three quick thoughts:

1. With the amount of money involved in the entire process, this should not be a surprise. Colleges want to project the best image they can so having a weakly regulated system (and also a suspect methodology and set of factors to start with) can lead to abuses.

2. If the USNWR rankings can’t be trusted, isn’t there someone who could provide a more honest system? This sounds like an opportunity for someone.

3. I wonder if there are parallels to PED use in baseball. To some degree, it doesn’t matter if lots of schools are gaming the system as long as the perception among schools is that everyone else is doing it. With this perception, it is easier to justify one’s own cheating because colleges need to catch up or compete with each other.