Will Nate Silver ruin his brand with NCAA predictions?

Statistical guru Nate Silver, known for his 2012 election predictions, has been branching out into other areas recently on the New York Times site. Check out his 2013 NCAA predictions. Or look at his 2013 Oscar predictions.

While Silver has a background in sports statistics, I wonder if these forays into new areas with the imprimatur of the New York Times will eventually backfire. In many ways, these new areas offer less data than presidential elections, and thus Silver has to step further out on a limb. For example, look at these predictions for the 2013 NCAA bracket:

The top pick for 2013, Louisville, has only a 22.7% chance of winning. If Silver goes with Louisville, and he does, then by his own figures there is a 77.3% chance he will be wrong. These are not good odds.

I’m not sure Silver can really win much by predicting the NCAA champion or the Oscars, because the odds of making a wrong prediction are higher. What happens if he is wrong a number of times in a row? Will people still listen to him in the same way? What happens when the 2016 presidential election comes along? Of course, Silver could continue to develop better models and make more accurate picks, but even this takes attention away from his political predictions.
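To put the “wrong a number of times in a row” question in numbers, here is a minimal sketch. It assumes, purely for illustration, that every year’s top pick carries the same 22.7% title chance and that tournaments are independent; neither assumption comes from Silver’s model. Under those assumptions, the chance of a run of misses is simply the miss probability raised to the number of years.

```python
def miss_probability(win_prob: float) -> float:
    """Chance the top pick does NOT win, given its title probability."""
    return 1.0 - win_prob


def consecutive_misses(win_prob: float, years: int) -> float:
    """Chance of missing `years` tournaments in a row.

    Hypothetical simplification: the same win probability every year
    and independence between tournaments.
    """
    return miss_probability(win_prob) ** years


if __name__ == "__main__":
    louisville = 0.227  # Silver's 2013 title probability for Louisville
    print(f"miss this year: {miss_probability(louisville):.3f}")
    for n in (2, 3, 5):
        print(f"miss {n} years running: {consecutive_misses(louisville, n):.3f}")
```

Under these assumptions, missing three years running has roughly a 46% chance and missing five years running still about 28%, so a string of wrong picks would be entirely consistent with a well-calibrated model.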

You can collect lots of Moneyball-type data but it still has to be used well

Another report from the MIT Sloan Sports Analytics Conference provides this useful reminder about statistics and big data:

Politics didn’t come up at the conference, except for a single question to Nate Silver, the FiveThirtyEight election oracle who got his start doing statistical analysis on baseball players. Silver suggested there wasn’t much comparison between the two worlds.

But even if there’s no direct correlation, there was an underlying message I heard consistently throughout the conference that applies to both: Data is an incredibly valuable resource for organizations, but you must be able to communicate its value to stakeholders making decisions — whether that’s in the pursuit of athletes or voters.

And the Obama 2012 campaign successfully put this together. Here is one example:

Data played a major role. There’s perhaps no better example than the constant testing of email subject lines. The performance of the Obama email with the subject line “I will be outspent” earned the campaign an estimated $2.6 million. Had the campaign gone with the lowest-performing subject line, it would have raised $2.2 million less, according to “Inside the Cave,” a detailed report from Republican strategist Patrick Ruffini and the team at Engage.
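The two amounts in that quote imply a third worth spelling out: what the weakest subject line would have raised. A minimal sketch using only the report’s figures (about $2.6 million for the winner, $2.2 million less for the lowest performer); the function name is hypothetical.

```python
def compare_variants(best_revenue: float, shortfall: float) -> dict:
    """Derive the weakest variant's revenue and how the two compare.

    best_revenue: estimated revenue of the winning subject line
    shortfall:    how much less the lowest-performing line would have raised
    """
    worst_revenue = best_revenue - shortfall
    return {
        "worst_revenue": worst_revenue,
        "ratio_best_to_worst": best_revenue / worst_revenue,
    }


if __name__ == "__main__":
    result = compare_variants(best_revenue=2.6e6, shortfall=2.2e6)
    print(f"lowest-performing line: ${result['worst_revenue'] / 1e6:.1f} million")
    print(f"winner raised {result['ratio_best_to_worst']:.1f}x as much")
```

The weakest line works out to roughly $0.4 million, so the tested winner brought in about six and a half times as much.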

This is an important reminder about statistics: they still have to be used well and shared effectively with leaders and the public. We now live in a world where more data is available than ever before, but this doesn’t necessarily mean life is getting better.

I was recently in a conversation about the value of statistics. I suggested that if colleges and others could effectively train today’s students in statistics and how to use them in the real world, we might be better off as a society in a few decades, as these students become leaders who make statistics a regular part of their decision-making. We’ll see if this happens…

Correlations that get at why big cities lean toward Democrats

Richard Florida discusses several reasons, based on correlations, why big cities now so clearly lean toward the Democratic party:

Density played a key role in the metro vote. (To capture it we use a measure of population-based density, which accounts for the concentration of people in metros). The average Obama metro was more than twice as dense as the average Romney metro, 412 versus 193 people per square mile. With a correlation of .50, density was an even bigger factor than population (where the correlation is .34). The reverse pattern holds for the share of Romney votes; the negative correlation for density (-.51) was significantly higher than that for population (-.33)…

The chart below plots the relationship between a metro’s share of college grads and its share of Obama votes. The line slopes steeply upward showing how the share of Obama votes increases alongside [the share of college grads]. The share of college grads in a metro is positively correlated with the share of Obama votes (.42) and negatively with the share of Romney votes (-.44)…

The chart above shows the relationship between the share of the creative class and the share of Obama votes across metro areas. The line slopes steeply upward, indicating a considerable positive relationship. The share of creative class workers is positively correlated with the share of Obama votes (.40) and negatively with the share of Romney votes (-.41)…

Republicans may still be the party of the rich, but most of the country’s more-affluent metros lined up squarely in the Obama camp. The correlation between the average wages and salaries of metros and the share of Obama votes is positive (.50) and it is negative for Romney votes (-.51). This makes sense too, as larger metros have greater concentrations of knowledge-based talent and industries and are wealthier to begin with. (The associations we find are even more substantial for metros with more than one million people, with the correlations increasing to .71 for Obama and -.72 for Romney.) This follows the “Red State, Blue State, Rich State, Poor State” pattern identified by Andrew Gelman of Columbia University, who infamously found that while rich voters continue to trend Republican, rich states trend Democratic.
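One way to read Florida’s correlations is to square them: r² is the share of variance in one variable that is linearly associated with the other. This is a standard rule of thumb, not part of Florida’s analysis; the sketch below just applies it to the coefficients quoted above and checks the “more than twice as dense” claim.

```python
# Correlations as quoted by Florida (Obama vote share vs. each metro trait)
correlations = {
    "density": 0.50,
    "population": 0.34,
    "college grads": 0.42,
    "creative class": 0.40,
    "wages (all metros)": 0.50,
    "wages (metros over 1 million)": 0.71,
}


def shared_variance(r: float) -> float:
    """Proportion of variance two variables share under a linear fit (r squared)."""
    return r * r


if __name__ == "__main__":
    for trait, r in correlations.items():
        print(f"{trait:32s} r = {r:.2f}  r^2 = {shared_variance(r):.2f}")

    # Average density, people per square mile: Obama metros vs. Romney metros
    density_ratio = 412 / 193
    print(f"density ratio: {density_ratio:.2f}")
```

Squared, density’s .50 corresponds to about a quarter of the variance in Obama’s share and population’s .34 to about 12%, while the .71 for wages among the largest metros rises to roughly half. The density ratio comes out near 2.13, consistent with “more than twice as dense.”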

Florida argues this is evidence of class-based differences in American life, specifically between the creative class and those in knowledge industries on one hand and the rest of the United States on the other.

However, this raises a few questions:

1. The analysis here seems to be done across metropolitan areas, while some of these voting patterns break down when we compare cities and suburbs. For example, some suggest it is really cities and inner-ring suburbs that vote Democratic while farther-flung suburbs and exurbs vote Republican. See earlier posts about the analysis of Joel Kotkin.

2. Making claims with correlations is tricky. Florida acknowledges this before he rolls out the analysis: “As usual, I point out that correlation points to associations between variables only, not causation.” But then why stop the analysis at correlations? Looking at the relationships between just two variables at a time ignores the complex relationships among factors like race, class, location, jobs, and more. Why not quickly run some regressions?

3. If this analysis is correct (and we need more in-depth analysis to check), why are Republicans so bad at appealing to the creative class?
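Point 2 can be illustrated with a toy example. The data below are entirely synthetic, not Florida’s metro figures: by construction, `vote` depends only on `college`, and `density` merely moves with `college`. A bivariate correlation makes density look strongly related to the vote; a multiple regression that includes both predictors shows density’s coefficient dropping to zero. The normal-equations solver is a dependency-free sketch; real work would use a statistics package.

```python
from fractions import Fraction


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [Fraction(0)] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(predictors, y):
    """Least squares with an intercept via the normal equations X'X b = X'y.

    Returns [intercept, b1, b2, ...]. Exact Fractions keep the toy
    example free of rounding noise.
    """
    rows = [[Fraction(1)] + [Fraction(v) for v in vals] for vals in zip(*predictors)]
    ys = [Fraction(v) for v in y]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Synthetic toy data (NOT real metro figures): vote depends only on
    # college, while density merely moves with college.
    college = [1, 2, 3, 4, 5, 6]
    density = [2, 1, 4, 3, 6, 5]
    vote = [2 * c for c in college]

    print(f"corr(density, vote) = {pearson(density, vote):.2f}")
    intercept, b_density, b_college = ols([density, college], vote)
    print(f"regression coefficients: density {float(b_density):.2f}, "
          f"college {float(b_college):.2f}")
```

Here density correlates about 0.83 with the vote, yet once college share is in the model its coefficient is exactly zero. With real metro data the result could go either way, which is why the regression is worth running.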

The closer look at how the Obama campaign used big data to wage an intimate and winning campaign

In MIT Technology Review, Sasha Issenberg has a three-part look at how the Obama campaign was effectively able to harness big data. Here are the concluding paragraphs from Part Three:

A few days after the election, as Florida authorities continued to count provisional ballots, a few staff members were directed, as four years before, to remain in Chicago. Their instructions were to produce another post-mortem report summing up the lessons of the past year and a half. The undertaking was called the Legacy Project, a grandiose title inspired by the idea that the innovations of Obama 2012 should be translated not only to the campaign of the next Democratic candidate for president but also to governance. Obama had succeeded in convincing some citizens that a modest adjustment to their behavior would affect, however marginally, the result of an election. Could he make them feel the same way about Congress?

Simas, who had served in the White House before joining the team, marveled at the intimacy of the campaign. Perhaps more than anyone else at headquarters, he appreciated the human aspect of politics. This had been his first presidential election, but before he became a political operative, Simas had been a politician himself, serving on the city council and school board in his hometown of Taunton, Massachusetts. He ran for office by knocking on doors and interacting individually with constituents (or those he hoped would become constituents), trying to track their moods and expectations.

In many respects, analytics had made it possible for the Obama campaign to recapture that style of politics. Though the old guard may have viewed such techniques as a disruptive force in campaigns, they enabled a presidential candidate to view the electorate the way local candidates do: as a collection of people who make up a more perfect union, each of them approachable on his or her terms, their changing levels of support and enthusiasm open to measurement and, thus, to respect. “What that gave us was the ability to run a national presidential campaign the way you’d do a local ward campaign,” Simas says. “You know the people on your block. People have relationships with one another, and you leverage them so you know the way they talk about issues, what they’re discussing at the coffee shop.”

Few events in American life other than a presidential election touch 126 million adults, or even a significant fraction that many, on a single day. Certainly no corporation, no civic institution, and very few government agencies ever do. Obama did so by reducing every American to a series of numbers. Yet those numbers somehow captured the individuality of each voter, and they were not demographic classifications. The scores measured the ability of people to change politics—and to be changed by it.

Combining numbers with a personal appeal made for a winning campaign. Part Two has more on how the Romney campaign watched what the Obama campaign was doing and tried to react, yet couldn’t quite figure it out.

Since this appears to have been the winning formula in 2012, I imagine plenty of others will try to duplicate it. One way would be to obtain the Obama campaign’s database and related information, though it is not clear who might be able to access that in the future. Another way would be to hire some of the Obama campaign staff who made this happen; I imagine they will get some lucrative offers moving forward. A third option would be to try to find another way, but this could be tedious, require a lot of resources, and may not produce the same results.

Republicans (and Democrats) need to pay attention to data rather than just spinning a story

Conor Friedersdorf suggests conservatives clearly had their own misinformed echo chambers ahead of this week’s elections:

Before rank-and-file conservatives ask, “What went wrong?”, they should ask themselves a question every bit as important: “Why were we the last to realize that things were going wrong for us?”

Barack Obama just trounced a Republican opponent for the second time. But unlike four years ago, when most conservatives saw it coming, Tuesday’s result was, for them, an unpleasant surprise. So many on the right had predicted a Mitt Romney victory, or even a blowout — Dick Morris, George Will, and Michael Barone all predicted the GOP would break 300 electoral votes. Joe Scarborough scoffed at the notion that the election was anything other than a toss-up. Peggy Noonan insisted that those predicting an Obama victory were ignoring the world around them. Even Karl Rove, supposed political genius, missed the bulls-eye. These voices drove the coverage on Fox News, talk radio, the Drudge Report, and conservative blogs.

Those audiences were misinformed.

Outside the conservative media, the narrative was completely different. Its driving force was Nate Silver, whose performance forecasting Election ’08 gave him credibility as he daily explained why his model showed that President Obama enjoyed a very good chance of being reelected. Other experts echoed his findings. Readers of The New York Times, The Atlantic, and other “mainstream media” sites besides knew the expert predictions, which have been largely borne out. The conclusions of experts are not sacrosanct. But Silver’s expertise was always a better bet than relying on ideological hacks like Morris or the anecdotal impressions of Noonan.

But I think Friedersdorf misses the most important point in the rest of his piece: it isn’t just that Republicans veered into ideological territory where many Americans did not want to follow, or wasted time on inconsequential issues that did not affect many voters. The misinformation was the result of ignoring or downplaying the data that showed President Obama had a lead in the months leading up to the election. The data predictions from “The Poll Quants” were not wrong, no matter how many conservative pundits wanted to suggest otherwise.

This could lead to bigger questions about what political parties and candidates should do if the data is not in their favor in the days and weeks leading up to an election. Change course and bring up new ideas and positions? This could lead to questions about political expediency and flip-flopping. Double-down on core issues? This might ignore the key things voters care about or reinforce negative impressions. Ignore the data and try to spin the story? It didn’t work this time. Push even harder in the get-out-the-vote ground game? This sounds like the most reasonable option…

Political operative discusses which polls he thought were reliable, unreliable while working for Edwards 2008 campaign

Amidst discussions of whether current polls are accurately weighting their samples for Democrats and Republicans, a former political operative for Al Gore and John Edwards talks about how the Edwards campaign used polls:

However, under cross-examination by lead prosecutor David Harbach, Hickman acknowledged sending a series of emails in November and December, and even into January, endorsing or promoting polls that made Edwards look good. Asked about what appeared to be a New York Times/CBS poll released in mid-November showing an effective “three-way tie” in Iowa with Hillary Clinton at 25 percent, Edwards at 23 percent and Obama at 22 percent, Hickman acknowledged he circulated it but insisted he didn’t think it was correct.

“The business I’m in is a business any fool can get into, and a lot can happen. I’m sure there was a poll like that,” the folksy Hickman told jurors when first asked about a poll showing the race tied. “I kept up with every poll that was done, including our own, and there may have been a few that showed them a tie, but… that’s not really what my analysis is. Campaigns are about trajectory, and… there could have been a point at which it was a tie in the sense that we were coming down, and Obama was going up, and Clinton was going up.”

Hickman also indicated that senior campaign staffers knew many of the polls were poorly done and of little value. “We didn’t take these dog and cat and baby-sitter polls seriously,” he said.

Hickman acknowledged that on January 2, 2008, a day before the Iowa caucuses, he sent out a summary of nine post-Christmas Iowa polls showing Edwards in contention in the Hawkeye State. However, he testified two-thirds of them were from firms he considered “ones we typically would not put a lot of credence in.” Hickman put Mason-Dixon, Strategic Vision, Insider Advantage, Zogby and Research 2000 in the “less reputable” group. He also told the court that ARG polls “have a miserable track record.”

Hickman said he considered the Des Moines Register polls, CNN and Los Angeles Times polls more accurate.

This seems like typical politics: an operative is supposed to spin the best news they can about their candidate, even if they don’t think this is the whole story. However, it is fascinating to see his opinion of different polling organizations. I wish he had gone on to describe why some of these polls were better than others. Did they have better samples, more reliable and/or predictive results, or results that lined up with other reputable polls? At the same time, I think the Drudge Report’s headline for this story, “Under oath, Edwards pollster admits polls were ‘propaganda,’” is a bit misleading. Hickman wasn’t disparaging all polls; he was admitting to using some polls that he thought were inaccurate to tell a particular political story.
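One concrete way sample quality shows up is the sampling margin of error. The article does not give the Times/CBS poll’s sample size, so the sketch below assumes a hypothetical 500 respondents and uses the textbook 95% formula for a simple random sample, 1.96 × sqrt(p(1 − p)/n). Real polls weight their samples and have design effects, so actual margins are usually somewhat wider.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% sampling margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    n = 500  # hypothetical sample size; the article does not report one
    results = {"Clinton": 0.25, "Edwards": 0.23, "Obama": 0.22}
    for name, p in results.items():
        moe = margin_of_error(p, n)
        print(f"{name:8s} {p:.0%} +/- {moe:.1%}  ({p - moe:.1%} to {p + moe:.1%})")
```

With margins of roughly 3.6 to 3.8 points on each estimate, a spread from 25% to 22% sits well inside the noise, which is why such a result reads as a tie. (The uncertainty on the gap between two candidates in the same poll is larger still than either individual margin.)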

If we got a bunch of current political operatives in a room, here are questions we could ask that would be revealing:

1. Are there certain polls that you all consider to be reliable? (I hope the answer is yes. But I would also guess that each political party thinks certain polls tend to lean in their direction.)

2. What information do you all work with regularly that gives you a better picture of what is going on beyond the polls? In other words, while a campaign is under way the American public gets little inside view beyond the stream of polls reported by the media, but the campaigns themselves have more information that matters. How much attention should the public pay to these polls, or can people pick up clues about what is really going on elsewhere? (The media seems to like polls but there are other ways to get information.)

3. In the long run, who is helped or harmed by having a lot of polling organizations? Hickman suggests some polls aren’t that worthwhile; if that is the case, should they not be reported to the American public? (Americans can look at a variety of polls; should there be that many to choose from?)

Unfortunately, this story feeds a growing mistrust of polls. It is generally not good for social science if 42% of Americans think polls are biased toward one candidate or another. On one hand, these 42% may simply not like what the polls are reporting, have little idea how polls work, and want their candidate to win (and won’t like the polls until this happens). On the other hand, perceptions matter, and decisions about polls should be made on scientific grounds, not on ideological or partisan affections. And surely this has to play into the finding that only 9% of Americans are willing to respond to telephone surveys.
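The 9% figure also has a practical cost for pollsters. A rough sketch, assuming for illustration a hypothetical target of 1,000 completed interviews and treating 9% as the share of attempted contacts that end in a completed interview (a simplification; formal response-rate definitions are more detailed):

```python
import math


def contacts_needed(completes: int, response_rate: float) -> int:
    """Attempted contacts required to reach a target number of completed interviews."""
    return math.ceil(completes / response_rate)


if __name__ == "__main__":
    target = 1000  # hypothetical number of completed interviews
    print(f"about {contacts_needed(target, 0.09):,} contacts at a 9% response rate")
```

That works out to roughly eleven attempted contacts for every completed interview.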

Editorial: to lower poverty rate in the US, we need to talk about it first

An editorial in the Philadelphia Daily News suggests there is currently a big stumbling block in dealing with record poverty levels in the United States: no one is talking about it.

One argument that has gained currency is that the poor aren’t really poor, because they have refrigerators and cell phones. Here’s another: The worst economic downturn since the Great Depression doesn’t qualify as “circumstances beyond their control.” Instead, people who lose their jobs and can’t find others just aren’t looking hard enough. And the most shocking of all: To punish their parents, it’s OK to let children go hungry and suffer the health and educational ramifications of malnutrition.

That’s how some people think of poverty – if they think about it at all…

Yet politicians of all leanings just don’t want to talk about it, almost certainly taking their cues from the populace at large. In a recent study, the media watchdog group Fairness and Accuracy in Reporting looked at six months of national political coverage and found that poverty was the subject of less than 0.2 percent of the stories – that is, only 17 out of 10,489.

In order to do something about poverty, we have to be able to recognize it. An organization sponsored by the Center for American Progress called “Half in Ten” (www.halfinten.org) has set a goal of halving the U.S. poverty rate in 10 years by putting it back on the national agenda. First step: “updating” Americans’ understanding of poverty, beginning with the way it is calculated. The current method – used for nearly a half-century – multiplies estimated food costs by three, which doesn’t take into account increased expenses such as housing, transportation and child care – and gives a much brighter picture than the actual reality.

Half in Ten is urging Americans to “tweet” the moderators of the presidential debates using the hashtag #talkpoverty to challenge the candidates on how they would reduce poverty in their first 100 days in office.

The modern era: fighting poverty through Twitter.

I’ve noted this issue before; the major political candidates don’t talk about poverty. They may talk about hardship and economic troubles but they tend to stick to middle-class dreams and helping Americans join this aspirational group. According to the New York Times, the word “poverty” was spoken at a rate of 3 per 25,000 words by Democrats and 5 per 25,000 words by Republicans. In contrast, the phrase “middle class” was used at a rate of 47 per 25,000 words by Democrats and 7 per 25,000 words by Republicans.
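Converting the Times figures into ratios sharpens the contrast, and the same arithmetic checks the FAIR count quoted earlier. A minimal sketch using only numbers from the two sources; the function name is hypothetical.

```python
# Mentions per 25,000 words, as reported by the New York Times
rates = {
    "Democrats": {"poverty": 3, "middle class": 47},
    "Republicans": {"poverty": 5, "middle class": 7},
}


def middle_class_to_poverty(counts: dict) -> float:
    """How many times 'middle class' was said for each mention of 'poverty'."""
    return counts["middle class"] / counts["poverty"]


if __name__ == "__main__":
    for party, counts in rates.items():
        ratio = middle_class_to_poverty(counts)
        print(f"{party}: 'middle class' used {ratio:.1f}x as often as 'poverty'")

    # FAIR: poverty was the subject of 17 out of 10,489 stories
    share = 17 / 10489
    print(f"FAIR share of stories: {share:.2%}")
```

Democrats said “middle class” nearly 16 times for every “poverty,” Republicans about 1.4 times, and 17 of 10,489 stories comes to about 0.16%, consistent with FAIR’s “less than 0.2 percent.”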

At the same time, I wonder if Joel Best’s writings about the possible problems with declaring war on social problems, such as poverty, apply here. How do you keep the momentum of a fifty-year war going? How do you know when the US has “won” the war on poverty? One advantage of declaring war on a social problem is that it can draw media attention because of the implications of war. Yet it sounds like the media isn’t paying much attention either.