A closer look at how the Obama campaign used big data to wage an intimate and winning campaign

In MIT Technology Review, Sasha Issenberg has a three-part look at how the Obama campaign effectively harnessed big data. Here are the concluding paragraphs from Part Three:

A few days after the election, as Florida authorities continued to count provisional ballots, a few staff members were directed, as four years before, to remain in Chicago. Their instructions were to produce another post-mortem report summing up the lessons of the past year and a half. The undertaking was called the Legacy Project, a grandiose title inspired by the idea that the innovations of Obama 2012 should be translated not only to the campaign of the next Democratic candidate for president but also to governance. Obama had succeeded in convincing some citizens that a modest adjustment to their behavior would affect, however marginally, the result of an election. Could he make them feel the same way about Congress?

Simas, who had served in the White House before joining the team, marveled at the intimacy of the campaign. Perhaps more than anyone else at headquarters, he appreciated the human aspect of politics. This had been his first presidential election, but before he became a political operative, Simas had been a politician himself, serving on the city council and school board in his hometown of Taunton, Massachusetts. He ran for office by knocking on doors and interacting individually with constituents (or those he hoped would become constituents), trying to track their moods and expectations.

In many respects, analytics had made it possible for the Obama campaign to recapture that style of politics. Though the old guard may have viewed such techniques as a disruptive force in campaigns, they enabled a presidential candidate to view the electorate the way local candidates do: as a collection of people who make up a more perfect union, each of them approachable on his or her terms, their changing levels of support and enthusiasm open to measurement and, thus, to respect. “What that gave us was the ability to run a national presidential campaign the way you’d do a local ward campaign,” Simas says. “You know the people on your block. People have relationships with one another, and you leverage them so you know the way they talk about issues, what they’re discussing at the coffee shop.”

Few events in American life other than a presidential election touch 126 million adults, or even a significant fraction of that many, on a single day. Certainly no corporation, no civic institution, and very few government agencies ever do. Obama did so by reducing every American to a series of numbers. Yet those numbers somehow captured the individuality of each voter, and they were not demographic classifications. The scores measured the ability of people to change politics—and to be changed by it.

Combining numbers and a personal appeal made for a winning campaign. Part Two has more on how the Romney campaign watched what the Obama campaign was doing and tried to react, yet couldn’t quite figure it out.

Since this appears to have been the winning formula in 2012, I imagine plenty of others will try to duplicate it. One way would be to get access to the Obama campaign’s database and information, though it is not clear who might be able to access that in the future. Another would be to hire some of the Obama campaign staffers who made this happen; I imagine they will get some lucrative offers moving forward. A third option would be to build something similar from scratch, but this could be tedious, require a lot of resources, and may not lead to the same results.

Sears hopes Moneyball addition to its board can help revive the company

Here is an odd mixing of the data, sports, and business worlds: Sears recently named Paul DePodesta to its board.

Paul DePodesta, one of the heroes of Michael Lewis’ “Moneyball: The Art of Winning an Unfair Game,” a great 2003 baseball book (and later a movie) about the 2002 A’s that’s more about business and epistemology than baseball, has been named to the board of Hoffman Estates-based Sears Holdings Corp.

To be sure, he’s an unconventional choice for the parent of Sears and Kmart. But Chairman Edward Lampert is thinking outside the box score, welcoming the New York Mets’ vice president of player development and amateur scouting into his clubhouse…

“What Paul DePodesta … did to bring analytics into the world of baseball is absolutely parallel to what needs to happen — and is happening — in retail,” said Greg Girard, program director of merchandising strategies and retail analytics for Framingham, Mass.-based IDC Retail Insights.

“It’s a big cultural change, but that’s something a board member can effect,” Girard said. “And he’s got street cred to take it down to the line of business guys who need to change, who need to bring analytics and analysis into retail decisions.”…

“Analytics has been something folks in retail have talked about for quite some time, but they’re redoubling their efforts now,” Girard said. “Drowning in data and not knowing what data’s relevant, which data to retain and for how long, is the No. 1 challenge retailers are having as they move into what we call Big Data.”

Fascinating. People like DePodesta are credited with starting a revolution in sports by developing new statistics and then using that information to outwit the market. For example, DePodesta and a host of others before him (with Bill James possibly at the beginning) found that certain traits like on-base percentage were undervalued, meaning teams like the small-market Oakland Athletics could build decent rosters without overpaying for the biggest free agents. Of course, once other teams caught on to this idea, on-base percentage was no longer undervalued. The Boston Red Sox, one of baseball’s biggest-spending teams, picked up the idea, paid handsomely for such skills, and went on to win two World Series championships. So teams now have to look at other undervalued areas. One recent avenue that Major League Baseball shut down was spending more on overseas talent and draft picks to build up a farm system quickly. These ideas are now spreading to other sports: some NBA teams are making use of such data, and new, more precise data will soon be collected on soccer players while they are on the pitch.
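To make the idea of an undervalued statistic concrete, here is a minimal sketch of how on-base percentage is calculated from ordinary box-score counts; the two player stat lines below are invented for illustration.

```python
# A minimal sketch of on-base percentage (OBP), the statistic the
# Moneyball-era A's treated as undervalued. The two stat lines below
# are invented for illustration.

def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

# Two hypothetical players with identical batting averages (~.280)
# but very different walk rates, and therefore different OBPs.
free_swinger = {"hits": 150, "walks": 25, "hit_by_pitch": 2, "at_bats": 535, "sac_flies": 5}
patient_hitter = {"hits": 150, "walks": 80, "hit_by_pitch": 5, "at_bats": 535, "sac_flies": 5}

for name, stats in [("free swinger", free_swinger), ("patient hitter", patient_hitter)]:
    print(f"{name}: OBP = {on_base_percentage(**stats):.3f}")
```

Two hitters with the same batting average can have very different OBPs, and that gap between what conventional scouting rewarded and what the numbers suggested was valuable is exactly what an analytics-minded front office could exploit.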

The same thought process could apply to business. If so, the process might look like this: find new ways to measure retail activity or home in on less understood data that is already out there, then act on these lesser-known insights to get around competitors. When they start to catch on, keep innovating and stay a step or two ahead. Sears could use a lot of this moving forward, as it has struggled in recent years. Even if DePodesta is able to identify trends others have not, he would still have to convince the board and the company to change course.

It will be interesting to see how DePodesta comes out of this. If Sears continues to lose ground, how much of that will rub off on him? If there is a turnaround, how much credit would he get?

Claim: 90% of information ever created by humans was created in the last two years

An article on big data makes a claim about how much information humans have created in the last two years:

In the last two years, humans have created 90% of all information ever created by our species. If our data output used to be a sprinkler, it is now a firehose that’s only getting stronger, and it is revealing information about our relationships, health, and undiscovered trends in society that are just beginning to be understood.

This is quite a bit of data. But a few points in response:

1. I assume this refers only to recorded data. While there are more people on earth than ever before, humans have always been expressive creatures; most of what they expressed simply was never captured as data.

2. Some might read this article as a call to pay more attention to online privacy, but I would guess much of this information is volunteered. Think of Facebook: users voluntarily submit information that their friends and Facebook can access. Or blogs: people voluntarily put together content.

3. This claim also suggests we need better ways to sort through and make sense of all this data. How can the average Internet user put all this data together in a meaningful way? We are simply awash in information, and I wonder how many people, particularly younger people, know how to make sense of all that is out there.

4. Of course, having all of this information out there doesn’t necessarily mean it is meaningful or worthwhile.

Argument: Tom Wolfe’s “sociological novel” about Miami doesn’t match reality

A magazine editor from Miami argues Tom Wolfe’s latest “sociological novel” Back to Blood doesn’t tell the more complex story of what is going on today in that city:

TOM WOLFE has often declared that journalistic truth is far stranger — and narratively juicier — than fiction, a refrain he’s returned to while promoting his latest sociological novel, the Miami-focused “Back to Blood.” With cultural eyes turning to Miami for this week’s Art Basel fair, and on the heels of a presidential election in which South Florida was once again in the national spotlight, “Back to Blood” would seem a perfectly timed prism.

Yet Mr. Wolfe would have done well to better heed his own advice. The flesh-and-blood reality not only contradicts much of his fictional take, it flips the enduring conventional wisdom. Miami is no longer simply the northernmost part of Latin America, or, as some have snarked, a place filled with folks who’ve been out in the sun too long.

For Mr. Wolfe, the city remains defined by bitter ethnic divisions and steered by la lucha: the Cuban-American community’s — make that el exilio’s — frothing-at-the-mouth fixation on the Castro regime across the Florida Straits. The radio format whose beats Miami moves to isn’t Top 40, rap or even salsa, but all Fidel, all the time. It’s a crude portrait, established in the ’80s, reinforced by the spring 2000 telenovela starring Elián González, hammered home in the media by that fall’s Bush v. Gore drama and replayed with the same script every four years since.

Yet the latest data hardly depicts a monolithic Cuban-exile community marching in ideological lock step. Exit polls conducted by Bendixen & Amandi International revealed that 44 percent of Miami’s Cuban-Americans voted to re-elect President Obama last month, despite a Mitt Romney TV ad attempting to link the president with Mr. Castro. The result was not only a record high for a Democratic presidential candidate, it was also a 12 percentage-point jump over 2008.

Can a novel, even a sociological one, capture all of the nuances of a big city? Or is a novel more about capturing a spirit, or the way these complexities influence a few characters? While I do enjoy fiction, this is why I tend to gravitate toward larger-scale studies of bigger patterns: one story or a few stories can explore nuance and detail, but it is hard to know how representative these smaller stories are of a larger whole. In Wolfe’s case, is his book a fair-minded view of what is taking place across Miami, or does he pick up on a few fault lines and exceptional events?

While browsing in a bookstore the other day, I noticed an interesting book that tries to bridge this gap: The Human Face of Big Data. On one hand, our world is becoming one where large datasets with millions of data points are the norm, which may make it harder and harder for novels to capture all of the patterns and trends. Yet we don’t want to lose perspective on how this data and the resulting policies and actions affect real people.

The real question after the 2012 presidential election: who gets Obama’s database?

President Obama has plenty to deal with in his second term but plenty of people want an answer to this question: who will be given access to the campaign’s database?

Democrats are now pressing to expand and redeploy the most sophisticated voter list in American political history, beginning with next year’s gubernatorial races in Virginia and New Jersey and extending to campaigns for years to come. The prospect already has some Republicans worried…

The database consists of voting records and political donation histories bolstered by vast amounts of personal but publicly available consumer data, say campaign officials and others familiar with the operation, which was capable of recording hundreds of fields for each voter.

Campaign workers added far more detail through a broad range of voter contacts — in person, on the phone, over e-mail or through visits to the campaign’s Web site. Those who used its Facebook app, for example, had their files updated with lists of their Facebook friends along with scores measuring the intensity of those relationships and whether they lived in swing states. If their last names seemed Hispanic, a key target group for the campaign, the database recorded that, too…

To maintain their advantage, Democrats say they must guard against the propensity of political data to deteriorate in off years, when funding and attention dwindles, while navigating the inevitable intra-party squabbles over who gets access now that the unifying forces of a billion-dollar presidential campaign are gone.
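As a purely hypothetical illustration of the kind of record described above (none of these field names or scores come from the campaign itself), an enriched entry in such a voter file might look something like this:

```python
# Purely illustrative sketch of a single enriched voter record, based on
# the kinds of fields described in the excerpt (voting history, donations,
# consumer data, contact history, Facebook ties). Every field name and
# score here is hypothetical.
from dataclasses import dataclass, field

@dataclass
class FacebookTie:
    friend_id: str
    intensity_score: float      # modeled strength of the relationship
    in_swing_state: bool

@dataclass
class VoterRecord:
    voter_id: str
    vote_history: list = field(default_factory=list)      # e.g. ["2008 general", "2010 primary"]
    donation_history: list = field(default_factory=list)  # amounts and dates
    consumer_data: dict = field(default_factory=dict)     # publicly available consumer attributes
    contact_log: list = field(default_factory=list)       # door knocks, calls, emails, site visits
    likely_hispanic_surname: bool = False
    support_score: float = 0.0      # modeled likelihood of supporting the candidate
    turnout_score: float = 0.0      # modeled likelihood of actually voting
    persuasion_score: float = 0.0   # modeled responsiveness to contact
    facebook_ties: list = field(default_factory=list)     # list of FacebookTie

# Example: one fictional voter with a single Facebook tie attached.
voter = VoterRecord(voter_id="FL-0001", support_score=0.62, turnout_score=0.48)
voter.facebook_ties.append(FacebookTie("FL-0427", intensity_score=0.9, in_swing_state=True))
print(voter.support_score, len(voter.facebook_ties))
```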

The Obama campaign spent countless hours developing this database and will not let it go lightly. I imagine this could become a more common legacy for winning politicians than what they got done while in office: passing on valuable data about voters and supporters to future candidates. If a winning candidate had good information, others will want to build on it. I don’t see much mention of one way to settle the issue: let political candidates or campaigns pay for the information!

What about the flip side: will anyone use or want the information collected by the Romney campaign? Would new candidates prefer to start over or are there important pieces of data that can be salvaged from a losing campaign?

Another call for the need for theory when working with big data

Big data is not just about allowing researchers to look at really large samples or lots of information at once. It also requires the use of theory and asking new kinds of questions:

Like many other researchers, sociologist and Microsoft researcher Duncan Watts performs experiments using Mechanical Turk, an online marketplace that allows users to pay others to complete tasks. Used largely to fill in gaps in applications where human intelligence is required, social scientists are increasingly turning to the platform to test their hypotheses…

This is a point political forecaster and author Nate Silver discusses in his recent book The Signal and the Noise. After discussing economic forecasters who simply gather as much data as possible and then make inferences without respect for theory, he writes:

This kind of statement is becoming more common in the age of Big Data. Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics, where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes…

The value of big data isn’t simply in the answers it provides, but rather in the questions it suggests that we ask.

This follows a similar recent argument made on the Harvard Business Review website.

I like the emphasis here on the new kinds of questions that might be possible with big data. There are a few ways this could happen:

1. Uniquely large datasets might allow for different comparisons, particularly among smaller groups that are difficult to examine even with nationally representative samples (see the rough sketch after this list).

2. The speed at which experiments can be conducted through platforms like Amazon’s Mechanical Turk means more studies can be run in less time. Additionally, I wonder if this could help alleviate some of the replication issues that pop up in scientific research.

3. Instead of being constrained by data limitations, big data might give researchers the creative space to think on a larger scale and more outside of the box.
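To put a rough number on the first point, here is a back-of-the-envelope sketch (my own illustration, not drawn from the article) of how the margin of error for a small subgroup shrinks when the overall sample grows from a typical national survey to a big-data-scale dataset.

```python
# Back-of-the-envelope illustration of point 1: estimating an attitude within
# a small subgroup (3% of the population) from a typical national survey
# versus a very large dataset. Uses the standard 95% normal-approximation
# margin of error for a proportion; all numbers are illustrative.
import math

def margin_of_error(p, n):
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

subgroup_share = 0.03   # subgroup is 3% of the population
observed_p = 0.5        # worst case for the margin of error

for total_n in (1_500, 1_000_000):
    subgroup_n = int(total_n * subgroup_share)
    moe = margin_of_error(observed_p, subgroup_n)
    print(f"total n = {total_n:>9,}: subgroup n = {subgroup_n:>6,}, "
          f"margin of error = +/-{moe * 100:.1f} points")
```

With a 1,500-person survey, a 3 percent subgroup yields only about 45 respondents and a margin of error of roughly 15 points; with a million records, the same subgroup supports estimates within less than a point.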

Of course, lots of topics are not well suited to being studied through big data, but such information does offer unique opportunities for researchers and for theory.

Facebook owes a debt to sociological research on social networks

At a recent conference, two Facebook employees discussed how their product was based on sociological research on social networks:

Two of Facebook’s data scientists were in Cambridge today presenting on big data at EmTech, the conference by MIT Technology Review, and discussing the science behind the network. Eytan Bakshy and Andrew Fiore each have a PhD and have held research or lecture positions at top universities. Their job is to find value in Facebook’s massive collection of data.

And their presentation underscored, unsurprisingly, the academic roots of their work. Fiore, for instance, cited the seminal 1973 sociology paper on networks, The Strength of Weak Ties, to explain Facebook’s research showing that we’re more likely to share links from our close acquaintances, but given the volume of those weaker connections, in aggregate weak ties matter more. As Facebook attempts to extract value from its users, it’s standing on the shoulders of social science to do it. It may seem banal to point out, but its insights are dependent on a rich history of academic research…

These data scientists were referencing an article written by sociologist Mark Granovetter that has to be one of the most cited sociology articles of all time. I just looked up the 1973 piece in the database Sociological Abstracts, and the site says the article has been cited 4,251 times. Granovetter helped kick off an exploding body of research on social networks and how they affect different areas of life.
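The weak-ties result the Facebook researchers describe is essentially an aggregation effect, and a toy calculation (my numbers, not theirs) shows the logic: even if any single close friend is far more likely to prompt a share, acquaintances can account for more sharing in total simply because there are so many more of them.

```python
# Toy illustration of the aggregation effect behind the weak-ties finding.
# The per-tie sharing probabilities and tie counts are invented; the point is
# only that many weak ties can outweigh a few strong ones in aggregate.
strong_ties = 20        # close friends
weak_ties = 500         # acquaintances
p_share_strong = 0.10   # chance a link from any one strong tie gets reshared
p_share_weak = 0.01     # chance a link from any one weak tie gets reshared

expected_from_strong = strong_ties * p_share_strong   # 2.0 expected reshares
expected_from_weak = weak_ties * p_share_weak         # 5.0 expected reshares

print(f"expected reshares via strong ties: {expected_from_strong:.1f}")
print(f"expected reshares via weak ties:   {expected_from_weak:.1f}")
```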

Some of the other conclusions in this article are interesting as well. The writer suggests the pipeline between academia and Facebook should be open both ways as both the company and scholars would benefit from Facebook data:

Select academics do frequently get granted access to data at companies like Facebook to conduct and publish research (though typically not the datasets), and some researchers manage to glean public data by scraping the social network. But not all researchers are satisfied. After tweeting about the issue, I heard from Ben Zhao, an associate professor of Computer Science at UC Santa Barbara, who has done research on Facebook. “I think many of us in academia are disappointed with the lack of effort to engage from FB,” he told me over email.

The research mentioned above and presented at EmTech was published earlier this year, by Facebook, on Facebook. Which is great. But it points to the power that Facebook, Google, and others now have in the research environment. They have all the data, and they can afford to hire top tier researchers to work in-house. And yet it’s important that the insights now being generated about how people live and communicate be shared with and verified by the academic community.

This is the world of big data, and who has access to proprietary data will be very important. More broadly, it should also lead to discussions about whether corporations should be able to sit on such potentially valuable data while primarily pursuing profits, or whether they should make it more available so we can learn more about humanity at large. I know which side many academics would be on…

Argument: still need thinking even with big data

Justin Fox argues that the rise of big data doesn’t mean we can abandon thinking about data and relationships between variables:

Big data, it has been said, is making science obsolete. No longer do we need theories of genetics or linguistics or sociology, Wired editor Chris Anderson wrote in a manifesto four years ago: “With enough data, the numbers speak for themselves.”…

There are echoes here of a centuries-old debate, unleashed in the 1600s by protoscientist Sir Francis Bacon, over whether deduction from first principles or induction from observed reality is the best way to get at truth. In the 1930s, philosopher Karl Popper proposed a synthesis, in which the only scientific approach was to formulate hypotheses (using deduction, induction, or both) that were falsifiable. That is, they generated predictions that — if they failed to pan out — disproved the hypothesis.

Actual scientific practice is more complicated than that. But the element of hypothesis/prediction remains important, not just to science but to the pursuit of knowledge in general. We humans are quite capable of coming up with stories to explain just about anything after the fact. It’s only by trying to come up with our stories beforehand, then testing them, that we can reliably learn the lessons of our experiences — and our data. In the big-data era, those hypotheses can often be bare-bones and fleeting, but they’re still always there, whether we acknowledge them or not.

“The numbers have no way of speaking for themselves,” political forecaster Nate Silver writes, in response to Chris Anderson, near the beginning of his wonderful new doorstopper of a book, The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t. “We speak for them.”

These days, finding and examining data is much easier than before, but it is still necessary to interpret what the numbers mean. Observing relationships between variables doesn’t necessarily tell us something valuable. We also want to know why variables are related, and this is where hypotheses come in. Careful hypothesis testing lets us rule out spurious associations driven by other variables, look for the influence of one variable on another while controlling for other factors (the essence of regression), or build more complex models where we can see how a variety of variables affect each other at the same time.
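As a concrete, entirely made-up illustration of that last point, the sketch below simulates a spurious association: a third variable drives both x and y, so the two appear related, but once the confounder is included in the regression the apparent effect of x largely disappears. It uses statsmodels’ ordinary least squares routine; the variable names and data are invented.

```python
# Simulated illustration of ruling out a spurious association with regression.
# A confounder z drives both x and y; x has no direct effect on y at all.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

z = rng.normal(size=n)            # confounder (some underlying factor)
x = 0.8 * z + rng.normal(size=n)  # x is driven by z
y = 1.5 * z + rng.normal(size=n)  # y is driven by z, not by x

# Naive model: regress y on x alone -- x looks strongly "related" to y.
naive = sm.OLS(y, sm.add_constant(x)).fit()
print("naive coefficient on x:     ", round(naive.params[1], 3))

# Controlled model: include the confounder z -- the x effect collapses toward zero.
controlled = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
print("controlled coefficient on x:", round(controlled.params[1], 3))
print("controlled coefficient on z:", round(controlled.params[2], 3))
```

Here the naive coefficient on x comes out around 0.7, while the controlled coefficient falls to roughly zero, which is exactly the kind of check a hypothesis about the underlying mechanism would prompt us to run.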

At the other end of the scientific process from hypotheses, using findings to create and implement policies will also require thinking. Once we have established that relationships likely exist, it takes even more work to respond to them in useful and effective ways.