SurveyMonkey made good 2014 election predictions based on experimental web polls

Here is an overview of SurveyMonkey’s experimental political polling ahead of the 2014 elections:

For this project, SurveyMonkey took a somewhat different approach. They did not draw participants from a pre-recruited panel. Instead, they solicited respondents from the millions of people who complete the “do it yourself” surveys that SurveyMonkey’s customers (companies, schools and community organizations) run every day. At the very end of these customer surveys, they asked respondents if they could answer additional questions to “help us predict the 2014 elections.” That process yielded over 130,000 completed interviews across the 45 states with contested races for Senate or governor.

SurveyMonkey tabulated the results for all adult respondents in each state after weighting them to match Census estimates of the adult population by gender, age, education and race, a relatively simple approach analogous to the way most pollsters weight random-sample telephone polls. SurveyMonkey provided HuffPollster with results for each contest among all respondents, as well as among subgroups of self-identified registered voters and of “likely voters,” defined as those who said they had either already voted or were absolutely certain or very likely to vote (full results are published here).
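The passage says only that SurveyMonkey weighted each state’s sample to Census estimates for gender, age, education and race; it does not say how. One common way to weight to several demographic targets at once is raking (iterative proportional fitting), which repeatedly rescales respondents’ weights until each variable’s weighted distribution matches its target. The Python sketch below assumes raking, uses only two variables for brevity, and relies on an invented five-person sample and invented target shares. It illustrates the general technique and is not SurveyMonkey’s actual procedure.

```python
def rake(respondents, targets, max_iter=1000, tol=1e-9):
    """Raking (iterative proportional fitting), a minimal sketch.

    respondents: list of dicts mapping variable name -> category.
    targets: dict mapping variable name -> {category: population share}.
    Assumes every target category appears in the sample at least once.
    Returns one weight per respondent; the weights sum to len(respondents).
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = {cat: 0.0 for cat in shares}
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            # Rescale everyone so this variable's weighted margins hit the targets.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * n / totals[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables barely changes any weight.
        if max_change < tol:
            break
    return weights


def weighted_share(respondents, weights, predicate):
    """Weighted proportion of respondents for whom predicate(r) is true."""
    hits = sum(w for w, r in zip(weights, respondents) if predicate(r))
    return hits / sum(weights)


# Hypothetical sample that over-represents women and college graduates.
sample = [
    {"sex": "F", "educ": "college"},
    {"sex": "F", "educ": "college"},
    {"sex": "F", "educ": "no_college"},
    {"sex": "M", "educ": "college"},
    {"sex": "M", "educ": "no_college"},
]
# Invented population targets, standing in for Census estimates.
targets = {
    "sex": {"F": 0.52, "M": 0.48},
    "educ": {"college": 0.30, "no_college": 0.70},
}

weights = rake(sample, targets)
print([round(w, 3) for w in weights])
# Both weighted shares should now match their targets (0.52 and 0.30).
print(round(weighted_share(sample, weights, lambda r: r["sex"] == "F"), 3))
print(round(weighted_share(sample, weights, lambda r: r["educ"] == "college"), 3))
```

In this toy sample, women with college degrees, who are over-represented on both variables, end up with weights below 1, while men without degrees end up with weights above 1. Raking matches each variable’s marginal distribution rather than the full cross-tabulation, which is part of what makes it “relatively simple”; tabulating a subgroup such as likely voters would just mean applying `weighted_share` with an extra condition in the predicate.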

“We sliced the data by these traditional cuts so we could easily compare them with other surveys,” explains Jon Cohen, SurveyMonkey’s vice president of survey research, “but there’s growing evidence that we shouldn’t necessarily use voters’ own assessments of whether or not they’ll vote.” In future elections, Cohen adds, they plan “to dig in and build more sophisticated models that leverage the particular attributes of the data we collect.” (In a blog post published separately on Thursday, Cohen provides more detail about how the surveys were conducted.)

The results are relatively straightforward. The full SurveyMonkey samples did very well in forecasting winners, showing the ultimate victor ahead in all 36 Senate races and missing in just three governor’s races (Connecticut, Florida and Maryland)…

The more impressive finding is the way the SurveyMonkey samples outperformed the estimates produced by HuffPost Pollster’s poll-tracking model. Our models, which are essentially averages of public polls, were based on all available surveys and calibrated to correspond to results from the nonpartisan polls that had performed well in previous elections. SurveyMonkey’s full samples in each state showed virtually no bias, on average. By comparison, the Pollster models overstated the Democrats’ margins against Republican candidates by an average of 4 percentage points. And while SurveyMonkey’s margins were off in individual contests, the spread of those errors was slightly smaller than the spread for the Pollster averages (as indicated by the total error, the average of the absolute values of the errors on the Democrat-versus-Republican margins).
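The two summary measures in this passage are easy to confuse, so here is a small Python sketch of how they can be computed. It assumes the error for each contest is the poll’s Democrat-minus-Republican margin minus the actual margin, so a positive error means the Democrat was overstated; “bias” is the average signed error and “total error” is the average absolute error, as the passage describes. It also counts how many poll leaders actually won. The four contests are invented for illustration and are not SurveyMonkey’s or Pollster’s figures.

```python
from statistics import mean


def margin_errors(contests):
    """Each contest is (poll_dem, poll_rep, actual_dem, actual_rep), in percent.

    Error = poll margin minus actual margin, where margin = Dem - Rep.
    """
    return [(pd - pr) - (ad - ar) for pd, pr, ad, ar in contests]


def bias(errors):
    # Average signed error: positive means Democrats' margins were overstated.
    return mean(errors)


def total_error(errors):
    # Average absolute error: how far off contests were, regardless of direction.
    return mean(abs(e) for e in errors)


def correct_calls(contests):
    # A call is correct when the poll leader is the actual winner (ties count as misses).
    return sum(1 for pd, pr, ad, ar in contests if (pd - pr) * (ad - ar) > 0)


# Hypothetical contests, invented for illustration only.
contests = [
    (50, 44, 47, 46),  # poll D+6, result D+1
    (43, 51, 47, 50),  # poll R+8, result R+3
    (48, 46, 47, 48),  # poll D+2, result R+1
    (46, 48, 48, 47),  # poll R+2, result D+1
]

errors = margin_errors(contests)
print(errors)                   # [5, -5, 3, -3]
print(bias(errors))             # 0
print(total_error(errors))      # 4
print(correct_calls(contests))  # 2
```

With these made-up numbers the signed errors cancel, so the bias is zero even though the average contest was off by 4 points and two of the four leaders lost. That is why the passage reports both measures: bias captures a systematic lean toward one party (as with the Pollster averages), while total error captures how far off individual contests were.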

The usual concern with web surveys is obtaining a representative sample, either because it is difficult to identify respondents who fit the appropriate demographics or because the survey is open to anyone. Yet SurveyMonkey produced good predictions this past election cycle. Was that because (a) their samples were large enough to approximate the general population (they could reach the large number of people who use their services), or (b) their weighting was particularly good?

The real test will come when a major organization, particularly a media outlet, relies solely on web polls ahead of a major election. Given these positive results, perhaps we will see this in 2016. Still, I imagine there are kinks to work out, and some organizations may only be willing to do so if they pair the web data with more traditional forms of polling.