Don’t see social media as representative of full populations

This should be obvious, but computer scientists remind us that social media users are not representative of the full population:

One of the major problems with sites like Twitter, Pinterest or Facebook is ‘population bias’ where platforms are populated by a very narrow section of society.

Latest figures on Twitter suggest that just five per cent of over 65s use the platform compared with 35 per cent for those aged 18-29. Similarly, far more men use the social networking site than women.

Instagram has a particular appeal to younger adults, urban dwellers, and non-whites.

In contrast, the picture-posting site Pinterest is dominated by females aged between 25 and 34. LinkedIn is especially popular among graduates and internet users in higher income households.

Although Facebook is popular across a diverse mix of demographic groups, scientists warn that postings can be skewed because there is no ‘dislike’ button. There are also more women using Facebook than men: 76 per cent of female internet users use the site compared with 66 per cent of males.

Who does the data from social media represent? The people who use social media, who, as pointed out above, tend to skew younger across the board and differ in other ways depending on the service. Just because people are willing to put information out there doesn’t mean that it is a widely shared perspective, even if a Twitter account has millions of followers or a Facebook group has a lot of likes. Until we have a world where everyone participates in social media in similar ways and makes much of the same information public, we need to be careful about social media samples.
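One way to see why this matters: if opinions vary by age and a platform over-represents the young, a naive platform average will be off. Here is a toy Python sketch of reweighting a skewed sample toward population demographics; every share and opinion rate below is invented for illustration, not one of the figures quoted above.

```python
# Toy post-stratification sketch: reweight a platform sample whose age mix
# differs from the general population. All numbers are hypothetical.

# (age group, share of platform users, share of population, rate holding view X)
cells = [
    ("18-29", 0.45, 0.20, 0.60),
    ("30-64", 0.50, 0.60, 0.40),
    ("65+",   0.05, 0.20, 0.25),
]

# Naive estimate: average weighted by who happens to be on the platform.
raw = sum(plat * view for _, plat, _, view in cells)

# Adjusted estimate: same opinion rates, weighted by population shares.
weighted = sum(pop * view for _, _, pop, view in cells)

print(f"raw platform estimate: {raw:.1%}")      # skews toward young users
print(f"population-weighted:   {weighted:.1%}")
```

Survey researchers call this post-stratification; the catch with social media is that user demographics are often not well measured, so even the weights are uncertain.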

ESPN pushing unscientific “NFL Hazing Survey”

On SportsCenter last night as well as on their website, ESPN was pushing a survey of 72 NFL players regarding the recent locker room troubles involving the Miami Dolphins. The problem: the survey is unscientific, something they mentioned at the beginning of the TV reports. The online story includes a similar disclaimer at the beginning of the third paragraph:

But in an unscientific survey conducted by team reporters for ESPN’s NFL Nation over two days this week, Incognito does not have the same level of support from some of his peers. Three players participated from each team surveyed, with 72 players in all asked three questions. The players taking part were granted anonymity.

If the survey is unscientific, why do they then spend time discussing the results? If they admit upfront that it is unscientific, what exactly could the viewer/reader learn from the data? It is good that they mentioned the unscientific sample but their own statement suggests we shouldn’t put much stock in what they say next.

Argument: we could have skewed survey results because we ignore prisoners

Several sociologists suggest American survey results may be off because they tend to ignore prisoners:

“We’re missing 1% of the population,” said Becky Pettit, a University of Washington sociologist and author of the book “Invisible Men.” “People might say, ‘That’s not a big deal.’” But it is for some groups, she writes — particularly young black men. And for young black men, especially those without a high-school diploma, official statistics paint a rosier picture than reality on factors such as employment and voter turnout.

“Because many surveys skip institutionalized populations, and because we incarcerate lots of people, especially young black men with low levels of education, certain statistics can look rosier than if we included” prisoners in surveys, said Jason Schnittker, a sociologist at the University of Pennsylvania. “Whether you regard the impact as ‘massive’ depends on your perspective. The problem of incarceration tends to get swept under the rug in lots of different ways, rendering the issue invisible.”

Further commentary in the article suggests sociologists and others, like the Census Bureau, are split on whether they think including prisoners in surveys is necessary.

Based on this discussion, I wonder if there is another issue: is getting slightly better survey results by picking up 1% of the population going to significantly affect results and policy decisions? If not, some would conclude it is not worth the effort. But Pettit argues some statistics could change a lot:

Among the generally accepted ideas about African-American young-male progress over the last three decades that Becky Pettit, a University of Washington sociologist, questions in her book “Invisible Men”: that the high-school dropout rate has dropped precipitously; that employment rates for young high-school dropouts have stopped falling; and that the voter-turnout rate has gone up.

For example, without adjusting for prisoners, the high-school completion gap between white and black men has fallen by more than 50% since 1980, says Prof. Pettit. After adjusting, she says, the gap has barely closed and has been constant since the late 1980s. “Given the data available, I’m very confident that if we include inmates” in more surveys, “the trends are quite different than we would otherwise have known,” she says…

For instance, commonly accepted numbers show that the turnout rate among black male high-school dropouts age 20 to 34 surged between 1980 and 2008, to the point where about one in three were voting in presidential races. Prof. Pettit says her research indicates that instead the rate was flat, at around one in five, even after the surge in interest in voting among many young black Americans with Barack Obama in the 2008 race.
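Pettit’s turnout example comes down to a denominator problem: household surveys leave inmates out of both the numerator and the denominator. A back-of-the-envelope Python sketch makes the mechanism visible; all of the counts below are hypothetical, chosen only to reproduce the rough one-in-three versus one-in-five contrast described above.

```python
# Hypothetical illustration of how excluding incarcerated people from a
# survey frame inflates a rate for a heavily incarcerated subgroup.
# These counts are made up for the example, not Pettit's actual data.

group_size = 1000   # young black male high-school dropouts (hypothetical)
incarcerated = 400  # excluded from household surveys (hypothetical)
voters = 200        # voters in the whole group; inmates largely cannot vote

surveyed = group_size - incarcerated
rate_survey_frame = voters / surveyed       # what a household survey reports
rate_full_population = voters / group_size  # the true population rate

print(f"survey-frame turnout:    {rate_survey_frame:.0%}")    # 33%
print(f"full-population turnout: {rate_full_population:.0%}") # 20%
```

The survey-frame rate looks much higher simply because the subgroup most likely to be non-voters was never eligible to be sampled.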

It will be interesting to see how this plays out.

Pew finds that landline-only surveys are biased toward Republicans

Polling techniques have become more complicated in recent years with the introduction of cell phones. In the past, researchers could reasonably assume most US residents could be accessed through a landline. However, Pew now suggests there may be a political bias in surveys that only access people through landlines:

Across three Pew Research polls conducted in fall 2010 — conducted among 5,216 likely voters, including 1,712 interviewed on cell phones — the GOP held a lead that was on average 5.1 percentage points larger in the landline sample than in the combined landline and cell phone sample…

The difference in estimates produced by landline and dual frame samples is a consequence not only of the inclusion of the cell phone-only voters who are missed by landline surveys, but also of those with both landline and cell phones — so called dual users — who are reached by cell phone. Dual users reached on their cell phone differ demographically and attitudinally from those reached on their landline phone. They are younger, more likely to be black or Hispanic, less likely to be college graduates, less conservative and more Democratic in their vote preference than dual users reached by landline…

Cell phones pose a particular challenge for getting accurate estimates of young people’s vote preferences and related political opinions and behavior. Young people are difficult to reach by landline phone, both because many have no landline and because of their lifestyles. In Pew Research Center surveys this year about twice as many interviews with people younger than age 30 are conducted by cell phone than by landline, despite the fact that Pew Research samples include twice as many landlines as cell phones.

This seems to make sense: those who have cell phones and don’t have landlines are likely to be different from those who are reached by landlines.

A few questions that I have: does this issue exist in all phone surveys today (and this article suggests there were sizable differences between landline respondents and cell phone respondents in five of six surveys)? Have other polling firms had similar findings? If Pew now has some ideas about the extent of this issue, is the proper long-term response to call more cell phones or to weight the results more toward cell phone users?
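On the weighting question, a dual-frame estimate is essentially a weighted average of the two samples. A minimal Python sketch, with hypothetical support rates and a hypothetical population share for cell-reachable voters (none of these are Pew’s numbers):

```python
# Minimal sketch of blending landline and cell-phone samples by weighting
# each frame toward its share of the target population. The support rates
# and the 30% cell share are hypothetical, not Pew's figures.

landline_gop_share = 0.50  # GOP support among landline respondents (made up)
cell_gop_share = 0.43      # GOP support among cell respondents (made up)

cell_pop_share = 0.30      # assumed share of likely voters best reached by cell
landline_pop_share = 1 - cell_pop_share

blended = (landline_pop_share * landline_gop_share
           + cell_pop_share * cell_gop_share)

print(f"landline-only estimate: {landline_gop_share:.1%}")
print(f"blended estimate:       {blended:.1%}")
```

Actual dual-frame weighting is more elaborate — it also has to account for dual users reachable in either frame — but this sketch shows why the blended estimate moves away from the landline-only one.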

One possible response would be to include multiple methods for more surveys. This might include samples of landline respondents, cell phone respondents, and web respondents. While this is more costly and time-consuming, research firms could then triangulate results.