After case of fraud, researchers discuss other means of “misusing research data”

The news that a prominent Dutch social psychologist published fraudulent work has pushed researchers to talk about other forms of “misusing research data”:

Even before the Stapel case broke, a flurry of articles had begun appearing this fall that pointed to supposed systemic flaws in the way psychologists handle data. But one methodological expert, Eric-Jan Wagenmakers, of the University of Amsterdam, added a sociological twist to the statistical debate: Psychology, he argued in a recent blog post and an interview, has become addicted to surprising, counterintuitive findings that catch the news media’s eye, and that trend is warping the field…

In September, in comments quoted by the statistician Andrew Gelman on his blog, Mr. Wagenmakers wrote: “The field of social psychology has become very competitive, and high-impact publications are only possible for results that are really surprising. Unfortunately, most surprising hypotheses are wrong. That is, unless you test them against data you’ve created yourself.”…

To show just how easy it is to get a nonsensical but “statistically significant” result, three scholars, in an article in November’s Psychological Science titled “False-Positive Psychology,” first showed that listening to a children’s song made test subjects feel older. Nothing too controversial there.

Then they “demonstrated” that listening to the Beatles’ “When I’m 64” made the test subjects literally younger, relative to when they listened to a control song. Crucially, the study followed all the rules for reporting on an experimental study. What the researchers omitted, as they went on to explain in the rest of the paper, was just how many variables they poked and prodded before sheer chance threw up a headline-making result—a clearly false headline-making result.
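The arithmetic behind this kind of chance finding can be sketched in a few lines. This is my own rough illustration, not the procedure from the paper: it assumes every test is independent, that no real effect exists, and that the cutoff is the usual 0.05. The function names are made up for the example.

```python
import random


def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent null tests gives p < alpha."""
    return 1 - (1 - alpha) ** num_tests


def simulate_false_positive_rate(num_tests, alpha=0.05, num_studies=20000, seed=0):
    """Simulate studies with no real effect and count how many find 'something'.

    Assumption: under the null hypothesis each p-value is uniform on [0, 1),
    so a random draw stands in for running an actual test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        p_values = [rng.random() for _ in range(num_tests)]
        if min(p_values) < alpha:  # the researcher reports only the best result
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        analytic = familywise_false_positive_rate(k)
        simulated = simulate_false_positive_rate(k)
        print(k, round(analytic, 3), round(simulated, 3))
```

Under these assumptions, a researcher who tries 20 unrelated comparisons has roughly a 64 percent chance of finding at least one "significant" result even when nothing is going on. Real variables are usually correlated, so the exact figure would differ.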

If the pressure to publish is great (and it certainly is), then there have to be some countermeasures to limit unethical research practices. Here are a few ideas:

1. Giving more people access to the data. In this way, people could check up on other people’s published findings. But if the fraudulent studies are already published, perhaps this is too late.

2. Having more people oversee the project along the way. This doesn’t necessarily have to be a bureaucratic board, but having only one researcher look at the data and do the analysis (as in the Stapel case) gives an individual more opportunity to twist the data. This could be an argument for collaborative work with data.

3. Could there be more space within disciplines and journals to discuss the research process? While papers tend to have very formal hypotheses, a lot of messy work goes into them, and there is very little room to discuss how the researchers arrived at them.

4. Decrease the value of media attention. I don’t know how to deal with this one. What researcher doesn’t want to have more people read their research?

5. Have a better-educated media so that they don’t report so many inconsequential and shocking studies. We need more people like Malcolm Gladwell who look at a broad swath of research and summarize it, rather than dozens of reports grabbing onto small studies. This is the classic issue with nutrition reporting: eggs are great! A new study says they are terrible! A third says they are great for pregnant women and no one else! We rarely get overviews of this research or real questions about its value. We just get: “a study proved this oddity today…”

6. Resist data mining. Atheoretical correlations don’t help much. Let theories guide statistical models.

7. Have more space to publish negative findings. This would help researchers feel less pressure to come up with positive results.

Claim: “Facebook knows when you’ll break up”

There is an interesting chart going around that is based on Facebook data and claims to show when people are more prone to break up. Here is a quick description of the chart:

British journalist and graphic designer David McCandless, who specializes in showcasing data in visual ways, compiled the chart. He showed off the graphic at a TED conference last July in Oxford, England.

In the talk, McCandless said he and a colleague scraped 10,000 Facebook status updates for the phrases “breakup” and “broken up.”

They found two big spikes on the calendar for breakups. The first was after Valentine’s Day — that holiday has a way of defining relationships, for better or worse — and in the weeks leading up to spring break. Maybe spring fever makes people restless, or maybe college students just don’t want to be tied down when they’re partying in Cancun.

These are potentially interesting findings, and the chart is an interesting way to present the data. But when you consider how the data was collected, perhaps it isn’t so great. A few thoughts on the subject:

1. The best way to figure this out would be to convince Facebook to let you have the data for relationship status changes.

2. Searching for the phrases “breakup” and “broken up” might catch some, or perhaps even many, ended relationships, but not all. Does everyone include these words when talking about ending a relationship?

3. Are 10,000 status updates a representative sample of all Facebook statuses?

4. Is there a lag time involved in reporting these changes? Monday, for example, is the most popular day for announcing break-ups, but not necessarily the day when the most break-ups occur. Do people immediately run to Facebook to tell the world that they have ended a relationship?

5. Does everyone initially “register” and then “unregister” a relationship on Facebook anyway?
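To make the second point concrete, here is a small sketch of what a keyword search can and cannot see. The only things taken from the talk are the two search phrases; the sample statuses, their labels, and the function names are all made up for illustration, and this is not the code McCandless and his colleague used.

```python
import re

# The two phrases named in the talk; everything else here is hypothetical.
BREAKUP_PATTERN = re.compile(r"\b(breakup|broken up)\b", re.IGNORECASE)


def mentions_breakup(status):
    """Return True if the status contains one of the searched phrases."""
    return BREAKUP_PATTERN.search(status) is not None


# Invented statuses, each paired with whether it really announces a breakup.
SAMPLE_STATUSES = [
    ("We've broken up. Not ready to talk about it.", True),
    ("Single again as of today.", True),
    ("She dumped me last night.", True),
    ("We broke up on Friday.", True),
    ("Great breakup scene in that movie!", False),
    ("Pizza for dinner again.", False),
]


def tally(statuses):
    """Count real breakups caught, real breakups missed, and false hits."""
    counts = {"caught": 0, "missed": 0, "false_hit": 0}
    for text, is_breakup in statuses:
        matched = mentions_breakup(text)
        if matched and is_breakup:
            counts["caught"] += 1
        elif is_breakup:
            counts["missed"] += 1
        elif matched:
            counts["false_hit"] += 1
    return counts


if __name__ == "__main__":
    print(tally(SAMPLE_STATUSES))
```

In this toy set the search catches only one of the four real breakups and also flags a comment about a movie. The proportions mean nothing since I wrote the examples myself, but they show the two kinds of error a phrase search can make.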

The more I think about it, the bigger the claim that “Facebook knows when you are going to break up” seems, given that it rests on this data mining exercise.

Race as a lesser factor in forming friendships on Facebook

A new study in the American Journal of Sociology finds that a shared racial identity was less important than several other factors when making friends on Facebook:

“Sociologists have long maintained that race is the strongest predictor of whether two Americans will socialize,” said Andreas Wimmer, the study’s lead author and a sociologist at UCLA…

In fact, the strongest attraction turned out to be plain, old-fashioned social pressure. For the average student, the tendency to reciprocate a friendly overture proved to be seven times stronger than the attraction of a shared racial background, the researchers found…

Other mechanisms that proved stronger than same-race preference included having attended an elite prep school (twice as strong), hailing from a state with a particularly distinctive identity such as Illinois or Hawaii (up to two-and-a-half times stronger) and sharing an ethnic background (up to three times stronger).

Even such routine facts of college life as sharing a major or a dorm often proved at least as strong, if not stronger, than race in drawing together potential friends, the researchers found.

Interesting findings – perhaps Facebook is a new world or younger generations don’t pay as much attention to race.

Additionally, it is interesting to read about the methodology of the study, which took place at a school where 97% of students had Facebook profiles. The sociologists measured friendships in terms of photo tagging (and not by who was actually listed as a “friend”).

A couple of questions I have: are behavior on Facebook and the choice of friends there reflective of actual social patterns in the real world? Is there a selection issue going on here? Not all students or people of this age use Facebook, so are college students who use Facebook already more likely to form cross-racial friendships?

Emerging businesses looking for social scientists who can data mine and use statistics

The Economic Times of India contains an interview with Prabhakar Raghavan, chief scientist for Yahoo! and head of their labs. Raghavan talks about their studies of social networking and social influence. Then Raghavan was asked about the people undertaking these studies:

What is the percentage of social scientists in Yahoo! Labs who anchor such work?

They constitute around 10% of our people. We are interested in social scientists who can work on data mining. But in most colleges, the sociology department doesn’t teach data mining and the statistics department does not offer sociology. That’s why emerging businesses face a serious dearth of such social scientists.

A reminder that all sorts of businesses are looking for sociology students who are well-versed in statistics (and data mining). Since many students don’t think sociology and statistics naturally go together, it is up to colleges (and sociology statistics instructors) to help them put the two together. Sociology may often be billed as a discipline that will help students understand, analyze, and change the world, but one often needs to be able to work with and analyze data in these efforts.

This interview is also a reminder that people with social science degrees are not limited to a career in academia.