Many researchers would like to get their hands on SNS/Facebook profile data, but one well-known dataset put together by Harvard researchers has come under fire:
But today the data-sharing venture has collapsed. The Facebook archive is more like plutonium than gold—its contents yanked offline, its future release uncertain, its creators scolded by some scholars for downloading the profiles without students’ knowledge and for failing to protect their privacy. Those students have been identified as Harvard College’s Class of 2009…
The Harvard sociologists argue that the data pulled from students’ Facebook profiles could lead to great scientific benefits, and that substantial efforts have been made to protect the students. Jason Kaufman, the project’s principal investigator and a research fellow at Harvard’s Berkman Center for Internet & Society, points out that data were redacted to minimize the risk of identification. No student seems to have suffered any harm. Mr. Kaufman accuses his critics of acting like “academic paparazzi.”…
The Facebook project began to unravel in 2008, when a privacy scholar at the University of Wisconsin at Milwaukee, Michael Zimmer, showed that the “anonymous” data of Mr. Kaufman and his colleagues could be cracked to identify the source as Harvard undergraduates…
But that boon brings new pitfalls. Researchers must navigate the shifting privacy standards of social networks and their users. And the committees set up to protect research subjects—institutional review boards, or IRB’s—lack experience with Web-based research, Mr. Zimmer says. Most tend to focus on evaluating biomedical studies or traditional, survey-based social science. He has pointed to the Harvard case in urging the federal government to do more to educate IRB’s about Web research.
It sounds like academics, IRBs, and granting agencies still need to figure out acceptable standards for collecting such data. I’m not surprised that the primary issue had to do with identifying individual users and their profiles, since this is a common problem whenever researchers ask for or collect personal information. This dataset also intersects with many open concerns about Internet privacy. Perhaps some IRBs could lead the way for academics and other researchers who want to get their hands on such data.
It is interesting that these concerns arose because of the growing interest in sharing datasets. The Harvard researchers and their IRB allowed the research to take place, so I wonder whether any of this would have happened if the dataset hadn’t needed to be shared where others could then raise issues.
I understand that the researchers wanted to collect the profiles quietly, but why not ask for permission? How many Harvard students would have turned them down? I think most college students are quite aware of what can happen with their profile data, and they take care of the issue on the front end by choosing what they display. The researchers could then offer some protections in terms of anonymity and who would have access to the data. Or what about interviewing students, who would be asked to load their profiles and walk the researcher through what they have put online and why it is there?