Facebook releases big data to researchers outside the company

Posted on February 16, 2020 by legallysociable

Researchers can now access a big dataset of Facebook sharing data:

Social Science One is an effort to get the Holy Grail of data sets into the hands of private researchers. That Holy Grail is Facebook data. Yep, that same unthinkably massive trove that brought us Cambridge Analytica.

In the Foo Camp session, Stanford Law School’s Nate Persily, cohead of Social Science One, said that after 20 months of negotiations, Facebook was finally releasing the data to researchers. (The researchers had thought all of that would be settled in two months.) A Facebook data scientist who worked on the team dedicated to this project beamed in confirmation. Indeed, the official announcement came a few days later…

This is a new chapter in the somewhat tortured history of Facebook data research. The company hires top data scientists, sociologists, and statisticians, but their primary job is not to conduct academic research, it’s to use research to improve Facebook’s products and promote growth. These internal researchers sometimes do publish their findings, but after a disastrous 2014 Facebook study that involved showing users negative posts to see if their mood was affected, the company became super cautious about what it shared publicly. So this week’s data drop really is a big step in transparency, especially since there’s some likelihood that the researchers may discover uncomfortable truths about the way Facebook spreads lies and misinformation.

See the codebook here and the request for proposals to use the data here. According to the RFP, the data involves shared URLs and who interacted with those links:

Through Social Science One, researchers can apply for access to a unique Facebook dataset to study questions related to the effect of social media on democracy. The dataset contains approximately an exabyte (a quintillion bytes, or a billion gigabytes) of raw data from the platform, a total of more than 10 trillion numbers that summarize information about 38 million URLs shared more than 100 times publicly on Facebook (between 1/1/2017 and 7/31/2019). It also includes characteristics of the URLs (such as whether they were fact-checked or flagged by users as hate speech) and the aggregated data concerning the types of people who viewed, shared, liked, reacted to, shared without viewing, and otherwise interacted with these links. This dataset enables social scientists to study some of the most important questions of our time about the effects of social media on democracy and elections with information to which they have never before had access.

Now we wait to see what social scientists can do with the data. The emphasis appears to be on democracy, political posts, and misinformation, but given the range of what is shared on Facebook, I imagine there are connections to numerous other topics.
