Using social media data to predict traits about users

Here is a summary of research that uses algorithms and “concepts from psychology and sociology” to uncover traits of social media users through what they make available:

One study in this space, published in 2013 by researchers at the University of Cambridge and their colleagues, gathered data from 60,000 Facebook users and, with their Facebook “likes” alone, predicted a wide range of personal traits. The researchers could predict attributes like a person’s gender, religion, sexual orientation, and substance use (drugs, alcohol, smoking)…

How could liking curly fries be predictive? The reasoning relies on a few insights from sociology. Imagine one of the first people to like the page happened to be smart. Once she liked it, her friends saw it. A social science concept called homophily tells us that people tend to be friends with people like themselves. Smart people tend to be friends with smart people. Liberals are friends with other liberals. Rich people hang out with other rich people…

On the first site, YouAreWhatYouLike, the algorithms will tell you about your personality. This includes openness to new ideas, extraversion and introversion, your emotional stability, your warmth or competitiveness, and your organizational levels.

The second site, Apply Magic Sauce, predicts your politics, relationship status, sexual orientation, gender, and more. You can try it on yourself, but be forewarned that the data is in a machine-readable format. You’ll be able to figure it out, but it’s not as pretty as YouAreWhatYouLike.

These aren’t the only tools that do this. AnalyzeWords leverages linguistics to discover the personality you portray on Twitter. It does not look at the topics you discuss in your tweets, but rather at things like how often you say “I” vs. “we,” how frequently you curse, and how many anxiety-related words you use. The interesting thing about this tool is that you can analyze anyone, not just yourself.
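To make the idea of scoring style rather than topic concrete, here is a minimal sketch of that kind of word counting. It is not AnalyzeWords' actual method or lexicon: the `CATEGORIES` word lists, the `style_profile` function, and the sample tweets are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only.
# These are assumptions, not the lexicon AnalyzeWords uses.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "first_person_plural": {"we", "us", "our", "ours", "ourselves"},
    "anxiety": {"worried", "nervous", "afraid", "anxious", "scared"},
}


def style_profile(texts):
    """Return each category's share of all words across the given texts."""
    counts = Counter()
    total = 0
    for text in texts:
        # Lowercase and split into simple word tokens; topics are ignored.
        for word in re.findall(r"[a-z']+", text.lower()):
            total += 1
            for name, vocab in CATEGORIES.items():
                if word in vocab:
                    counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {name: counts[name] / total for name in CATEGORIES}


# Made-up sample input.
tweets = [
    "I am worried about my exam",
    "We won our game today",
]
profile = style_profile(tweets)
for name, share in profile.items():
    print(f"{name}: {share:.2f}")
```

A real tool would presumably use much larger, validated word lists and compare each rate against typical values for other users. The point here is only that nothing about the subject of the tweets enters the calculation.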

The author goes on to say that she purges old content from her social media accounts so that third parties can’t use the information against her. That is one response. However, before I do the same, I would want to know a few things:

1. Just how good are these predictions? It is one thing to suggest they are 60% accurate but another to say they are 90% accurate.

2. How much data do these algorithms need to make good predictions?

3. How are social media companies responding to this kind of third-party prediction? While I’m sure they are doing some of this themselves, what do they plan to do if someone wants to use this data in a harmful way (say, to affect people’s credit scores)? Why not set limits now rather than after the fact?
