Obama campaign mines data for fundraising, voters

Politico reports on how the Obama campaign is using data mining in its quest to win reelection:

Obama for America has already invested millions of dollars in sophisticated Internet messaging, marketing and fundraising efforts that rely on personal data sometimes offered up voluntarily — like posts on a Facebook page— but sometimes not.

And according to a campaign official and former Obama staffer, the campaign’s Chicago-based headquarters has built a centralized digital database of information about millions of potential Obama voters.

It all means Obama is finding it easier than ever to merge offline data, such as voter files and information purchased from data brokers, with online information to target people with messages that may appeal to their personal tastes. Privacy advocates say it’s just the sort of digital snooping that his new privacy project is supposed to discourage…

There’s an added twist for Obama: He’s making these moves at the same moment his administration is pushing the virtues of online privacy, last month proposing a consumer bill of rights to protect it.

This has been brewing for some time: back in July 2011, Ben Smith reported that the Obama campaign was advertising for “Predictive Modeling/Data Mining Scientists and Analysts.”

I really want to ask: what took so long? This is a gold mine for candidates.

I’ll be curious to see how far these hypocrisy charges go. If companies are going to make money off the Internet, don’t they have to have some of these abilities to put information together? Which group do people trust less to have their information: corporations or political parties?

Sociology grad student: “the Internet is a sociologist’s playground”

A sociology graduate student makes an interesting claim: “the Internet is a sociologist’s playground”:

The Internet is a sociologist’s playground, says Scott Golder, a graduate student in sociology at Cornell University. Although sociologists have wanted to study entire societies in fine-grained detail for nearly a century, they have had to rely primarily upon large-scale surveys (which are costly and logistically challenging) or interviews and observations (which provide rich detail, but for small numbers of subjects). Golder hopes that data from the social Web will provide opportunities to observe the detailed activities of millions of people, and he is working to bring that vision to fruition.  The same techniques that make the Web run—providing targeted advertisements and filtering spam—can also provide insights into social life. For example, he has used Twitter archives to examine how people’s moods vary over time, as well as how network structure predicts friendship choices. Golder came to sociology by way of computer science, studying language use in online communities and using the Web as a tool for collecting linguistic data. After completing a B.A. at Harvard and an M.S. at the MIT Media Lab, he spent several years in an industrial research lab before beginning his Ph.D. in sociology at Cornell.

I would think that having a background in computer science would be a big plus for a sociologist today. Lots of people want to study social networking sites like Facebook and work with the data available online. But I wonder if there still aren’t a few issues to overcome before we can really tap this information:

1. Do companies that have a lot of this data, places like Google and Facebook, want to open it up to researchers or would they prefer to keep the data in-house in order to make money?

2. How will Internet users respond to the interest researchers have in studying their online behavior if they are often not thrilled about being tracked by companies?

3. Has the sampling issue been resolved? One of the problems with web surveys, or with studying particular websites, is that their users are not representative of the total US population. So while internet use has increased across the population as a whole, isn’t it still skewed in certain directions, particularly among the most frequent users?

4. Just how much does online activity reveal about offline activity? Do the two worlds overlap so much that this is not an issue or are there important things that you can’t uncover through online activity?

I would think some of these issues could be resolved and the sociologists who can really tap this growing realm will have a valuable head start.

Overblown concern about Google “replacing” or “destroying” our memory

The headlines read: “Google ousts brain,” “Google replaces the brain,” “Here’s how Google search is destroying our memory.” These are all based on a new study:

The Internet is becoming our main source of memory instead of our own brains, a study has concluded.

In the age of Google, our minds are adapting so that we are experts at knowing where to find information even though we don’t recall what it is.

The researchers found that when we want to know something we use the Internet as an ‘external memory’ just as computers use an external hard drive…

‘The Internet has become a primary form of external or transactive memory, where information is stored collectively outside ourselves.’

This is an example of “distributed cognition,” the idea that humans use other sources to extend their brain’s capacity. In this case, memory space in the brain may be freed up by relying on Google and computers to store certain information. Instead of “replacing” the brain, Google is extending it, helping humans offload information that can usefully be stored elsewhere. Google isn’t the first technology that allows this; so does the printed page. Rather than storing a lot of arcane and typically unhelpful information in our heads, we can look up basic information in a reference book.

Perhaps people are more concerned about Google itself and the idea that a corporation, an organization more interested in profit than our well-being, may be behind changes in our brain.

Indicators that loyalty among family members is up in America

Even though we supposedly live in a disconnected and fragmented age, there are some indicators that suggest Americans feel more loyal toward their families than in the past:

“There’s been a social and economic change that’s actually made us more dependent on family loyalties,” says Stephanie Coontz, author of “Marriage, A History” (Penguin).

“You don’t know your neighbors. It would be crazy to be loyal to your employer in the same way you used to be because your employer’s not going to be loyal to you. All of those things have simultaneously made us want more loyalty — long for more loyalty — and try, I think, to have more loyalty in our personal lives.”

Loyalty itself is difficult to measure, but likely indicators such as family closeness appear to be on the rise. A 2010 Pew Research Center study found that 40 percent of Americans say their family life is closer now than when they were growing up, and only 14 percent say it is less close. Another Pew study showed that the percentage of adults who talked with a parent every day rose to 42 percent in 2005 from 32 percent in 1989.

The family loyalty picture is complex, with Bradford Wilcox, director of the National Marriage Project at the University of Virginia, saying that though couples who marry today are less likely to get divorced than couples that married in the 1970s, more people are forgoing marriage or delaying it.

The article suggests several reasons why people might feel more loyal toward their families today: rapid economic and social change, different expectations about family life, and a more cautious approach to entering intimate relationships.

There could also be a few other factors at work:

1. I wonder if there is some social desirability bias in answering a question about family closeness. What adult today would say they are doing a worse job in creating family closeness than their parents did? Also, there is a memory issue here: how many current adults can accurately remember or assess the closeness of their family when they were younger? Their current family status is much more immediate.

2. I’m surprised this wasn’t mentioned in the article: the advent of email, cell phones, and text messages has made it relatively easy for family members to communicate. However, I wonder if these easier methods of connection mean that people are confusing connection with closeness, or if the two are indeed one and the same.

Even if loyalty isn’t truly up compared to the “golden era” decades ago (at least in our popular culture we have this image of an era where the nuclear family never let each other down), the perception that loyalty is more important or stronger matters. Many people will bring this expectation to their relationships, and it will affect their actions.

(A side note: Wilcox and Coontz get interviewed for a ridiculous number of news stories about family life and marriage.)

Sociologist talks about the downside of choosing your own news

A sociologist suggests you may be missing something by only choosing what news you want to read:

It’s in no sense odd to find American academe wrangling over journalism. Dean Starkman of the Columbia Journalism Review and Clay Shirky of New York University have recently been hammering away at each other, seeking to determine whether investigative journalism can only be conducted by highly resourced news machines (like the Guardian’s) or by a more individual, digital-first approach (like… um… the Guardian’s). But what’s sociology got to contribute here?

Plenty, Klinenberg says, outlining the fundamental bargain that underpins newspaper life. You, the reader, want crosswords and cartoons, recipes and TV programme guides. You want all the stuff that journalists serve up with a sigh (because, well, it’s not exactly journalism, is it?). And, in return, as part of the deal, journalism is allowed to have a civic purpose – to report and analyse the workings and frailties of democracy – beyond quick ways to whip up a cottage pie.

That bargain, sealed in print, means you can’t have one without the other. Put your cash on the newsagent’s counter and you get some things you desire and other things, from Cardiff or Chad, that you didn’t know had happened until you turned to page five.

Of course, like any other neat thesis, there are readers and editors who don’t quite fit. But the nature of print – flipping from column to column, noticing stories that intrigue you, naturally expanding your spheres of interest – isn’t “versioning” at all – it’s more eclectic. An iPad or Kindle version works within narrower bounds. A Facebook version is even more selective, tailored to your most immediate demands. And the logical version at the end of this line is utterly simple: no deals, no bargains – just what you want, electronically provided on the basis of past predilection.

This is part of a larger question about the consequences of people being exposed to only certain points of view. Selecting only the news we want to read can be self-reinforcing: we end up seeking out only certain kinds of stories, which limits our view of the world.

I wonder, though, about blaming this issue on the medium. How much does having a newspaper in hand really increase the odds that someone will read something they didn’t plan to? Can’t people simply pick out the parts of the newspaper they want to read as well? Further, was there ever really a “golden age” when average citizens always tried to engage with alternative points of view? I would guess not, though that doesn’t mean it isn’t a worthwhile ideal. We need citizens (and journalists) who can understand our complex world, which transcends simple “left” or “right” understandings. Perhaps the Internet makes this easier in some ways, but I would guess that the Internet could be changed to meet these challenges or that people’s behaviors could be altered.

This reminds me of an argument I was reading last night. People could argue, rightly, that all media viewpoints are biased in some way. However, this doesn’t mean that we can just throw out all news sources and say they have nothing of value. What should be consistent across different sources are the facts; there can then be disagreement about how to interpret them. Of course, what is considered “fact” may be up for grabs as well; see the recent debate over PolitiFact’s “Lie of the Year.”

Why a small minority of Americans don’t use Facebook

The New York Times has a piece looking at why some Americans don’t use Facebook:

As Facebook prepares for a much-anticipated public offering, the company is eager to show off its momentum by building on its huge membership: more than 800 million active users around the world, Facebook says, and roughly 200 million in the United States, or two-thirds of the population…

Many of the holdouts mention concerns about privacy. Those who study social networking say this issue boils down to trust. Amanda Lenhart, who directs research on teenagers, children and families at the Pew Internet and American Life Project, said that people who use Facebook tend to have “a general sense of trust in others and trust in institutions.” She added: “Some people make the decision not to use it because they are afraid of what might happen.”…

Facebook executives say they don’t expect everyone in the country to sign up. Instead they are working on ways to keep current users on the site longer, which gives the company more chances to show them ads. And the company’s biggest growth is now in places like Asia and Latin America, where there might actually be people who have not yet heard of Facebook…

And whether there is haranguing involved or not, the rebels say their no-Facebook status tends to be a hot topic of conversation — much as a decision not to own a television might have been in an earlier media era…

Some quick thoughts:

1. Relatively few Americans don’t use Facebook. If 200 million Americans are on Facebook, that is the vast majority of people 13 years old and above. Roughly 15-20% of Americans are not eligible for Facebook (using older figures from 2000). The article compares this to the percentage of people without cell phones, which is roughly 16%.

1a. Because Facebook is so ubiquitous, perhaps it would be more interesting to differentiate between people who use it frequently (multiple times a day?) and those who check it infrequently (say, once a week or less).

1b. Is this the activity Americans most commonly share, perhaps besides watching TV?

2. Privacy issues don’t seem to bother most Facebook users. Even though there may be little revolts when Facebook changes its privacy policy or makes a mistake, this isn’t driving people away in large numbers. And, as I’ve said before, if you want to remain private you should probably stay off the Internet altogether. Another warning for non-users: Facebook may already have information about you anyway.

3. It would be interesting to see figures on how long people stay on Facebook. And speaking of getting people to see advertisements, this small study used eye tracking to see what catches people’s attention when they look at profiles.

3a. If Facebook does need to keep users’ attention, where is the line between constantly changing things and helping people feel comfortable with the site? I say this as we await the Timeline change and the inevitable negative responses.

4. As the article hints at by briefly looking at the pressure non-users get from Facebook users, there is a whole set of social norms that have arisen around the use of Facebook.
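The arithmetic behind point 1 can be checked in a few lines. This is a rough sketch under two assumptions: a US population of about 300 million (implied by the article’s “200 million … two-thirds of the population”), and the 15-20% ineligible range taken at face value. The function name is just for illustration.

```python
# Rough check on point 1. Assumptions: US population of about 300 million
# (implied by "200 million ... two-thirds of the population") and 15-20% of
# Americans too young to join (the older figures cited in the post).

def share_of_eligible(users_millions, population_millions, ineligible_fraction):
    """Return the fraction of eligible Americans (13 and older) on Facebook."""
    eligible_millions = population_millions * (1 - ineligible_fraction)
    return users_millions / eligible_millions

FACEBOOK_USERS = 200  # millions, per the New York Times piece
US_POPULATION = 300   # millions, implied by "two-thirds of the population"

for ineligible in (0.15, 0.20):
    share = share_of_eligible(FACEBOOK_USERS, US_POPULATION, ineligible)
    print(f"{ineligible:.0%} ineligible: {share:.0%} of eligible on Facebook, "
          f"{1 - share:.0%} not")
```

Under these assumptions, somewhere around 17-22% of eligible Americans were not on Facebook, which is in the same neighborhood as the roughly 16% without cell phones.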

Journalists need a better measure for when something has “taken over the web”

I’ve noticed that there are a growing number of online news stories about what is popular online. While many websites need to feed on this buzz, journalists need some better measures of how popular things are on the Internet. Take, for instance, this story posted on Yahoo:

This video from the California State University, Northridge campus has ignited controversy across the Internet this morning. In the video, reportedly taken during finals week, a female student loses her temper with her fellow students, accusing them of being disruptive.

Exactly how much “controversy across the Internet” has erupted? Phrases like this are not unusual; we’re commonly told that a particular story or video or meme has spread across the Internet so we need to know about it. But we have little idea about how popular anything really is.

I’ve noted before my dislike for journalists using the size of Facebook groups as a measure of popularity. So what can be used? We need numbers that can at least be put in context and compared to other numbers. For example, the number of YouTube views can be compared to the views for other videos. Page views and hits (which have their own problems) at least provide some information. Journalists could do a quick search of Google News to get some idea of how many news sources have picked up on a story. We can see how many times something has been retweeted on Twitter.

None of these numbers are perfect. By themselves, they are meaningless. But broad and vague assertions that we need to read about something simply because lots of people on the Internet have seen it are silly. Give us some idea of how popular something really is, where it started, and who has responded to it so far. Show us some trend and put it in some context.

Required for political participation: “digital skills”

Here is an argument that African-Americans and Latinos could participate more in American politics if they had more “digital skills”:

Could the key to increasing civic engagement among Latinos and African Americans be computer classes? A growing body of research is linking Internet use, particularly social network use, and increased social capital and civic engagement. A new report from the MaCarthur foundation finds that Facebook use is correlated with increased interest in and participation in politics. Scholars like Northwestern Sociologist Esther Hargatti [sic] speak eloquently about the information gap between rich and poor online. This gap is less about access to technology and more about developing the skills to harness the technology for political and social gain. The ability to do information searches, send text messages, tweet, share content and other on-line skills is a central element in becoming what Evegny Morozov calls a “digital renegade” rather than a “digital captive.”

The key to using the Web in democracy-enhancing ways is acquiring digital skills.  While this concept has been measured in lots of ways, the presence of digital skills can be measured by the level of autonomy the user has, the number of access points a user has to get online, the amount of experience a user has with different types of online tools, etc.

This should be an area of interest to a lot of people: how social factors such as race, education, location, and other forces affect online use. “Digital skills” are not simply traits that everyone picks up on their own. They require a certain level of exposure, time, and resources that not everyone has. See a video clip of Hargittai talking about this.

I wonder how much arguments like this are behind recent government efforts to provide cheap or free broadband to poorer US residents. Here is part of the statement from the head of the FCC:

“There is a growing divide between the digital-haves and have-nots. Less than one-third of the poorest Americans have adopted broadband, while 90%+ of the richest have adopted it. Low-income Americans, rural Americans, seniors, and minorities disproportionately find themselves on the wrong side of the digital divide and excluded from the $8 trillion dollar global Internet economy.”

As I’ve asked before, how close are we to declaring Internet access an essential human right?

What is “The Big Data Boom” on the Internet good for?

The Internet is a giant source of ready-to-use data:

Today businesses can measure their activities and customer relationships with unprecedented precision. As a result, they are awash with data. This is particularly evident in the digital economy, where clickstream data give precisely targeted and real-time insights into consumer behavior…

Much of this information is generated for free, by computers, and sits unused, at least initially. A few years after installing a large enterprise resource planning system, it is common for companies to purchase a “business intelligence” module to try to make use of the flood of data that they now have on their operations. As Ron Kohavi at Microsoft memorably put it, objective, fine-grained data are replacing HiPPOs (Highest Paid Person’s Opinions) as the basis for decision-making at more and more companies.

The wealth of data also makes it easy to run experiments:

Consider two “born-digital” companies, Amazon and Google. A central part of Amazon’s research strategy is a program of “A-B” experiments where it develops two versions of its website and offers them to matched samples of customers. Using this method, Amazon might test a new recommendation engine for books, a new service feature, a different check-out process, or simply a different layout or design. Amazon sometimes gets sufficient data within just a few hours to see a statistically significant difference…

According to Google economist Hal Varian, his company is running on the order of 100-200 experiments on any given day, as they test new products and services, new algorithms and alternative designs. An iterative review process aggregates findings and frequently leads to further rounds of more targeted experimentation.

This sounds like a social scientist’s dream – if we could get our hands on the data.
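For anyone curious how a “statistically significant difference” between two versions of a website might be judged, here is a minimal sketch of one standard approach: a two-sided, two-proportion z-test on conversion counts. This is my assumption, not Amazon’s documented method (the article doesn’t say which test it uses), and the function name and visitor numbers are hypothetical.

```python
import math

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Under the null hypothesis that both versions
    convert at the same rate, the standard error uses the pooled rate.
    Assumes at least one conversion and one non-conversion overall;
    otherwise the standard error is zero and the division fails.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 10,000 matched visitors see each version of a page.
z, p = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers, z comes out at roughly 2.5, and p falls below the conventional 0.05 cutoff. A z-test is only one reasonable choice, and checking results repeatedly as data arrive (as “within just a few hours” suggests) inflates the false-positive rate unless the test is adjusted for it.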

My big question about all of this data is this: what should be done with it? This article and others I’ve seen say that it will transform business. If this is just a way for businesses to become more knowledgeable, more efficient, and ultimately more profitable, is that enough? Occasionally, we hear of things like discovering and/or tracking epidemics by looking at search queries, or tools like Amazon’s Mechanical Turk that crowdsource small but needed work. On the whole, does the data from the Internet advance human flourishing, concentrate some benefits in the hands of a few, or even hinder flourishing? Does this data give us insights into health and medicine, international relations, and social interactions, or does it primarily give entrepreneurs and established companies the chance to make more money? Are these questions that anyone really asks or cares about?

One of the new research frontiers: studying online dating

There are now a number of academics studying online dating sites as they allow insights into relationship formation that are difficult to observe elsewhere in large numbers:

Like contemporary Margaret Meads, these scholars have gathered data from dating sites like Match.com, OkCupid and Yahoo! Personals to study attraction, trust, deception — even the role of race and politics in prospective romance…

“There is relatively little data on dating, and most of what was out there in the literature about mate selection and relationship formation is based on U.S. Census data,” said Gerald A. Mendelsohn, a professor in the psychology department at the University of California, Berkeley…

Andrew T. Fiore, a data scientist at Facebook and a former visiting assistant professor at Michigan State University, said that unlike laboratory studies, “online dating provides an ecologically valid or true-to-life context for examining the risks, uncertainties and rewards of initiating real relationships with real people at an unprecedented scale.”…

Of the romantic partnerships formed in the United States between 2007 and 2009, 21 percent of heterosexual couples and 61 percent of same-sex couples met online, according to a study by Michael J. Rosenfeld, an associate professor of sociology at Stanford. (Scholars said that most studies using online dating data are about heterosexuals, because they make up more of the population.)

The rest of the article has some research findings about appearance, race, and political ideology derived from studies of online dating site members.

Researchers will go wherever the research subjects are, so if people are expanding their dating pools online, that is where the research will go. It would be interesting to hear whether any of these researchers have received pushback from people within their own fields who scoff at online dating sites or ask them to demonstrate the value of studying online behavior.