The sociology of literature and looking for data and insights in the margins of books

As a big reader, I was interested to see this review of research built on data about readers left behind in books:

Price’s work perches at the leading edge of a growing body of investigations into the history of reading. The field draws from many others, including book history and bibliography, literary criticism and social history, and communication studies. It looks backward to the pre-Gutenberg era, back to the clay tablets and scrolls of ancient civilizations, and forward to current debates about how technology is changing the way we read. Although much of the relevant research has centered on Anglo-American culture of the last three or four centuries, the field has expanded its purview, as scholars uncover the hidden reading histories of cultures many used to dismiss as mostly oral.

It’s a tricky business. A bibliographer works with hard physical evidence—a manuscript, a printed book, a copy of the Times of London. A scholar seeking to pin down the readers of the past often has to read between the lines. Marginalia can be a gold mine of information about a book’s owners and readers, but it’s rare. “Most of the time, most readers historically didn’t, and still don’t, write in their books,” Price explains.

But even a book’s apparent lack of use can be read as evidence. “The John F. Kennedy Library here in Boston owns a copy of Ulysses whose pages—other than a few at the very beginning and very end—are completely uncut,” she says. “This tells us something about the owner of the copy—who happens to be Ernest Hemingway.”…

Since Reading the Romance, the ethnography of reading has taken off among scholars. Radway points to Forgotten Readers, Elizabeth McHenry’s study of African-American literary societies, Ellen Gruber Garvey’s Writing With Scissors, about scrapbooking, and David Henkin’s City Reading, about signage in the urban environment, as strong examples. “People have become very creative about trying to figure out how groups of readers interact with the text as it’s embodied in various forms,” she says.

I have wondered in recent years why more sociologists don’t take up the subject of reading. It seems crucial for understanding the development of modern societies as information moved from a highly regulated environment to a diffuse distribution through books, newspapers, and other printed materials.

I’ve enjoyed the work of sociologist Wendy Griswold, who studies reading, and I’ve used a few of her pieces in class. Here are some of her fascinating works in the “sociology of literature” that I recommend:

1. Bearing Witness published in 2000. Griswold examines the reading culture in Nigeria and why novels, a common genre in Western society, aren’t prevalent in Nigeria. The short version of the story: it takes a lot of work for a society to be at a level where novels can be easily produced and read.

2. “American Character and the American Novel: An Expansion of Reflection Theory in the Sociology of Literature.” American Journal of Sociology 86(4), 1981. Griswold compares American and European novels in the late 1800s and early 1900s and finds the differences in their content are due more to copyright law than “national characters.”

3. With Terry McDonnell and Nathan Wright. “Reading and the Reading Class in the Twenty-First Century.” Annual Review of Sociology 31, 2005. Here is the abstract:

Sociological research on reading, which formerly focused on literacy, now conceptualizes reading as a social practice. This review examines the current state of knowledge on (a) who reads, i.e., the demographic characteristics of readers; (b) how they read, i.e., reading as a form of social practice; (c) how reading relates to electronic media, especially television and the Internet; and (d) the future of reading. We conclude that a reading class is emerging, restricted in size but disproportionate in influence, and that the Internet is facilitating this development.

Some fascinating stuff about the social forces influencing reading in today’s world.

4. With Nathan Wright. “Wired and Well Read.” In Society Online: The Internet in Context, 2004. If I remember correctly, Griswold and Wright argue the Internet doesn’t compete with reading; rather it enhances reading as those who read before the Internet use the Internet to read more.

Social psychologist on quest to find researchers who falsify data

The latest Atlantic magazine includes a short piece about a social psychologist who is out to catch other researchers who falsify data. Here is part of the story:

Simonsohn initially targeted not flagrant dishonesty, but loose methodology. In a paper called “False-Positive Psychology,” published in the prestigious journal Psychological Science, he and two colleagues—Leif Nelson, a professor at the University of California at Berkeley, and Wharton’s Joseph Simmons—showed that psychologists could all but guarantee an interesting research finding if they were creative enough with their statistics and procedures.

The three social psychologists set up a test experiment, then played by current academic methodologies and widely permissible statistical rules. By going on what amounted to a fishing expedition (that is, by recording many, many variables but reporting only the results that came out to their liking); by failing to establish in advance the number of human subjects in an experiment; and by analyzing the data as they went, so they could end the experiment when the results suited them, they produced a howler of a result, a truly absurd finding. They then ran a series of computer simulations using other experimental data to show that these methods could increase the odds of a false-positive result—a statistical fluke, basically—to nearly two-thirds.

Just as Simonsohn was thinking about how to follow up on the paper, he came across an article that seemed too good to be true. In it, Lawrence Sanna, a professor who’d recently moved from the University of North Carolina to the University of Michigan, claimed to have found that people with a physically high vantage point—a concert stage instead of an orchestra pit—feel and act more “pro-socially.” (He measured sociability partly by, of all things, someone’s willingness to force fellow research subjects to consume painfully spicy hot sauce.) The size of the effect Sanna reported was “out-of-this-world strong, gravity strong—just super-strong,” Simonsohn told me over Chinese food (heavy on the hot sauce) at a restaurant around the corner from his office. As he read the paper, something else struck him, too: the data didn’t seem to vary as widely as you’d expect real-world results to. Imagine a study that calculated male height: if the average man were 5-foot-10, you wouldn’t expect that in every group of male subjects, the average man would always be precisely 5-foot-10. Yet this was exactly the sort of unlikely pattern Simonsohn detected in Sanna’s data…

Simonsohn stressed that there’s a world of difference between data techniques that generate false positives, and fraud, but he said some academic psychologists have, until recently, been dangerously indifferent to both. Outright fraud is probably rare. Data manipulation is undoubtedly more common—and surely extends to other subjects dependent on statistical study, including biomedicine. Worse, sloppy statistics are “like steroids in baseball”: Throughout the affected fields, researchers who are too intellectually honest to use these tricks will publish less, and may perish. Meanwhile, the less fastidious flourish.
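To get a feel for one of the practices described in the excerpt, analyzing the data as you go and stopping when the result looks good, here is a minimal sketch. It is not the simulation from the “False-Positive Psychology” paper; the sample sizes, the number of peeks, and the simple normal-approximation test are all my own assumptions. Both groups are drawn from the same distribution, so every “significant” difference is a false positive.

```python
import random
import statistics
from math import sqrt


def looks_significant(a, b, crit=1.96):
    """Normal-approximation test on the difference in means (roughly p < .05)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / se > crit


def run_study(rng, peek, start=20, step=10, max_n=50):
    """Simulate one study with no true effect; return True on a false positive.

    start, step and max_n are illustrative per-group sample sizes, not values
    taken from the paper.
    """
    a = [rng.gauss(0, 1) for _ in range(start)]
    b = [rng.gauss(0, 1) for _ in range(start)]
    while True:
        # A "peeking" researcher tests after every batch and stops on success.
        if peek and looks_significant(a, b):
            return True
        if len(a) >= max_n:
            return looks_significant(a, b)
        a += [rng.gauss(0, 1) for _ in range(step)]
        b += [rng.gauss(0, 1) for _ in range(step)]


def false_positive_rate(peek, trials=2000, seed=1):
    """Share of simulated null studies that end with a 'significant' result."""
    rng = random.Random(seed)
    return sum(run_study(rng, peek) for _ in range(trials)) / trials


print("fixed sample size:", false_positive_rate(peek=False))
print("peeking as you go:", false_positive_rate(peek=True))
```

With a fixed sample size the rate should sit near the nominal 5 percent; peeking after each batch pushes it noticeably higher, and that is with only one of the several practices the excerpt lists.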

The current research environment may simply give researchers incentives to cut corners and end up with false results. Publishing is incredibly important for the career of an academic and there is little systematic oversight of a researcher’s data. I’ve written before about ways that data could be made more open but it would take some work to put these ideas into practice.

What I wouldn’t want to happen is for people to read a story like this and conclude that fields like social psychology have nothing to offer because who knows how many of the studies might be flawed. I also wonder about the vigilante edge to this story: it makes for a good journalistic piece to tell about a social psychologist who is battling his own field, but this isn’t how science should work. Simonsohn should be joined by others who are also concerned about these potential issues. Of course, there may not be many incentives to pursue this work as it might invite criticism from inside and outside the discipline.

Using GIS to study Gettysburg, the Holocaust, and the American iron industry

Smithsonian takes a look at a historian who uses GIS to get a new perspective on important historical events:

Her principal tool is geographic information systems, or GIS, a name for computer programs that incorporate such data as satellite imagery, paper maps and statistics. Knowles makes GIS sound simple: “It’s a computer software that allows you to map and analyze any information that has a location attached.” But watching her navigate GIS and other applications, it quickly becomes obvious that this isn’t your father’s geography…

What emerges, in the end, is a “map” that’s not just color-coded and crammed with data, but dynamic rather than static—a layered re-creation that Knowles likens to looking at the past through 3-D glasses. The image shifts, changing with a few keystrokes to answer the questions Knowles asks. In this instance, she wants to know what commanders could see of the battlefield on the second day at Gettysburg. A red dot denotes General Lee’s vantage point from the top of the Lutheran Seminary. His field of vision shows as clear ground, with blind spots shaded in deep indigo. Knowles has even factored in the extra inches of sightline afforded by Lee’s boots. “We can’t account for the haze and smoke of battle in GIS, though in theory you could with gaming software,” she says…

Though she’s now been ensconced at Middlebury for a decade, Knowles continues to push boundaries. Her current project is mapping the Holocaust, in collaboration with the U.S. Holocaust Memorial Museum and a team of international scholars. Previously, most maps of the Holocaust simply located sites such as death camps and ghettos. Knowles and her colleagues have used GIS to create a “geography of oppression,” including maps of the growth of concentration camps and the movement of Nazi death squads that accompanied the German Army into the Soviet Union…

Aware of these pitfalls, Knowles is about to publish a book that uses GIS in the service of an overarching historical narrative. Mastering Iron, due out in January, follows the American iron industry from 1800 to 1868. Though the subject matter may not sound as grabby as the Holocaust or Gettysburg, Knowles has blended geographical analysis with more traditional sources to challenge conventional wisdom about the development of American industry.
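The line-of-sight question in the Gettysburg example (what could a commander see from a given spot?) can be illustrated with a small sketch. This is not how Knowles or any particular GIS package does it; it is a one-dimensional version of the general idea, where a point is visible if nothing between it and the observer rises above the sight line. The function name, the elevation profile, and the eye heights are hypothetical.

```python
def visible_cells(elevations, observer_index, eye_height=1.7):
    """Return one boolean per point: True where the ground can be seen.

    elevations: ground heights at evenly spaced points along a single line.
    eye_height: observer's eye level above the ground (an assumed value).
    """
    eye = elevations[observer_index] + eye_height
    visible = [False] * len(elevations)
    visible[observer_index] = True
    for direction in (1, -1):  # scan to the right, then to the left
        steepest = float("-inf")  # steepest sight line seen so far
        i = observer_index + direction
        while 0 <= i < len(elevations):
            distance = abs(i - observer_index)
            slope = (elevations[i] - eye) / distance
            # A point is hidden if closer terrain already blocks this angle.
            if slope >= steepest:
                visible[i] = True
                steepest = slope
            i += direction
    return visible


profile = [10, 5, 8, 3, 3, 9]  # made-up ridge-and-valley profile
print(visible_cells(profile, observer_index=0, eye_height=2))
# [True, True, True, False, False, True]
print(visible_cells(profile, observer_index=0, eye_height=10))
# [True, True, True, True, True, True]
```

In the first call the small rise at the third point hides the low ground behind it; raising the observer, as with the seminary cupola or even a pair of boots, opens up the blind spot.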

Sounds pretty interesting. Having detailed geographic data can change one’s perspective. But two things need to happen before researchers can take advantage of such information:

1. Using GIS well requires a lot of training and then being able to find the right data for the analysis.

2. Using geographic data like this requires a change in mindset from the idea that geography is just a background variable. In sociology, analysis often controls for some geographic variation but doesn’t often consider the location or space as the primary factor.

While GIS is a hot method right now, I think these two issues will hold it back from being widely used for a while.

Using plagiarism detection software to examine anti-Muslim bias in post-9/11 news coverage

A new sociological study suggests mainstream media sources tended to rely on the rhetoric of certain anti-Muslim groups after 9/11:

“The vast majority of organisations competing to shape public discourse about Islam after the September 11 attacks delivered pro-Muslim messages, yet my study shows that journalists were so captivated by a small group of fringe organisations that they came to be perceived as mainstream,” the paper’s author, University of North Carolina assistant professor of sociology Christopher Bail, told Wired.co.uk…

Bail and his team used plagiarism detection software to compare 1,084 press releases produced by 120 different organisations with more than 50,000 television transcripts and newspaper articles produced between 2001 and 2008. The software picked up damning similarities between the releases and stories from news outlets including the New York Times, USA Today, the Washington Times, CBS News, CNN and Fox News Channel.

“We learned the American media almost completely ignored public condemnations of terrorist events by prominent Muslim organisations in the United States,” Bail told Wired.co.uk. “Inattention to these condemnations, combined with the emotional warnings of anti-fringe organisations, has created a very distorted representation of the community of advocacy organisations, think tanks, and religious groups competing to shape the representation of Islam in the American public sphere.”…

Bail’s paper, published in the American Sociological Review, is part of a wider study which will investigate how the influence of these fringe groups has spread beyond media and in to the real world, where doors have been opened to elite conservative social circles and conservative think tanks — the first steps to influencing public policy and national opinion. Bail touched upon this in the current study after analysing publicly available information on the organisations’ membership, which revealed troubling crossovers between fringe and mainstream organisations.

Four quick thoughts:

1. It sounds like social networks could have an important influence here. These fringe groups may be on the edges of public discourse but they have connections or other means by which to reach more mainstream media sources. How much of this reporting is built on previous personal connections?

2. This sounds like a clever use of plagiarism software. Such software is intended to catch students using published material incorrectly but it can also be used to track common quotes, phrases, and narratives.

3. In general, how much does the media today rely on press releases and reports from mainstream or fringe groups without interviews, fact-checking, and sorting through all the information?

4. Would a similar study involving elite liberal social circles and think tanks find similar things?
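For readers curious about how text-matching software can track phrases from press releases into news stories, here is a minimal sketch of one common approach: comparing overlapping word sequences (“shingles”). I don’t know which software or algorithm Bail’s team used, so this is an illustration of the general technique; the function names, the five-word window, and the sample texts are all made up.

```python
import re


def shingles(text, n=5):
    """Set of overlapping n-word sequences, lowercased with punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_share(press_release, article, n=5):
    """Fraction of the press release's n-word sequences that reappear in the article."""
    source = shingles(press_release, n)
    if not source:
        return 0.0
    return len(source & shingles(article, n)) / len(source)


# Invented example texts, not drawn from the study.
release = ("The council strongly condemns the attack and calls for calm "
           "across the city.")
article = ("A spokesman said the council strongly condemns the attack and "
           "calls for calm, adding little else.")

print(round(reuse_share(release, article), 2))  # 0.67
```

A high share suggests the article borrowed language from the release; running every release against every article would give the kind of release-to-coverage comparison the study describes, though real tools handle paraphrase and scale far better than this.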

Did all American adults shop on Thanksgiving weekend?

The Weekly Standard takes a look at some figures on Thanksgiving weekend shopping as reported by the National Retail Federation:

“A record 247 million shoppers visited stores and websites in the post-Thanksgiving Black Friday weekend this year, up 9% from 226 million last year, according to a survey by the National Retail Federation released Sunday,” the CNN reports reads. The headline reads: “247 million shoppers visited stores and websites Black Friday weekend.”

This would seem to mean, according to these statistics, that basically all Americans over the age of 14 went shopping this past weekend…

That means, if you subtract those who are too young to shop, 0-14 year olds, from the total U.S. population, there are 247,518,325 people in this country. The number of people CNN reports who went shopping this past weekend…

CNN’s numbers, however, include those who visited “websites.” The numbers [are?] so loose it could even include news website or the same person visiting multiple shopping websites.

Even if there is some double-counting in this data (and tracking across websites is difficult to do), these figures suggest a large majority of Americans went shopping after Thanksgiving. I’ve written before about the difficulty in getting 90% of Americans to agree about something but perhaps we could add the value of Black Friday shopping to the list. These figures also may add to the idea that shopping is the favorite sport of Americans.
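The comparison the Weekly Standard is making is simple arithmetic, and it is worth seeing how close the two numbers are. The figures below are the ones quoted above; the variable names are mine.

```python
shoppers_reported = 247_000_000    # NRF survey figure as reported by CNN
shoppers_last_year = 226_000_000   # prior-year figure from the same report
population_over_14 = 247_518_325   # Weekly Standard's count after removing ages 0-14

share = shoppers_reported / population_over_14
growth = shoppers_reported / shoppers_last_year - 1

print(f"shoppers as a share of the over-14 population: {share:.1%}")
print(f"year-over-year growth: {growth:.1%}")
```

The implied share is just under 100 percent, which is why the figure only makes sense if the same people are being counted more than once.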

Report calls for more study of how “kids navigate social networks”

A new report suggests we don’t know much about how kids use social networks and thus, we need more research:

A recent report from the Joan Ganz Cooney Center, Kids Online: A new research agenda for understanding social networking forums, has identified that we don’t actually know enough about how pre-teens use online social networking. The researchers, Dr. Sarah Grime and Dr. Deborah Fields, have done a good job in helping us recognize that younger children are engaged in a range of different ways with online social networks, but that our knowledge and understanding of what that means and how it impacts on their lives is pretty much underdone. GeekDads, of course, will have thoughts about how and why our children are playing and engaging with technology and networks in the ways they do, but this doesn’t give the people who make the rules and set the policy agendas the big picture that they need.

Essentially, Kids Online is a research report that calls for more research into children’s use of social networks. But the report does demonstrate very clearly why this is required. And at the rate that technology is changing and advancing, we need to work cleverly if we are to have the type of data and analysis that we need as parents to guide our decision making around technology and our children. We are all out there trying our best to facilitate healthy, dynamic, educational and exciting experiences for our children when it comes to tech, but there are not enough people exploring what that looks like. As the report says:

“Research on Internet use in the home has consistently demonstrated that family dynamics play a crucial role in children’s and parents’ activities and experiences online. We need further research on the role of parental limits, rules, and restrictions on children’s social networking as well as how families, siblings, peers, and schools influence children’s online social networking.”

I would go further: we need more research on how people of all ages navigate social networks. This doesn’t mean just looking at what activities users participate in online, how often they update information, or how many or what kinds of friends they have. These pieces of information give an outline of social network site usage. However, we need more comprehensive views of how exactly social interaction online works, develops, and interacts in feedback loops with the offline and online worlds.

Let me give an example. Suppose an eleven year old joins Facebook. What happens then? Sure, they gain friends and develop a profile but how does this change and develop over the first days, weeks, and months? How does the eleven year old describe the process of social interaction? How do their friends, online and offline, describe this interaction? Where do they learn how to act and not act on Facebook? Do the social networks online overlap completely with offline networks and if so or if not, how does this affect the offline network? How does the eleven year old start seeing all social interaction differently? Does it change their interaction patterns for years to come or can they somewhat compartmentalize the Facebook experience?

This sort of research would take a lot of time and would be difficult to do with large groups. To do it well, a researcher would have two options: take an ethnographic approach or gain access to someone’s Facebook account in order to observe everything that happens. Of course, Facebook itself could provide this information…

The four cultural camps of American parenting

A sociologist argues there are four cultural parenting camps in the United States:

The Faithful, who make up 20 percent of American parents and are largely white and middle class, believe strongly that “God’s timeless truths” about sex, marriage, and life remain as true today as they have always been. They seek to defend these truths in the broader culture and, failing that, aim to “buffer themselves from progressive currents enough that their families will remain faithful to their traditions.” Their most important parenting goal is “raising children to reflect God’s will and purpose.”…

The Engaged Progressives, who make up 21 percent of American parents and are whiter, better educated, and more affluent than the population as a whole, march to a very different beat than the Faithful, at least ideologically. They steer clear of organized religion, believe strongly in the virtues of personal freedom, choice, and tolerance, and seek to form their children into independent-minded adults. But these individualistic values are also tempered by a commitment among progressive parents to the “golden rule” and the values that go along with this rule: honesty, openness, empathy, and compassion for the vulnerable. Their cultural commitments point them in a Blue direction (82 percent reported they would not vote for the Republican presidential nominee).

Ironically, whatever their ideological differences with the Faithful, Engaged Progressives live lives that look surprisingly like their ideological opposites. Although they have fewer children (2.46) than the Faithful, they are almost as married (80 percent are married), about as likely to have stay-at-home-mothers when preschool children are in the home as are the Faithful (58 percent compared to 65 percent), and they also highly engaged parents, enjoying—for instance—more meals with their children than the average parent. So, in pursuit of progressive ideals, Engaged Progressives rely on largely neotraditional strategies: namely, marriage and an intensive parenting style.

The same cannot be said about the other two cultural camps of American parents detailed in the report: “the Detached” and “the American Dreamers”, who make up, respectively, 19 and 27 percent of American parents. Although a slight majority of the Detached are married (67 percent), this largely white, largely downscale group of parents feel incapable or unable to exert much of an influence on their children’s lives. They spend comparatively little time interacting with their children, do not eat daily with their parents, are disconnected from the religious and civic fabric of their communities, and instead allow the television and other outside influences to set the cultural agenda for their children. Indeed, Bowman contends that the Detached parents “lack the vision, vitality, certainty, and self-confidence required to embrace any agenda” for their children. Not surprisingly, this camp has little interest in or involvement with politics.

By contrast, the American Dreamers—who are disproportionately working class and minority—have high hopes for their children. Politically, they are divided, with black and Hispanic Dreamers tilting Democratic, and white Dreamers titling Republican. They believe strongly in education, their children are optimistic about their educational prospects, and they want their children to make good on the American Dream. But given that marriage is fragile in this camp (only 64 percent are married), they have less income and education than most parents, and they are more likely to hail from communities with anemic religious and civic institutions, it’s not clear that American Dreamers can make good on the big dreams they have for their children.

A few thoughts about this:

1. Read the PDF report here and see more about the Institute for Advanced Studies in Culture at the University of Virginia here.

2. Sociologist Annette Lareau suggested in Unequal Childhoods that social class led to two parenting styles: concerted cultivation and accomplishment of natural growth. Are Lareau’s two styles spread across these four new categories or was Lareau missing something big?

3. There are some interesting implications here for the culture wars. The suggestion in this article is that both The Faithful and The Engaged Progressives follow similar patterns even if they hold to different ideologies and tend to fight among themselves. Is this because of social class? Education? Race? Current or lingering effects of religion? Living in suburbs and/or wealthier areas?

4. When I see typologies like this, I always wonder about how many categories can and should be created. Are four cultural family types enough or too many? A lower number makes for more coherent categories and findings that are easier to discuss. However, if there are actually smaller clusters of families, then more types may be needed to be more precise and better describe reality.

Mapping secessionist petitions by county as well as looking at gender

A sociologist and a graduate seminar took data from petitions for secession from the United States as listed on whitehouse.gov and mapped the patterns. Here is the map and some of the results:

While petitions are focused on particular states, signers can be from anywhere. In order to show where support for these secession was the strongest, a graduate seminar on collecting and analyzing and data from the web in the UNC Sociology Department downloaded the names and cities of each of the petition signers from the White House website, geocoded each of the locations, and plotted the results.

In total, we collected data on 862,914 signatures. Of these, we identified 304,787 unique combinations of names, places and dates, suggesting that a large number of people were signing more than one petition. Approximately 90%, or 275,731, of these individuals provided valid city locations that we could locate with a US county.

The above graphic shows the distribution of these petition signers across the US. Colors are based proportion of people in each county who signed, and the total number of signers is displayed when you click or hover over a county.

We also looked at the distribution of petition signers by gender. While petition signers did not list their gender, we attempted to match first names with Social Security data on the relative frequency of names by sex. Of the 302,502 respondents with gendered names, 63% had male names and 38% had female names. This 26 point gender gap is twice the size of the gender gap for voters in the 2012 Presidential election. For signatures in the last 24 hours, the gender gap has risen to 34 points.

So it looks like the petition signers are more likely to be men from red states and more rural counties. On one hand, this is not too surprising. On the other hand, it is an interesting example of combining publicly available data and looking for patterns.
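The name-matching step is easy to sketch. The seminar’s actual procedure isn’t described beyond matching first names to Social Security data, so the threshold, the function names, and especially the tiny name table below are my own stand-ins (the counts are invented, not real Social Security figures).

```python
# Hypothetical (male_count, female_count) pairs; NOT real Social Security data.
NAME_COUNTS = {
    "james": (4000, 10),
    "robert": (3900, 8),
    "mary": (12, 3500),
    "linda": (5, 3000),
    "jordan": (600, 400),
}


def classify(first_name, name_counts, threshold=0.9):
    """Return 'M', 'F', or None if the name is unknown or not clearly gendered."""
    counts = name_counts.get(first_name.strip().lower())
    if counts is None:
        return None
    male, female = counts
    total = male + female
    if total == 0:
        return None
    if male / total >= threshold:
        return "M"
    if female / total >= threshold:
        return "F"
    return None


def gender_gap(names, name_counts):
    """Male share minus female share, in percentage points, among matched names."""
    labels = [classify(name, name_counts) for name in names]
    matched = [label for label in labels if label is not None]
    if not matched:
        return None
    male_share = matched.count("M") / len(matched)
    return round(100 * (male_share - (1 - male_share)), 1)


signers = ["James", "Robert", "Mary", "Jordan", "James", "Linda", "Robert", "Pat"]
print(gender_gap(signers, NAME_COUNTS))  # 33.3
```

Names that are ambiguous (“Jordan”) or missing from the table (“Pat”) are dropped rather than guessed, which is presumably why the gendered-name count in the excerpt is slightly smaller than the number of unique signers.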

Sociologist defends statistical predictions for elections and other important information

Political polling has come under a lot of recent fire but a sociologist defends these predictions and reminds us that we rely on many such predictions:

We rely on statistical models for many decisions every single day, including, crucially: weather, medicine, and pretty much any complex system in which there’s an element of uncertainty to the outcome. In fact, these are the same methods by which scientists could tell Hurricane Sandy was about to hit the United States many days in advance…

This isn’t wizardry, this is the sound science of complex systems. Uncertainty is an integral part of it. But that uncertainty shouldn’t suggest that we don’t know anything, that we’re completely in the dark, that everything’s a toss-up.

Polls tell you the likely outcome with some uncertainty and some sources of (both known and unknown) error. Statistical models take a bunch of factors and run lots of simulations of elections by varying those outcomes according to what we know (such as other polls, structural factors like the economy, what we know about turnout, demographics, etc.) and what we can reasonably infer about the range of uncertainty (given historical precedents and our logical models). These models then produce probability distributions…

Refusing to run statistical models simply because they produce probability distributions rather than absolute certainty is irresponsible. For many important issues (climate change!), statistical models are all we have and all we can have. We still need to take them seriously and act on them (well, if you care about life on Earth as we know it, blah, blah, blah).

A key point here: statistical models have uncertainty (we are making inferences about larger populations or systems from samples that we can collect) but that doesn’t necessarily mean they are flawed.

A second key point: because of what I stated above, we should expect that some statistical predictions will be wrong. But this is how science works: you tweak models, take in more information, perhaps change your data collection, perhaps use different methods of analysis, and hope to get better. While it may not be exciting, confirming what we don’t know does help us get to an outcome.
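To make the idea of “running lots of simulations” concrete, here is a deliberately tiny sketch. Real forecasting models combine many polls, states, and structural factors; this one assumes a single polling margin and a single normally distributed error term, and both the function name and the numbers are hypothetical.

```python
import random


def win_probability(poll_margin, poll_error_sd, simulations=20_000, seed=0):
    """Share of simulated elections won by the candidate who leads the polls.

    poll_margin: lead in percentage points from a polling average.
    poll_error_sd: assumed standard deviation of total polling error, in points.
    """
    rng = random.Random(seed)
    wins = sum(rng.gauss(poll_margin, poll_error_sd) > 0 for _ in range(simulations))
    return wins / simulations


# A 2-point lead with 3 points of assumed error: ahead, but far from certain.
print(win_probability(poll_margin=2.0, poll_error_sd=3.0))
# A tied race should come out close to a coin flip.
print(win_probability(poll_margin=0.0, poll_error_sd=3.0))
```

Under these assumptions the 2-point leader wins roughly three simulations out of four. That is a probability, not a guarantee: the trailing candidate winning one time in four would not mean the model was broken.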

I’ve become more convinced in recent years that one of the reasons polls are not used effectively in reporting is that many in the media don’t know exactly how they work. Journalists need to be trained in how to read, interpret, and report on data. This could also be a time issue: how much time do those in the media have to pore over the details of research findings, or do they simply have to scan for new findings? Scientists can pump out study after study but part of the dissemination of this information to the public requires a media that understands how scientific research and the scientific process work. This includes understanding how models are consistently refined, collecting the right data to answer the questions we want to answer, and looking at the accumulated scientific research rather than just grabbing the latest attention-getting finding.

An alternative to this idea about media statistical illiteracy is presented in the article: perhaps the media knows how polls work but likes a political horse race. This may also be true but there is a lot of reporting on statistics and data outside of political elections that also needs work.

Another call for the need for theory when working with big data

Big data is not just about allowing researchers to look at really large samples or lots of information at once. It also requires the use of theory and asking new kinds of questions:

Like many other researchers, sociologist and Microsoft researcher Duncan Watts performs experiments using Mechanical Turk, an online marketplace that allows users to pay others to complete tasks. Used largely to fill in gaps in applications where human intelligence is required, social scientists are increasingly turning to the platform to test their hypotheses…

This is a point political forecaster and author Nate Silver discusses in his recent book The Signal and the Noise. After discussing economic forecasters who simply gather as much data as possible and then make inferences without respect for theory, he writes:

This kind of statement is becoming more common in the age of Big Data. Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics, where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes…

The value of big data isn’t simply in the answers it provides, but rather in the questions it suggests that we ask.

This follows a similar recent argument made on the Harvard Business Review website.

I like the emphasis here on the new kinds of questions that might be possible with big data. There are a couple of ways these could happen:

1. Uniquely large datasets might allow for different comparisons, particularly among smaller groups, that are more difficult to look at even with nationally representative samples.

2. Because experiments can be conducted quickly through platforms like Amazon’s Mechanical Turk, more can be done in less time. Additionally, I wonder if this could help alleviate some of the replication issues that pop up with scientific research.

3. Instead of having to be constrained by data limitations, big data might give researchers creative space to think on a larger scale and more outside of the box.

Of course, lots of topics are not well suited to study through big data but such information does offer unique opportunities for researchers and theories.