See the bigger picture when reading media reports of new scientific findings

Think a new study touted by the media is too sensational? Take the long view of such reports, as sociologists and other researchers do:

One solution for reporters that hasn’t gotten a lot of attention yet, but should, is the value of talking to social scientists — historians of science and medicine, anthropologists, political scientists and sociologists of science — in the process of reporting about research. Experts in these disciplines who examine the practice of scientific and medical research from outside of it are in a great position to give reporters, and by extension their readers, insight into where new scientific knowledge came from, what sort of agenda might be motivating the people involved, the cultural meanings attached to particular scientific findings, what questions were being asked—and what questions weren’t asked, but should have been.

To see how, take a look at a story from earlier this week that nicely illustrates the value a social scientist can bring to how a science story is reported: Did you hear that hurricanes with feminine names are deadlier than ones with male names because people’s sexist bias causes them not to take female storms as seriously? As Ed Yong reported in National Geographic’s “Phenomena”, it’s probably not true. Yong talked to a social scientist who helped break down the reasons why – from weaknesses of the methods to the context of other factors already known to affect the deadliness of storms. Check out the reporting and ensuing discussion here…

“Almost every time one of these studies comes out, it’s promoted as evidence that ‘X single factor’ is a decisive culprit,” said Chloe Silverman, PhD, a sociologist and historian of science in Drexel’s Center for Science, Technology & Society, whose current project is focused on people’s approaches to understanding pollinator health. “But there’s plenty of evidence that a combination of factors contribute to honey bee health problems.”…

And journalists tend to follow particular narrative conventions, such as “the discovery just around the corner” or “the intractable mystery,” Silverman noted. “But social scientists who study science are in a better position than most to both identify those tendencies and offer more realistic descriptions of the pace and progress of scientific research.”

There is probably some irony in Drexel’s media relations office pushing this point of view, even if it is a helpful corrective to the typical approach journalists take to the latest scientific findings. To be honest, it takes time in sociology and other fields to develop credible hypotheses, data, and theories. Researchers engage with existing research to further their ideas and build upon the work that has already been done. Reaching consensus may take years, or it may never completely happen.

I wonder how much social and natural scientists could do to better communicate the full scientific process. In a world that seems to be going faster, science still takes time.

Getting around the anger or apathy students have about taking a sociology qualitative research methods class

In a review that describes how a book’s author practices “stealth sociology,” one sociologist describes how he tries to get his students excited about a qualitative research methods class:

Every semester, I teach a course in qualitative research methods. Revealing this at a dinner party or art opening invariably prompts sympathy, no response at all or variations on “Yuck! That was the worst course I ever had.”

Teaching what students dread and remember in anger robs my equilibrium. I tell students qualitative methods happen to be about stories, not numbers and measurements. And who doesn’t love a story and need one—many—daily? I merely teach ways to collect people’s stories, how to observe everyday life and narrate the encounter, and ways to discover stories “contained” in every human communication medium, from movies and tweets to objects of material culture, cars to casseroles.

Hearing this, students perk up. Momentarily. I continue in the liberal arts college spirit and urge students, “Bring to our class discussion and your research planning the skills you developed in English, literature and art classes.”

Hearing this, spirits deflate. Although some take to the freedom in narrative research methods, many students can’t give up the security they find in objective hypotheses, measured variables and reassuring numbers.

“How can we be objective about ourselves?” I argue. “How can anyone?”

Today in the wake of so-called identity studies, we sociologists and anthropologists expect each other to write ourselves into our research. We reveal our social addresses, identify our perspectives, and justify our intent. Sociologists and women’s studies scholars call it standpoint theory. No more pretense of the all-seeing-eye. No more fly on the wall invisibility.

As I think back on my experiences teaching many sections of Intro to Sociology, Statistics, and Research Methods (covering both quantitative and qualitative methods), I have often found the opposite to be true: undergraduates more readily understand the value of stories and narratives and have more difficulty thinking about scientifically studying people and society. Perhaps this is the result of a particular subculture that values personal relationships.

At the same time, sociologists collect stories in particular ways. It isn’t just one person offering an interpretation while other people could see very different things in the same stories. It involves rigorous data collection and analysis that looks across cases, though without statistical tests and often with smaller samples (which can limit generalizability). Coding “texts” can be a time-consuming and involved process, and interviews take quite a bit of work: crafting good questions, interacting with respondents to build rapport without influencing their answers, and then understanding and applying what you have heard. We know that we might bias the process, even in the selection of a research question, but we can find ways to limit this, such as using multiple coders and sharing our work with others so they can check our findings and help us think through the implications.
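To make the multiple-coders point concrete, here is a minimal sketch (in Python, with hypothetical codes for ten interview excerpts) of one common check, Cohen’s kappa, which measures how much two coders agree beyond what chance alone would produce:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (observed - expected) / (1 - expected)

# Hypothetical codes two coders assigned to ten interview excerpts.
coder_1 = ["work", "family", "work", "health", "family",
           "work", "health", "family", "work", "work"]
coder_2 = ["work", "family", "health", "health", "family",
           "work", "health", "work", "work", "work"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # about 0.68
```

A kappa near 1 suggests the coding scheme is being applied consistently; a low kappa sends the coders back to sharpen their category definitions.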

Analyze big data better when computer scientists and social scientists share knowledge

Part of the “big-data struggle” is to have more computer scientists interacting with social scientists:

The emerging problems highlight another challenge: bridging the “Grand Canyon,” as Mr. Lazer calls it, between “social scientists who aren’t computationally talented and computer scientists who aren’t social-scientifically talented.” As universities are set up now, he says, “it would be very weird” for a computer scientist to teach courses to social-science doctoral students, or for a social scientist to teach research methods to information-science students. Both, he says, should be happening.

Both groups could learn quite a bit from each other. Arguably, programming skills would be very useful in a lot of disciplines in a world gaga over technology, apps, and big data. Arguably, more rigorous methodologies for finding and interpreting patterns are needed across a wide range of disciplines interested in human behavior and social interaction. Some of this is surely happening already, perhaps within individual researchers who have training in both areas. But, joining the two academic bodies together on a more formal and institutionalized basis could take quite a bit of work.

Sociologists argue it is difficult to find causal data for how inequality leads to different outcomes

Two sociologists tackle the question of how exactly inequality is related to a variety of social outcomes and argue it is difficult to find causal, rather than merely correlational, evidence:

For all the brain power thrown at the problem since then, however, specific evidence about inequality’s effects has been hard to find. Mr. Jencks said he could already picture the book’s reviews, “Professor Doesn’t Know What He Is Talking About.”…

One problem with these analyses is that they are based on correlations between levels of inequality and variables like life expectancy or the odds of poor children climbing the income ladder. But such correlations can’t prove inequality causes other social ills. They can’t disentangle inequality from the myriad things pushing American society this way and that.

Life expectancy in the United States might lag that of other countries because the United States still does not have universal health care. Scandinavia may enjoy higher upward mobility than the United States because governments in Sweden, Denmark and other Scandinavian countries invest a lot in early childhood education and the United States does not.

Lane Kenworthy, a sociologist at the University of Arizona, is all too aware of these limitations. He was to be Mr. Jencks’s co-author on the book about inequality’s consequences. Now he is going it alone, hoping to publish “Should We Worry About Inequality?” next year.

“People that worry about inequality for normative reasons have been very quick to jump on plausible hypothesis and a little bit of evidence to make sweeping conclusions about its consequences,” Professor Kenworthy told me.

It sounds like these sociologists are asking for some more methodological rigor in studying how inequality affects social life. Finding direct relationships between social forces and outcomes can be difficult but I look forward to seeing more work on the subject.
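To see why those correlations are so hard to interpret, here is a minimal simulation with made-up numbers: a third factor that drives both inequality and a bad outcome produces a sizable correlation between the two even though, in this toy setup, inequality has no direct effect at all.

```python
import random

random.seed(0)

def corr(xs, ys):
    """Pearson correlation, computed by hand to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# An unobserved confounder (say, weak social policy) drives both variables.
policy = [random.gauss(0, 1) for _ in range(5000)]
inequality = [p + random.gauss(0, 1) for p in policy]
bad_outcome = [p + random.gauss(0, 1) for p in policy]  # no direct effect of inequality

print(round(corr(inequality, bad_outcome), 2))  # about 0.5 despite no causal link
```

Real analyses try to adjust for such confounders, but as the excerpt notes, you can never be sure you have disentangled all the things “pushing American society this way and that.”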

Read more in this follow-up interview with Lane Kenworthy.

The difficulty of getting good data on heroin use

Heroin is getting more public attention, but how exactly can researchers go about measuring its use?

For as long as it’s been around, the NSDUH has provided a pretty good picture of marijuana use in the U.S., and is a reliable source for annual stories about teens and pot (a perennial sticking point in the debate over marijuana legalization). But the NSDUH data on hard drug use seldom makes as big a splash. In a new report from the RAND Corporation, researchers suggest that one reason for this disparity may be that the NSDUH survey underestimates heroin use by an eye-boggling amount. “Estimates from the 2010 NSDUH suggest there were only about 60,000 daily and near daily heroin users in the United States,” drug policy researchers Beau Kilmer and Jonathan Caulkins, both of the RAND Corporation, wrote in a recent editorial. “The real number is closer to 1 million.”…

Kilmer and Caulkins came up with their much higher figures for heroin and hard-drug use by combining county-level treatment and mortality data with NSDUH data and a lesser known government survey called the Arrestee Drug Abuse Monitoring Program. Instead of calling people at home and asking them about their drug use, the ADAM survey questions arrestees when they’re being booked and tests their urine. “ADAM goes where serious substance abuse is concentrated — among those entangled with the criminal justice system, specifically arrestees in booking facilities,” Kilmer and Caulkins write. The survey also asks questions about street prices, as well as how and where drugs are bought. The data collected by the ADAM Program enabled RAND to put together a report looking at what Americans spent on drugs between 2000 and 2010.

In short, ADAM is a crucial tool for crafting hard-drug policy. Which is why researchers are alarmed that after being scaled back several times (including a brief shutdown between 2004 and 2006), funding for ADAM has completely run out. “Folks in the research world have known that this was coming,” Kilmer writes in an email. “I wanted to use the attention around our new market report to highlight the importance of collecting information about hard drug users in non-treatment settings. ADAM was central to our estimates for cocaine, heroin, and meth.”

Despite providing a wealth of information since the early 2000s, the budget for ADAM has slowly been chipped away. The survey was originally conducted in more than 35 counties, then 10, then five. The program disappeared completely between 2004 and 2006, but was revived by the Office of National Drug Control Policy in 2006. At its most expensive, ADAM cost $10 million a year.

For a variety of reasons, including public health and the resource-intensive war on drugs, this information is important. But, illegal activities are often difficult to measure. This requires researchers to be more creative in finding reliable and valid data. Even then, two other issues emerge:

1. How much of the figure do the researchers feel is estimation rather than measurement? What are their margins of error? (See the sketch after this list.)

2. This can become a political football: is the data being collected worth the money it costs? For bean counters, is this the most efficient way to collect the data?
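On the margin-of-error question, here is a back-of-the-envelope sketch with hypothetical numbers (not NSDUH’s actual sample or estimates):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for an estimated proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 70,000 respondents, 0.03% reporting daily heroin use.
p_hat, n = 0.0003, 70_000
print(f"estimate: {p_hat:.3%} +/- {margin_of_error(p_hat, n):.3%}")
```

Note that this only captures sampling error. The underreporting RAND highlights would never show up in a margin of error, which is how a survey estimate of 60,000 and a real number closer to 1 million can coexist.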

I wonder if this could be part of arguments for legalizing certain activities: it would be much easier for researchers (and governments, the public, etc.) to get good data.

A “single lifestyle” the primary factor in making a city one of the worst for singles?

A recent list of the “10 Worst Cities for Singles” uses these criteria:

How did we come up with our list of the worst cities for singles? We started by looking for metropolitan areas with more than 125,000 people. Then we penalized places with small populations of singles, including the never-married, divorced and widowed. The share of unmarried residents in each of these bottom-ten cities is well shy of the national average.

Financial indicators didn’t boost the cities’ attractiveness. Although many of these areas boast below-average living costs, paychecks typically are way below average, too. We also factored in education level, keeping in mind that people with bachelor’s and advanced degrees are more likely to be gainfully employed. After all, you can’t exactly rock the single lifestyle without the earnings to fund it.
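Here is a minimal sketch of how a ranking built on these criteria might be assembled; the metros, numbers, and equal weighting are all hypothetical rather than the publication’s actual method:

```python
# Illustrative data and scoring only (not the publication's actual method).
metros = [
    # (name, population, pct_single, median_income, pct_bachelors)
    ("Metro A", 450_000, 0.52, 48_000, 0.34),
    ("Metro B", 130_000, 0.41, 39_000, 0.22),
    ("Metro C", 90_000, 0.55, 42_000, 0.28),   # excluded: under the 125,000 cutoff
]

def score(metro):
    _, _, single, income, degree = metro
    # Equal-weight average of three indicators, each scaled roughly to [0, 1].
    return (single + income / 100_000 + degree) / 3

eligible = [m for m in metros if m[1] > 125_000]
for m in sorted(eligible, key=score):   # lowest score = "worst" city for singles
    print(m[0], round(score(m), 2))     # Metro B 0.34, Metro A 0.45
```

The arithmetic is trivial; the assumptions built into the indicators are what the rest of this post is about.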

So there are two primary factors in this analysis:

1. The number of single people. Presumably this has something to do with an exciting social scene, a la the culture and scene sought by the creative class. However, the number of single people by itself doesn’t necessarily signal a more or less exciting cultural and entertainment scene.

2. The financial indicators are mainly about income, suggesting that single workers don’t want to be in places without high incomes. Does this mean younger workers only want higher-paying jobs? Is a high-paying job the number one goal? The last line in the second paragraph above drives this point home: younger workers want a flashier “single lifestyle.”

All this seems to make some assumptions about single workers: they want high incomes, they want other singles around, and they want to “rock the single lifestyle.” While this may be the case for a number of them, it does highlight some different reasons for moving that are fairly accepted in American society today:

1. Economics. People need jobs. They should move where the jobs are. Young workers are particularly assumed to be more mobile and willing to move.

2. Finding exciting cultural centers. Places like Austin are held up as cities where one should move to enjoy life.

Are there other acceptable reasons for choosing where to live?

The difficulty in wording survey questions about American education

Emily Richmond points out some of the difficulties in creating and interpreting surveys regarding public opinion on American education:

As for the PDK/Gallup poll, no one recognizes the importance of a question’s wording better than Bill Bushaw, executive director of PDK. He provided me with an interesting example from the September 2009 issue of Phi Delta Kappan magazine, explaining how the organization tested a question about teacher tenure:

“Americans’ opinions about teacher tenure have much to do with how the question is asked. In the 2009 poll, we asked half of respondents if they approved or disapproved of teacher tenure, equating it to receiving a “lifetime contract.” That group of Americans overwhelmingly disapproved of teacher tenure 73% to 26%. The other half of the sample received a similar question that equated tenure to providing a formal legal review before a teacher could be terminated. In this case, the response was reversed, 66% approving of teacher tenure, 34% disapproving.”

So what’s the message here? It’s one I’ve argued before: That polls, taken in context, can provide valuable information. At the same time, journalists have to be careful when comparing prior years’ results to make sure that methodological changes haven’t influenced the findings; you can see how that played out in last year’s MetLife teacher poll. And it’s a good idea to use caution when comparing findings among different polls, even when the questions, at least on the surface, seem similar.

Surveys don’t write themselves, nor is the interpretation of the results necessarily straightforward. Change the wording or the order of the questions and the results can change. I like the link to the list of “20 Questions A Journalist Should Ask About Poll Results” put out by the National Council on Public Polls. Our public life would be improved if journalists, pundits, and the average citizen paid attention to these questions.
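To get a sense of how large that wording effect is relative to ordinary sampling error, here is a minimal sketch of a two-proportion z test. The split-sample sizes are hypothetical, since the excerpt doesn’t report them:

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical half-sample sizes of 500 each.
# Wording A ("lifetime contract"): 26% approve of tenure.
# Wording B ("formal legal review before termination"): 66% approve.
z = two_proportion_z(0.26, 500, 0.66, 500)
print(f"z = {z:.1f}")  # about -12.7, far beyond any conventional threshold
```

Even with modest half-samples, a 40-point swing in approval dwarfs anything sampling error could plausibly produce, which is exactly why question wording deserves so much scrutiny.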

Differences in selfies across global cities

A new online project finds that selfies taken in different global cities like Moscow, New York, and Sao Paulo exhibit some differences:

That seems the most salient takeaway from “Selfie City,” an ambitious selfie-mapping project released Wednesday by a group of independent and university-affiliated researchers. The project sought to extract data from 3,200 selfies taken in Bangkok, Berlin, Moscow, New York and Sao Paolo, then map that data along demographic and geographic lines. Do people in New York smile more than people in Berlin? (Yes.) Does the face angle or camera tilt say something about culture? (Possibly.)…

Many of the researchers’ findings are less than conclusive — there’s either not enough data, or advanced enough analysis, to really make sweeping statements without a bit of salt. The photos — 20,000 for each city — were scraped during a one-week period in December and analyzed/culled to 600 by computer software and Mechanical Turk. While 600 photos may seem like a lot, there’s no indication whether that number is a statistically significant one, nor whether the culled photos represent each country’s Instagram demographics…

Selfie City has found more evidence for a phenomenon both sociologists and casual users have noted already: women take far more self-portraits than men. (Up to 4.6 times as many, at least in Moscow.)…

They also suggest that people take more expressive selfies and strike different poses between cities. Bangkok and Sao Paulo, for instance, are by far the smiliest — Moscow and Berlin, not so much.

Sounds like a clever use of available images and analysis options to start exploring differences across cities. While not all residents of these big cities will follow such patterns, cities are often known for particular social features. New Yorkers may be relatively gruff. Other cities are known as being open and friendly – think of the popular images of big Brazilian cities. (I wonder how much this will come up with future World Cup and Olympics coverage.)

At the same time, how many selfies would a researcher have to look at to get a representative sample? Over what time period? And perhaps there is an underlying issue that can’t really be solved: this is likely a very select population that regularly takes and posts selfies (even beyond whether it represents the typical Instagram/social media user).
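For the sampling-error half of that question, a standard back-of-the-envelope calculation gives a sense of scale; it says nothing about the selection bias just mentioned:

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion within +/- margin at ~95% confidence."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

for margin in (0.05, 0.03, 0.01):
    print(f"+/-{margin:.0%}: n >= {sample_size_for_proportion(margin)}")
# +/-5%: n >= 385; +/-3%: n >= 1068; +/-1%: n >= 9604
```

By that rough standard, a few hundred photos per city is enough for estimates within several percentage points, assuming, and this is the big assumption, that the photos were sampled representatively in the first place.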

Confronting the problems with p-values

Nature provides an overview of concerns about how much scientists rely on the p-value, “which is neither as reliable nor as objective as most scientists assume”:

One result is an abundance of confusion about what the P value means. Consider Motyl’s study about political extremists. Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis. It cannot work backwards and make statements about the underlying reality. That requires another piece of information: the odds that a real effect was there in the first place. To ignore this would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is…

These are sticky concepts, but some statisticians have tried to provide general rule-of-thumb conversions (see ‘Probable cause’). According to one widely used calculation, a P value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a P value of 0.05 raises that chance to at least 29%. So Motyl’s finding had a greater than one in ten chance of being a false alarm. Likewise, the probability of replicating his original result was not 99%, as most would assume, but something closer to 73% — or only 50%, if he wanted another ‘very significant’ result. In other words, his inability to replicate the result was about as surprising as if he had called heads on a coin toss and it had come up tails…

Critics also bemoan the way that P values can encourage muddled thinking. A prime example is their tendency to deflect attention from the actual size of an effect. Last year, for example, a study of more than 19,000 people showed that those who meet their spouses online are less likely to divorce (p < 0.002) and more likely to have high marital satisfaction (p < 0.001) than those who meet offline (see Nature http://doi.org/rcg; 2013). That might have sounded impressive, but the effects were actually tiny: meeting online nudged the divorce rate from 7.67% down to 5.96%, and barely budged happiness from 5.48 to 5.64 on a 7-point scale. To pounce on tiny P values and ignore the larger question is to fall prey to the “seductive certainty of significance”, says Geoff Cumming, an emeritus psychologist at La Trobe University in Melbourne, Australia. But significance is no indicator of practical relevance, he says: “We should be asking, ‘How much of an effect is there?’, not ‘Is there an effect?’”

Perhaps the worst fallacy is the kind of self-deception for which psychologist Uri Simonsohn of the University of Pennsylvania and his colleagues have popularized the term P-hacking; it is also known as data-dredging, snooping, fishing, significance-chasing and double-dipping. “P-hacking,” says Simonsohn, “is trying multiple things until you get the desired result” — even unconsciously. It may be the first statistical term to rate a definition in the online Urban Dictionary, where the usage examples are telling: “That finding seems to have been obtained through p-hacking, the authors dropped one of the conditions so that the overall p-value would be less than .05”, and “She is a p-hacker, she always monitors data while it is being collected.”

As the article then goes on to note, alternatives haven’t quite caught on. It seems the most basic defense is one that statisticians should adopt anyhow: always recognizing the chance that their statistics could be wrong. It also highlights the need for replicating studies with different datasets to confirm results.
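The “widely used calculation” in the excerpt isn’t named, but the 11% and 29% figures match a well-known lower bound on the false-alarm probability derived from the -e·p·ln(p) bound on the Bayes factor. Here is a minimal sketch, assuming that bound and 50:50 prior odds of a real effect:

```python
import math

def min_false_alarm_prob(p, prior_real=0.5):
    """Lower bound on P(no real effect | p-value), using the -e*p*ln(p)
    bound on the Bayes factor in favour of the null (valid for p < 1/e)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound applies for 0 < p < 1/e")
    bf_null = -math.e * p * math.log(p)              # minimum Bayes factor for the null
    prior_odds_null = (1 - prior_real) / prior_real  # 1.0 for 50:50 prior odds
    posterior_odds_null = prior_odds_null * bf_null
    return posterior_odds_null / (1 + posterior_odds_null)

for p in (0.05, 0.01):
    print(f"p = {p}: false-alarm probability >= {min_false_alarm_prob(p):.0%}")
# p = 0.05: false-alarm probability >= 29%
# p = 0.01: false-alarm probability >= 11%
```

The exact numbers matter less than the direction: a “significant” p-value is much weaker evidence than the one-in-twenty framing suggests.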

At a relatively basic level, if p-values are so problematic, how does this change the basic statistics courses so many undergraduates take?

21st century methodology problem: 4 ways to measure online readership

While websites can collect lots of information about readers, how exactly this should all be measured is still unclear. Here are four options:

Uniques: Unique visitors is a good metric, because it measures monthly readers, not just meaningless clicks. It’s bad because it measures people rather than meaningful engagement. For example, Facebook viral hits now account for a large share of traffic at many sites. There are one-and-done nibblers on the Web and there are loyal readers. Monthly unique visitors can’t tell you the difference.

Page Views: They’re good because they measure clicks, which is an indication of engagement that unique visitors doesn’t capture (e.g.: a blog with loyal readers will have a higher ratio of page views-to-visitors, since the same people keep coming back). They’re bad for the same reason that they can be corrupted. A 25-page slideshow of the best cities for college graduates will have up to 25X more views than a one-page article with all the same information. The PV metric says the slideshow is 25X more valuable if ads are reloaded on each page of the slideshow. But that’s ludicrous.

Time Spent/Attention Minutes: Page views and uniques tell you an important but incomplete fact: The article page loaded. It doesn’t tell you what happens after the page loads. Did the reader click away? Did he stay for 20 minutes? Did he open the browser tab and never read the story? These would be nice things to know. And measures like attention minutes can begin to tell us. But, as Salmon points out, they still don’t paint a complete picture. Watching a 5 minute video and deciding it was stupid seems less valuable than watching a one minute video that you share with friends and praise. Page views matter, and time spent matters, but reaction matters, too. This suggests two more metrics …

Shares and Mentions: “Shares” (on Facebook, Twitter, LinkedIn, or Google+) ostensibly tell you something that neither PVs, nor uniques, nor attention minutes can tell you: They tell you that visitors aren’t just visiting. They’re taking action. But what sort of action? A bad column will get passed around on Twitter for a round of mockery. An embarrassing article can go viral on Facebook. Shares and mentions can communicate the magnitude of an article’s attention, but they can’t always tell you the direction of the share vector: Did people share it because they loved it, or because they loved hating it?

Here are some potential options for sorting this all out:

1. Developing a scale or index that combines all of these factors. It could be as simple as weighting each of the four metrics at 25%, or the components could be weighted differently (see the sketch after this list).

2. Heavyweights in the industry – whether particular companies, advertisers, or analytical leaders – make a decision about which of these is most important. For example, comments after this story note the problems with Nielsen television ratings over the decades, yet Nielsen has long had a stranglehold on that area.

3. Researchers outside the industry could “objectively” develop a measure. This may be unlikely as outside actors have less financial incentive but perhaps someone sees an opportunity here.
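Here is a minimal sketch of option 1. The metric names, the equal 25% weights, and the site numbers are all illustrative; each metric is scaled by the maximum across sites so the weights are comparable:

```python
# Illustrative metric names, weights, and site numbers (not an industry standard).
METRICS = ("uniques", "page_views", "attention_minutes", "shares")
WEIGHTS = {"uniques": 0.25, "page_views": 0.25,
           "attention_minutes": 0.25, "shares": 0.25}

def engagement_index(site, maxima):
    """Weighted sum of each metric, scaled to [0, 1] by the maximum across sites."""
    return sum(WEIGHTS[m] * site[m] / maxima[m] for m in METRICS)

sites = {
    "site_a": {"uniques": 2_000_000, "page_views": 9_000_000,
               "attention_minutes": 4_500_000, "shares": 120_000},
    "site_b": {"uniques": 800_000, "page_views": 12_000_000,
               "attention_minutes": 1_000_000, "shares": 40_000},
}
maxima = {m: max(s[m] for s in sites.values()) for m in METRICS}
for name, data in sites.items():
    print(name, round(engagement_index(data, maxima), 2))  # site_a 0.94, site_b 0.49
```

The arithmetic is the easy part; agreeing on the weights is where option 2’s industry heavyweights would come in.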

In the meantime, there is plenty of information on online readership to look at, websites and companies can claim various things with different metrics, and websites and advertisers will continue to have a strong financial interest in all of this.