Losing something in the research process with such easy access to information

A retired academic laments that the thrill of the research hunt has diminished with easy access to information and data:

It’s a long stretch, but it seems to me that “ease of access” and the quite miraculous enquiry-request-delivery systems now available to the scholar have had an effect on research. The turn to theory – attention to textuality rather than physical things such as books, manuscripts, letters and paraphernalia of various kinds – has, I think, coincided with big changes in method. Discovery has been replaced by critical discourse and by dialectic.

Fieldwork was, typically, solitary. Lonely sometimes. The new styles at the professional end of the subject are collective – if sometimes less than collegial. The conference is now central to the profession, particularly the conference at which everyone is a speaker, a colloquiast and a verbal “participant”.

One can see something similar at the undergraduate level. I suspect that in my subject (English), some undergraduates are nowadays doing their three years without ever feeling obliged to go to the library. Gutenberg, iBook, Wikipedia, SparkNotes, Google and the preowned, dirt-cheap texts on AbeBooks have rendered the library nothing more than emergency back-up and a warm place to work, using wi-fi to access extramural materials. The seminar (the undergraduate equivalent of the conference), not the one-on-one tutorial or the know-it-all lecture, is the central feature of the teaching programme.

There may be something to this. Discovering new sources, objects, and data in out-of-the-way places that no one has examined before is certainly exciting. However, I wonder if the research hunt hasn’t simply shifted. As this academic argues, it is not hard to find information these days. But today, the hunt lies more in deciding what story to tell or how to interpret the accessible data. As I tell my students, anyone with some computer skills can do a search, find a dataset, and download it within a few minutes. This does not mean that everyone can understand how to work with the data and interpret it. (The same applies to non-numeric/qualitative data that can be quickly found, such as online interactions or profiles.) Clearing a way through the flood of information is no easy task and can have a charm of its own.

Perhaps the problem is that students and academics today feel that quick access to information already takes care of a large part of their research. Simply go to Google, type in some terms, look at the first few results, and there isn’t much left to do; it is all magic, after all. Perhaps the searching one used to do wasn’t really about getting the information at all; rather, the time it required opened up more profitable thinking, reflection, and writing time.

Century 21 survey suggests many Americans would cut back in other areas to buy their “dream home”

A new survey from Century 21 looks at what other purchases Americans would be willing to sacrifice in order to afford their “dream home”:

69 percent of homeowners who don’t own what they described as their “dream home” would be willing to make sacrifices to their personal lifestyle to be financially able to purchase it. Non-homeowners are more willing to make sacrifices, and 80 percent indicated they are willing to make changes to their personal lifestyle in order to be financially able to purchase their dream home, including:

  • 50 percent would cut back on dining out,
  • 49 percent would cut back on their shopping for non-essential items (e.g., clothing, accessories, gadgets),
  • 47 percent would give up luxuries (e.g., expensive cable packages, trips to the salon),
  • 39 percent would cut back on vacations, and
  • 10 percent would contribute less to their 401(k) in order to be able to purchase their dream home.

This suggests buying a home is still an important priority for many Americans. At the same time, the questions don’t really get at how much people might be willing to cut back (5% on dining out? 50%?), how this compares to other purchases (would people say similar things if asked about a new car or some other big purchase?), and how much people would actually need to cut back if they bought a house (there could be a big difference between buying a $220k home and a $450k home). Also, I’m curious about the 50% who wouldn’t cut back on dining out or the 61% who wouldn’t cut back on vacations; do they not need to, or would they seriously not do so in order to buy a dream house?

Another note: this was a web survey.

Harris Interactive® fielded the study on behalf of Mullen Communications from April 24-26, 2012, via its QuickQuery℠ online omnibus service, interviewing 2,213 U.S. adults aged 18 years and older, of which 1,416 are homeowners and 734 are renters. This data was weighted to reflect the composition of the general adult population. No estimates of theoretical sampling error can be calculated; a full methodology is available.

Two issues here: this was not a random sample (hence the need for weighting), and if there can’t be any estimates of the sampling error, how trustworthy are the results?

Finding the most extroverted town in America in Iowa

A “marketing research firm” recently named Keota, Iowa as the most extroverted town in America. How exactly does a researcher determine the most extroverted town?

Pyco, which claims to specialize in “psychological profiling,” ranked 61.639 percent of adults in Keota (pop. 1,009, according to the 2010 census) as extroverts — just beating Manchester, N.Y.’s 60.570 percent for the title of most outgoing. Yet despite this designation, locals are reportedly confused as to how they ranked so high…

In fact, nobody outside Pyco quite understands the methodology for the rankings. According to the Register, the firm collected data in part from other research firms, and processed the numbers with a proprietary 2,000 page algorithm. Keith Streckenbach, the company’s chief operating officer, could not specify which factors most affected whether a person was deemed extroverted.

Keota’s designation has led to a series of stories in Iowa media examining the honor. One piece on the blog Eastern Iowa News Now interviewed Kevin Leicht, the chairman of the University of Iowa’s Sociology Department, and found that extroversion may be a trait inherent to small towns…

Pyco’s algorithm found that only about 57 percent of New York City adults are extroverts.

Several questions follow:

1. I would be really curious to know how this proprietary data was collected. Is it culled from the Internet? Could it be partially determined by the number of local businesses or “third places” (found in the Yellow Pages or some other kind of community listings)?

2. The differences between Keota and New York City are not huge: 61.6% versus 57%. If you factor in the margin of error of these estimates (possibly fairly large, since how many data points could there be for each town of more than 1,000 people across the US?), these figures may be close to the same; a quick margin-of-error sketch follows this list. It would be worthwhile to see how broad the range of data across communities really is: are there towns in the US where less than 40% of people are extroverts?

3. Would we expect an extroverted community to know they are more extroverted than another community? Put another way, are extroverts more self-aware of their extroversion or are introverts the ones that are more likely to be aware of these things?

4. Since this data was collected by a marketing firm, I assume they would want to sell this information to companies and other organizations. So if Keota is the most extroverted town, will residents now see different kinds of promotional campaigns in the near future?
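
As promised above, here is a minimal margin-of-error sketch in Python. Pyco’s actual sample sizes are unknown, so the n below is purely hypothetical; the point is simply that with small-town samples, the Keota and New York figures could easily overlap:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p based on n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: suppose the Keota estimate rests on most of the town's
# roughly 750 adults (a guess; Pyco does not disclose its sample sizes)
p, n = 0.616, 750
print(f"{p:.1%} ± {margin_of_error(p, n):.1%}")  # 61.6% ± 3.5%
```

At roughly ±3.5 points under these assumptions, Keota’s 61.6% and New York’s 57% nearly fall within each other’s intervals; with smaller samples, they would overlap outright.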

Debating the reliability of social science research

A philosopher argues social science research is not that reliable and therefore should have a limited impact on public policy:

Without a strong track record of experiments leading to successful predictions, there is seldom a basis for taking social scientific results as definitive.  Jim Manzi, in his recent book, “Uncontrolled,” offers a careful and informed survey of the problems of research in the social sciences and concludes that “nonexperimental social science is not capable of making useful, reliable and nonobvious predictions for the effects of most proposed policy interventions.”

Even if social science were able to greatly increase their use of randomized controlled experiments, Manzi’s judgment is that “it will not be able to adjudicate most policy debates.” Because of the many interrelated causes at work in social systems, many questions are simply “impervious to experimentation.”   But even when we can get reliable experimental results, the causal complexity restricts us to “extremely conditional, statistical statements,” which severely limit the range of cases to which the results apply.

My conclusion is not that our policy discussions should simply ignore social scientific research.  We should, as Manzi himself proposes, find ways of injecting more experimental data into government decisions.  But above all, we need to develop a much better sense of the severely limited reliability of social scientific results.   Media reports of research should pay far more attention to these limitations, and scientists reporting the results need to emphasize what they don’t show as much as what they do.

Given the limited predictive success and the lack of consensus in social sciences, their conclusions can seldom be primary guides to setting policy.  At best, they can supplement the general knowledge, practical experience, good sense and critical intelligence that we can only hope our political leaders will have.

Several quick thoughts:

1. There seems to be some misunderstanding about the differences between the social and natural sciences. The social sciences don’t have laws in the same sense that the natural sciences do. People don’t operate like planets (to pick up on one of the examples). Social behaviors change over time in response to changing conditions and this makes study more difficult.

2. There is a heavy emphasis in this article on experiments. However, these are more difficult to conduct in the social realm: it is hard to control for all sorts of possible influential factors, have a sizable enough N to make generalizations, and experiments in the “harder sciences” like medicine have some of their own issues (see this critique of medical studies).

3. Saying the social sciences have some or a little predictive ability is different than saying they have none. Having some knowledge of social life is better than none when crafting policy, right?

4. Leaders should have “the general knowledge, practical experience, good sense and critical intelligence” to be able to make good decisions. Are these qualities simply individualistic or could social science help inform and create these abilities?

5. While there are limitations to doing social science research, there are also ways that researchers can increase the reliability and validity of studies. These techniques are not inconsequential; there are big differences between good and bad research methods in the kind of data they produce. There is a need within social science to think about “big science” more often rather than pursuing smaller, limited studies, but studies that can speak to broader questions typically require more data and analysis, which in turn require more resources and time.

One route sociology majors can take: data analyst

I try to remind my students in Statistics and Social Research that there is a need in a lot of industries for people who can collect and analyze data. I was reminded of this when I saw an obituary about a sociologist who had gone on to become a well-known medical data analyst:

A professor in the Department of Health Services at the UCLA Fielding School of Public Health, [E. Richard] Brown founded the UCLA Center for Health Policy Research in 1994.

One of the center’s major activities has been the development of the California Health Interview Survey, the premier source of information about individual and household health status in California. It has served as a model for health surveys for other states.

Brown was the founder and principal investigator for the survey, which produced its first data from interviews with more than 55,000 California households in 2001. Information from the survey, which has been conducted every two years, has been used by policymakers, community advocates, researchers and others.

And working with important data can then lead to public policy options:

“The single thing that makes Rick stand out in this field is that he had an extraordinary capacity to use evidence about the public’s health and strategize and advocate to turn that evidence into the best policy and action,” said Dr. Linda Rosenstock, dean of the UCLA Fielding School of Public Health.

In 1990, Brown was co-author of California’s first single-payer healthcare legislation. He also co-wrote several other healthcare reform bills over the last two decades…

He also was a full-time senior consultant to President Clinton’s Task Force on National Health Care Reform and served as a senior health policy advisor for the Barack Obama for President Campaign — as well as serving as an advisor to U.S. Sens. Bob Kerrey, Paul Wellstone and Al Franken.

We need more people who can collect useful data and then interpret what it means. These days, the problem often is not a lack of information; rather, we need to know how to separate the good data from the bad and then provide a useful interpretation. While some students may prefer to skip over the methodological sections of articles or books, understanding how to collect and analyze data can go a long way. Additionally, learning about these methods and data analysis can help one move toward a sociological view of the social world, where personal anecdotes matter less than broad trends and how social factors (variables) relate to each other.

More Houston residents want to move from suburbs to city than vice versa

Data from the most recent Houston Area Survey suggests that more Houston area residents would prefer to move from the suburbs to the city than vice versa:

Thirteen years ago, the Houston Area Survey started asking people who lived in urban areas if they’d prefer to live in the suburbs.  It also asked people in the suburbs if they’d like to move into the city one day. Survey founder Stephen Klineberg, a Rice University sociology professor, says the survey has revealed a clear shift in opinion.

“In 1999, twice as many people in the city said ‘I want to move to the suburbs,’ than people in the suburbs saying ‘I want to move to the city.’ Those lines have crossed now. And in this year’s survey, significantly more people in the suburbs said ‘I would be interested in, someday, moving to the city,’ than people in the city saying, ‘I want to move to the suburbs.'”

The most obvious reason is the rise in gasoline prices. But Klineberg says shifting demographics are also at play...

And that change in the makeup of households is also reflected in the type of houses people in Houston aspire to own.  The percentage of people who say they’d like a traditional house with a yard in the suburbs has dropped from 59% four years ago, to 47% today. While the proportion who would like a smaller home in a more walkable neighborhood has risen dramatically over the same period of time — from about a third, to more than half.

These findings mirror larger rumblings about where Americans would prefer to live: more people appear to be interested in moving to walkable, denser communities. Are these sentiments coming primarily from middle-aged and older residents plus young adults?

Two methodological questions:

1. Should we expect that the findings from Houston would be similar to what would be found in other metropolitan regions? Would the sentiments be the same in non-Sunbelt (e.g., Rust Belt) cities?

2. Additionally, how many of those who express an interest in moving from the suburbs to the city will actually follow through on this? Of course, these perceptions matter and could help shape future policy decisions, such as building denser developments within the suburbs so that there are pockets of walkability. At the same time, does this indicate long-term behavioral changes or simply attitudinal shifts at this point in time?


We need better data on loneliness and its effects

In response to the recent Atlantic cover story “Is Facebook Making Us Lonely?” by Stephen Marche, sociologist Eric Klinenberg argues the data is much less clear than the cover story suggests.

This debate suggests two things:

1. We need better data on loneliness and how it affects people. There are multiple ways this could be done, but perhaps we need a methodological breakthrough. I’ve been thinking lately that we need better ways to know what people do when they are alone; right now, we rely on after-the-fact questions rather than observational data. If we ask the same questions (such as the famous one about how many confidants respondents have), we can track changes over time, but this also requires interpretation. How much loneliness is acceptable and “normal” before there are adverse effects? Do the importance and effects of loneliness change over the life course? Is loneliness mitigated by other social forces?

2. Without this more conclusive data, I think we end up having a proxy battle between two warring American schools of thought: communitarianism versus individualism. This debate dates back to the early days of the American experiment. Who is more virtuous, the cosmopolitan city dweller or the self-reliant farmer or frontiersman? Should we all live in urban areas or preserve small-town life? Should the government help people get an equal shot at success or help defend people from each other? Should religion be expressed in the public sphere or should it be compartmentalized? Several well-known social science works in recent decades have tackled these divides, including the 1985 classic Habits of the Heart. Both Klinenberg and Marche seem to bring these ideological approaches to their arguments and then look for the data that supports their points. For example, Klinenberg admits that loneliness will be felt by some of those who live alone but argues this can be desirable because living alone allows other good things to happen.

Increase in retractions of scientific articles tied to problems in scientific process

Several scientists are calling for changes in how scientific work is conducted and published because of a rise in retracted articles:

Dr. Fang became curious how far the rot extended. To find out, he teamed up with a fellow editor at the journal, Dr. Arturo Casadevall of the Albert Einstein College of Medicine in New York. And before long they reached a troubling conclusion: not only that retractions were rising at an alarming rate, but that retractions were just a manifestation of a much more profound problem — “a symptom of a dysfunctional scientific climate,” as Dr. Fang put it.

Dr. Casadevall, now editor in chief of the journal mBio, said he feared that science had turned into a winner-take-all game with perverse incentives that lead scientists to cut corners and, in some cases, commit acts of misconduct…

Last month, in a pair of editorials in Infection and Immunity, the two editors issued a plea for fundamental reforms. They also presented their concerns at the March 27 meeting of the National Academies of Sciences committee on science, technology and the law.

Here is what Fang and Casadevall suggest may help reduce these issues:

To change the system, Dr. Fang and Dr. Casadevall say, start by giving graduate students a better understanding of science’s ground rules — what Dr. Casadevall calls “the science of how you know what you know.”

They would also move away from the winner-take-all system, in which grants are concentrated among a small fraction of scientists. One way to do that may be to put a cap on the grants any one lab can receive.

In other words, give graduate students more training in ethics and the sociology of science while also redistributing scientific research money so that more researchers can be involved. There is a lot to consider here. Of course, there might always be researchers tempted to commit fraud, yet these scientists are arguing that the current system and circumstances need to be tweaked to fight this. Graduate students and young faculty are well aware of what they have to do: publish research in the highest-ranked journals they can. Jobs and livelihoods are on the line. With that pressure, it makes sense that some may resort to unethical measures to get published.

Three other thoughts:

1. How often is social science research retracted? If it is infrequent, should it happen more often?

2. Even if an article or study is retracted, this doesn’t solve the whole issue as that work may have been cited a lot and become well known. Perhaps the bigger problem is “erasing” this study from the collective science memory. This reminds me of newspaper corrections; when you go find the original printing, you don’t know there was a later correction. The same thing can happen here: scientific studies can have long lives.

3. Should disciplines or journals have groups that routinely assess the validity of research studies? This would go beyond peer review and give a group the authority to ask questions about suspicious papers. Alas, even this might not catch most of the problematic papers…

Statistics learning opportunity: “Hunger Games Survival Analysis”

Fun with statistics: a survival analysis of The Hunger Games (quick reviews of the books and movie). According to the final analysis, the only significant factor is the rating of each participant:

My interpretation of this is that the Gamemakers know what they’re doing when they assign the ratings. They’ve been doing this for years, so they give scores that are so accurate that they’re actually better predictors of survival time than whether a tribute is a volunteer, a Career, male or female, or forms an alliance. Pretty impressive.

An alternate and more cynical interpretation is that the Gamemakers are concerned about their own reputations and thus engineer the games so as to confirm their ratings, occasionally killing off players who do better or worse than expected based on the ratings, all so that the Gamemakers can look like they knew what they were doing all along. Unfortunately, the political system of Panem ranks so slow on Freedom House’s annual scores that we simply can’t tell what’s going on behind the scenes at all. To cut through their lies we simply need more data.

If you read this, you will also learn something about survival analysis and event history analysis. Bonus: the data and Stata code are available for download!
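
For readers without Stata, here is a minimal Kaplan-Meier sketch in Python showing the core idea behind survival analysis. The durations and censoring flags are made up for illustration; they are not the post’s Hunger Games data:

```python
# Kaplan-Meier: S(t) is the running product over event times t_i <= t of
# (1 - d_i / n_i), where d_i = deaths at t_i and n_i = number still at
# risk just before t_i. Censored cases leave the risk set without
# counting as events.
def kaplan_meier(durations, observed):
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    curve, s, i = [], 1.0, 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = censored = 0
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            s *= 1 - deaths / at_risk
            curve.append((t, s))
        at_risk -= deaths + censored
    return curve

# Hypothetical data: days survived; False = still alive at the end (censored)
days = [1, 1, 3, 7, 7, 10, 14, 14]
died = [True, True, True, True, False, True, True, False]
for t, s in kaplan_meier(days, died):
    print(f"day {t:>2}: S(t) = {s:.3f}")
```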

Thinking back to the event history class I took in grad school: we didn’t look at any data that was remotely close to popular culture.

Also, why not include the data from the second and third books? Granted, the games change a bit in the sequels to ratchet up the tension, but that would provide more data to work with…

Five main methods of detecting patterns in data mining

Here is a summary of five of the main methods utilized to uncover patterns when data mining:

Anomaly detection: in a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For instance, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.
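
As a rough sketch of this idea (not the IRS’s actual method), here is a z-score-based anomaly detector in Python; the deduction figures are invented:

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical deduction claims; one return differs sharply from the rest
deductions = [9_800, 11_200, 10_500, 9_900, 12_000, 10_700, 58_000]
print(find_anomalies(deductions))  # [58000]
```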

Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
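
A minimal sketch of association learning with a toy set of market baskets (Amazon’s and Netflix’s production systems are far more complex): count how often items co-occur, then compute the confidence of a candidate rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book"},
    {"cocktail shaker", "martini glasses"},
    {"recipe book"},
]

# Count how often each pair of items appears together
pair_counts = Counter()
for basket in baskets:
    pair_counts.update(combinations(sorted(basket), 2))
print(pair_counts.most_common(1))

# Confidence of the rule "shaker -> martini glasses": P(glasses | shaker)
shaker = sum(1 for b in baskets if "cocktail shaker" in b)
both = sum(1 for b in baskets if {"cocktail shaker", "martini glasses"} <= b)
print(f"confidence: {both / shaker:.2f}")  # 0.67
```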

Cluster detection: one type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups. This is one of the black-box type of algorithms that are hard to understand. But in a simple example – again with purchasing behavior – we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.
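
A minimal clustering sketch using scikit-learn’s k-means, assuming invented per-category spending data; the point is that the three hobbyist groups fall out of the data without being specified in advance:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical spending shares: [gardening, fishing, model aircraft]
X = np.array([
    [90, 5, 2], [85, 10, 0], [80, 8, 5],   # gardeners
    [5, 95, 3], [10, 88, 1],               # fishermen
    [2, 4, 97], [6, 1, 90],                # model airplane enthusiasts
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # three distinct groups recovered from the data itself
```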

Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this – large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.
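
A minimal spam-filter sketch, assuming a tiny hand-labeled training set and scikit-learn’s naive Bayes classifier (real filters learn from far larger corpora):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical set of pre-classified messages
messages = ["win free money now", "meeting agenda attached",
            "free prize claim now", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

# Learn word-usage differences between the two groups
vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(messages), labels)

# Classify a new, unseen message using those learned differences
print(clf.predict(vectorizer.transform(["claim your free money"])))  # ['spam']
```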

Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
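
A minimal regression sketch along these lines, with invented user features and engagement numbers (Facebook’s actual models are proprietary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-user features: [photos tagged, friend requests, comments]
X = np.array([[5, 2, 10], [20, 8, 40], [1, 0, 3], [15, 5, 30], [8, 3, 12]])
# Hypothetical outcome: minutes of engagement the following month
y = np.array([30, 120, 8, 95, 45])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # how each behavior weights in
print(model.predict([[10, 4, 20]]))   # predicted engagement for a new user
```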

Several of these seem similar to methods commonly used by sociologists:

1. Anomaly detection seems like looking for outliers. On one hand, outliers can throw off basic measures of central tendency or dispersion (see the quick illustration after this list). On the other hand, outliers can help prompt researchers to reassess their models and/or theories to account for the unusual cases.

2. Cluster detection and/or classification appear similar to factor analysis, which involves a statistical analysis of a set of variables to see which ones “hang together.” This can be helpful for finding categories and reducing the number of variables in an analysis to a smaller set of important concepts.

3. Regression is used all the time both for modeling and predictions.
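
Returning to the first point above, a quick illustration of how a single outlier drags a measure of central tendency; the income figures are made up:

```python
import statistics

incomes = [32_000, 35_000, 38_000, 41_000, 44_000, 1_000_000]  # hypothetical
print(statistics.mean(incomes))    # ~198333, pulled far up by one outlier
print(statistics.median(incomes))  # 39500.0, barely affected
```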

This all reminds me of what I heard in graduate school about the difference between data mining and statistical research: data mining amounted to atheoretical analysis. In other words, you might find relationships between variables (or apparent relationships; there could always be a spurious association, or suppressor or distorter effects) but you wouldn’t have compelling explanations for those relationships. While you might be able to develop some explanations after the fact, this is a different process from hypothesis testing, where you set out to look and test for relationships and patterns.