A health example of choosing between a dichotomous outcome and a continuum

When I teach Statistics and Research Methods, we talk a little about how researchers make decisions about creating and using categories for data they have. As this example of recommendations about fertility notes, creating categories can be a tricky process:


Being 35 or older is labeled by the medical community as “advanced maternal age.” In diagnosis code speak, these patients are “elderly,” or in some parts of the world, “geriatric.” In addition to being offensive to most, these terms—so jarringly at odds with what is otherwise considered a young age—instill a sense that one’s reproductive identity is predominantly negative as soon as one reaches age 35. But the number 35 itself, not to mention the conclusions we draw from it, has spun out of our collective control…

The 35-year-old threshold is not only known by patients, it is embraced by doctors as a tool that guides the care of their patients. It’s used bimodally: If you’re under 35, you’re fine; if you’re 35 or older, you have a new host of problems. This interpretation treats the issue at hand as what is known as a “threshold effect.” Cross the threshold of age 35, it implies, and the intrinsic nature of a woman’s body has changed; she falls off a cliff from one category into another. (Indeed, many of my patients speak of crossing age 35 as exactly this kind of fall, with their fertility “plummeting” suddenly.) As I’ve already stated, though, the age-related concerns are gradual and exist along a continuum. Even if the rate of those risks accelerates at a certain point, it’s still not a quantum leap from one risk category to another.

This issue comes up frequently in science and medicine. In order to categorize things that fall along a continuum, things that nature itself doesn’t necessarily distinguish as being separable into discrete groups, we have to create cutoffs. Those work very well when comparing large groups of patients, because that’s what the studies were designed to do, but to apply those to individual patients is more difficult. To a degree, they can be useful. For example, when we are operating far from those cutoffs—counseling a 25-year-old versus a 45-year-old—the conclusions to draw from that cutoff are more applicable. But operate close to it—counseling a 34-year-old trying to imagine her future 36-year-old self—and the distinction is so subtle as to be almost superfluous.

The trade-offs seem clear. A single cutoff where the data turn from one category to another, an age of 35, simplifies the research findings (though the article suggests they may not actually point to 35) and allows doctors and others to offer clear guidance. The number is easy to remember.

A continuum, on the other hand, might better fit the data if there is no clear drop-off at an age near 35. A range offers more flexibility for doctors and patients to develop an individualized approach.
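To make the contrast concrete, here is a minimal sketch of the two coding choices in Python. The ages and variable names are invented for illustration; only the cutoff of 35 comes from the article:

```python
import pandas as pd

# Hypothetical patient ages; the 35-year cutoff is the one discussed above.
ages = pd.Series([25, 33, 34, 35, 36, 45], name="age")

# Dichotomous coding: a hard threshold at 35. A 34-year-old and a
# 36-year-old land in different categories despite being nearly identical.
advanced_maternal_age = (ages >= 35).astype(int)

# Continuous coding: keep age as-is, so any model built on it can estimate
# a gradual change in risk rather than a cliff at 35.
df = pd.DataFrame({"age": ages, "ama_cutoff": advanced_maternal_age})
print(df)
```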

Deciding which is better requires thinking about the advantages of each, the purpose of the categories, and who wants what information. The “easy” answer is that both sets of categories can exist; people could keep in mind a rough estimate of 35 while doctors and researchers could have conversations where they discuss why that particular age may or may not matter for a person.

More broadly, learning more about continuums and considering when they are worth deploying could benefit our society. I realize I am comfortable with them; sociologists suggest many social phenomena fall along a continuum, with most cases falling between the extremes. But this tendency toward continuums or spectrums or more nuanced, complex results may not always be helpful. We can decry black-and-white thinking, and yet we all regularly need to make quick decisions based on a limited number of categories (I am thinking of the System 1 thinking described by behavioral economists and others). Even as we strive to collect good data, we also need to pay attention to how we organize and communicate that data.

The retraction of a study provides a reminder of the importance of levels of measurement

Early in Statistics courses, students learn about the different ways variables can be measured. This is often broken down into three categories: nominal variables (unordered, unranked), ordinal variables (ranked but with varied category widths), and interval-ratio variables (ranked and with consistent spacing between categories). Decisions about how to measure variables can significantly influence what can be done with the data later. For example, here is a study that received a lot of attention when published but in which the researchers miscoded a nominal variable:

In 2015, a paper by Jean Decety and co-authors reported that children who were brought up religiously were less generous. The paper received a great deal of attention, and was covered by over 80 media outlets including The Economist, the Boston Globe, the Los Angeles Times, and Scientific American. As it turned out, however, the paper by Decety was wrong. Another scholar, Azim Shariff, a leading expert on religion and pro-social behavior, was surprised by the results, as his own research and meta-analysis (combining evidence across studies from many authors) indicated that religious participation, in most settings, increased generosity. Shariff requested the data to try to understand more clearly what might explain the discrepancy.

To Decety’s credit, he released the data. And upon re-analysis, Shariff discovered that the results were due to a coding error. The data had been collected across numerous countries, e.g. United States, Canada, Turkey, etc. and the country information had been coded as “1, 2, 3…” Although Decety’s paper had reported that they had controlled for country, they had accidentally not controlled for each country, but just treated it as a single continuous variable so that, for example “Canada” (coded as 2) was twice the “United States” (coded as 1). Regardless of what one might think about the relative merits and rankings of countries, this is obviously not the right way to analyze data. When it was correctly analyzed, using separate indicators for each country, Decety’s “findings” disappeared. Shariff’s re-analysis and correction was published in the same journal, Current Biology, in 2016. The media, however, did not follow along. While it covered extensively the initial incorrect results, only four media outlets picked up the correction.

In fact, Decety’s paper has continued to be cited in media articles on religion. Just last month two such articles appeared (one on Buzzworthy and one on TruthTheory) citing Decety’s paper that religious children were less generous. The paper’s influence seems to continue even after it has been shown to be wrong.

Last month, however, the journal, Current Biology, at last formally retracted the paper. If one looks for the paper on the journal’s website, it gives notice of the retraction by the authors. Correction mechanisms in science can sometimes work slowly, but they did, in the end, seem to be effective here. More work still needs to be done as to how this might translate into corrections in media reporting as well: The two articles above were both published after the formal retraction of the paper.

To reiterate, the researchers treated country – a nominal variable in this case, since the countries were not ranked or ordered in any particular way – as if it were a continuous variable, which threw off the overall results. When country was used correctly – from the description above, as a separate dummy (indicator) variable coded 0 or 1 for each country – the findings that received all the attention disappeared.
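To see how much damage that one coding decision can do, here is a hedged sketch in Python. The data below are simulated, not the study's: generosity differs by country, religiosity rates differ by country, and religiosity itself has no true effect. Treating the 1/2/3 country codes as a single continuous control still produces a spurious "effect" that disappears with proper indicator coding:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300

# Simulated data: country baselines and religiosity rates both differ by
# country, but religiosity has no effect on generosity.
countries = pd.Series(rng.choice(["United States", "Canada", "Turkey"], size=n))
p_religious = countries.map({"United States": 0.6, "Canada": 0.3, "Turkey": 0.8})
religious = (rng.random(n) < p_religious).astype(int)
baseline = countries.map({"United States": 5.0, "Canada": 7.0, "Turkey": 3.0})
generosity = baseline + rng.normal(0, 1, size=n)

df = pd.DataFrame({
    "country": countries,
    "religious": religious,
    "generosity": generosity,
    # The error: coding a nominal variable as if "Canada" (2) were
    # twice the "United States" (1).
    "country_code": countries.map(
        {"United States": 1, "Canada": 2, "Turkey": 3}),
})

# Wrong: country treated as a single continuous control.
wrong = smf.ols("generosity ~ religious + country_code", data=df).fit()
# Right: a separate 0/1 indicator for each country via C().
right = smf.ols("generosity ~ religious + C(country)", data=df).fit()

print("miscoded country:", round(wrong.params["religious"], 2))
print("indicator-coded country:", round(right.params["religious"], 2))
```

In the miscoded model, the country differences that the linear code cannot absorb leak into the religiosity coefficient; with separate indicators, the coefficient sits near zero, mirroring how the published "findings" vanished on re-analysis.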

The other issue at play here is whether corrections and retractions of academic studies receive the same attention as the original findings. It is hard to notify readers that a previously published study had flaws and that the results have changed.

All that to say, paying attention to level of measurement earlier in the process helps avoid problems down the road.

Strong spurious correlations enhanced in appearance with mismatched dual axes

I stumbled across a potentially fascinating website titled Spurious Correlations that looks at relationships between odd variables. Here are two examples:

According to the site, both of these pairs have correlations higher than 0.94. In other words, very strong.

One issue: using dual axes can throw things off. The bottom chart above appears to show a negative relationship – but only because the two axes are scaled differently. The top chart makes the lines look like they move together – but the axes are far apart, with the left side ranging from 29 to 34 and the right side from 300 to 900. The correlations between the variables really are strong, but dual axes let the chart-maker exaggerate how closely two lines track each other, and that can be misleading.
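As a quick illustration (with invented series, not the site's actual data), here is how a second y-axis can make two series appear to move in lockstep. Note that the correlation itself is the same no matter how the axes are drawn; only the visual impression changes:

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented data echoing the ranges mentioned above (29-34 vs. 300-900).
years = np.arange(2000, 2010)
series_a = np.array([29.5, 30.1, 30.8, 31.2, 31.9,
                     32.3, 32.8, 33.1, 33.6, 34.0])
series_b = np.array([310, 360, 420, 470, 540, 600, 670, 730, 810, 880])

# The correlation does not depend on how the chart is drawn.
print(f"r = {np.corrcoef(series_a, series_b)[0, 1]:.3f}")

fig, ax1 = plt.subplots()
ax1.plot(years, series_a, color="tab:blue")
ax1.set_ylabel("series A (29-34)")

# twinx() adds a second y-axis whose range is independent of the first,
# so the two lines can be made to overlap (or diverge) almost at will.
ax2 = ax1.twinx()
ax2.plot(years, series_b, color="tab:red")
ax2.set_ylabel("series B (300-900)")

plt.show()
```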

An emerging portrait of emerging adults in the news, part 1

In recent weeks, a number of studies have been reported on the beliefs and behaviors of the younger generation, those now between high school and age 30 (an age group that could also be labeled “emerging adults”). In a three-part series, I want to highlight three of these studies because they not only suggest what this group is doing but also hint at the consequences.

Almost a week ago, a story ran along the wires about a new study linking “hyper-texting” and excessive usage of social networking sites with risky behaviors:

Teens who text 120 times a day or more — and there seems to be a lot of them — are more likely to have had sex or used alcohol and drugs than kids who don’t send as many messages, according to provocative new research.

The study’s authors aren’t suggesting that “hyper-texting” leads to sex, drinking or drugs, but say it’s startling to see an apparent link between excessive messaging and that kind of risky behavior.

The study concludes that a significant number of teens are very susceptible to peer pressure and also have permissive or absent parents, said Dr. Scott Frank, the study’s lead author.

The study was done at 20 public high schools in the Cleveland area last year, and is based on confidential paper surveys of more than 4,200 students.

It found that about one in five students were hyper-texters and about one in nine are hyper-networkers — those who spend three or more hours a day on Facebook and other social networking websites.

About one in 25 fall into both categories.

Hyper-texting and hyper-networking were more common among girls, minorities, kids whose parents have less education and students from a single-mother household, the study found.

Several interesting things to note in this study:

1. It did not look at what exactly is being said/communicated in these texts or in social networking use. This study examines the volume of use – and there are plenty of high school students who are heavily involved with these technologies.

2. One of the best parts of this story is that the second paragraph is careful to suggest that finding an association between these behaviors does not mean that one causes the other. In other words, there is not a direct causal link between excessive texting and drug use; based on this dataset, these variables are simply related. (This is a great example of “correlation without causation.”)

3. What this study calls for is regression analysis in which we can control for other possible factors. That would give us the ability to compare two students with the same family background and the same educational performance and isolate whether texting was really the factor that led to the risky behaviors. If I had to guess, factors like family life and performance in school are more important in predicting these risky behaviors; excessive texting or SNS use is then an intervening variable. Why this study did not do this sort of analysis is unclear – perhaps the authors already have a paper in the works. (A rough sketch of what such an analysis could look like follows below.)
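Here is that sketch in Python, with simulated data; the variable names and effect sizes are invented, not taken from the study. In the simulation, low parental monitoring drives both hyper-texting and risky behavior, so texting looks "linked" to risk until the controls are added:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4200  # roughly the study's sample size

# Simulated scenario: less parental monitoring raises both the chance of
# hyper-texting and the chance of risky behavior; texting has no direct effect.
monitoring = rng.normal(0, 1, size=n)
school = rng.normal(0, 1, size=n)
hyper_texter = (rng.random(n) < 1 / (1 + np.exp(monitoring))).astype(int)
p_risky = 1 / (1 + np.exp(0.5 + 1.2 * monitoring + 0.8 * school))
risky = (rng.random(n) < p_risky).astype(int)

df = pd.DataFrame({"hyper_texter": hyper_texter, "monitoring": monitoring,
                   "school": school, "risky": risky})

# Bivariate model: texting appears associated with risky behavior.
naive = smf.logit("risky ~ hyper_texter", data=df).fit(disp=False)
# Adjusted model: with family and school controls, the texting
# coefficient shrinks toward zero.
adjusted = smf.logit("risky ~ hyper_texter + monitoring + school",
                     data=df).fit(disp=False)

print("naive coefficient:", round(naive.params["hyper_texter"], 2))
print("adjusted coefficient:", round(adjusted.params["hyper_texter"], 2))
```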

Overall, we need more research on these associated variables. While it is interesting in itself that large numbers of emerging adults text a lot and use SNS a lot, we ultimately want to know the consequences. Parts two and three of this series will look at a few studies that suggest some possible consequences.