Models are models, not perfect predictions

One academic summarizes how we should read and interpret COVID-19 models:

Every time the White House releases a COVID-19 model, we will be tempted to drown ourselves in endless discussions about the error bars, the clarity around the parameters, the wide range of outcomes, and the applicability of the underlying data. And the media might be tempted to cover those discussions, as this fits their horse-race, he-said-she-said scripts. Let’s not. We should instead look at the calamitous branches of our decision tree and chop them all off, and then chop them off again.

Sometimes, when we succeed in chopping off the end of the pessimistic tail, it looks like we overreacted. A near miss can make a model look false. But that’s not always what happened. It just means we won. And that’s why we model.

Five quick thoughts in response:

  1. I would be tempted to say that the perilous times of COVID-19 lead more people to treat models as certainties, but I have seen this issue plenty of times in more “normal” periods.
  2. It would help if the media were more numerate and knew more about how science, natural and social, works. I know the media leans toward answers and sure headlines, but science is often messier and takes time to reach consensus.
  3. Making models that include social behavior is difficult. This particular phenomenon has both a physical and a social component. Viruses act in certain ways. Humans act in somewhat predictable ways. Both can change.
  4. Models involve data and assumptions. Sometimes, the model might fit reality. At other times, models do not fit. Either way, researchers are looking to refine their models so that we better understand how the world works. In this case, perhaps models can become better on the fly as more data comes in and/or certain patterns are established.
  5. Predictions or proof can be difficult to come by with models. The language of “proof” is one we often use in regular conversation but is unrealistic in numerous academic settings. Instead, we might talk about higher or lower likelihoods or provide the best possible estimate and the margins of error.
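One way to see what "the best possible estimate and the margins of error" looks like in practice is a rough Python sketch of the textbook margin of error for a proportion from a simple random sample. This is an illustration only: the 95 percent level (z = 1.96), the sample sizes, and the function name are my assumptions, and the COVID-19 models discussed above use far more elaborate methods.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical example: an estimate of 50% from 1,000 respondents.
moe_1000 = margin_of_error(0.5, 1000)
# Quadrupling the sample only halves the margin of error.
moe_4000 = margin_of_error(0.5, 4000)

print(f"n=1000: +/- {moe_1000:.1%}")
print(f"n=4000: +/- {moe_4000:.1%}")
```

The estimate never comes with certainty; more data narrows the range but does not remove it.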

A (real) pie chart to effectively illustrate wealth inequality

Pie graphs can be great at showing relative differences between a small number of categories. A recent example of this comes from CBS:

CBS This Morning co-host Tony Dokoupil set up a table at a mall in West Nyack, New York, with a pie that represented $98 trillion of household wealth in the United States. The pie was sliced into 10 pieces and Dokoupil asked people to divide up those pieces onto five plates representing the poorest, the lower middle class, middle class, upper middle class, and wealthiest Americans. No one got it right. And, in fact, no one was even kind of close to estimating the real ratio, which involves giving nine pieces to the top 20 percent of Americans while the upper middle class and the middle class share one piece between the two of them. The lower middle class would effectively get crumbs considering they only have 0.3 percent of the pie. What about the poorest Americans? They wouldn’t get any pie at all, and in fact would get a bill, considering they are, on average, around $6,000 in debt…

To illustrate just how concentrated wealth is in the country, Dokoupil went on to note that if just the top 1 percent are taken into account, they would get four of the nine pieces of pie that go to the wealthiest Americans.

A pie chart sounds like a great device for this situation because of several features of the data and the presentation:

1. There are five categories of social class. Not too many for a pie chart.

2. One of those categories, the top 20 percent of Americans, clearly has a bigger portion of the pie than the other groups. A pie chart is well-suited to show one dominant category compared to the others.

3. Visitors to a shopping mall can easily understand a pie chart. They understand how it works and what it says (particularly with #1 and #2 above).

Taken together, these features mean a pie chart works here in ways that other graphs and charts would not.
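To make the slices concrete, here is a minimal Python sketch of the arithmetic behind the CBS pie. The $98 trillion total, the ten pieces, and the 0.3 percent figure come from the segment as quoted above; the function and variable names are my own. The piece counts are the segment's rounded figures, so the shares do not sum to exactly 100 percent.

```python
TOTAL_WEALTH_TRILLIONS = 98  # household wealth represented by the whole pie
PIECES = 10                  # the pie was sliced into 10 pieces

def pieces_to_trillions(pieces):
    """Convert a number of pie pieces into trillions of dollars."""
    return TOTAL_WEALTH_TRILLIONS * pieces / PIECES

top_20_percent = pieces_to_trillions(9)        # nine pieces
middle_and_upper_middle = pieces_to_trillions(1)  # one piece shared by two groups
top_1_percent = pieces_to_trillions(4)         # four of the top group's nine pieces
lower_middle = TOTAL_WEALTH_TRILLIONS * 0.3 / 100  # 0.3 percent of the pie

print(f"Top 20 percent: ${top_20_percent:.1f} trillion")
print(f"Middle and upper middle class: ${middle_and_upper_middle:.1f} trillion")
print(f"Top 1 percent: ${top_1_percent:.1f} trillion")
print(f"Lower middle class: ${lower_middle:.2f} trillion")
```

Seen as dollar amounts, the single slice shared by two classes is about a ninth of what the top 20 percent holds.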

(Side note: it is hard to know whether the use of food in the pie chart helped or hurt the presentation. Do people work better with data when feeling hungry?)

Font sizes, randomly ordered names, and an uncertain Iowa poll

Ahead of the Iowa caucuses yesterday, the Des Moines Register had to cancel its final poll due to problems with administering the survey:

Sources told several news outlets that they figured out the whole problem was due to an issue with font size. Specifically, one operator working at the call center used for the poll enlarged the font size on their computer screen of the script that included candidates’ names and it appears Buttigieg’s name was cut out from the list of options. After every call the list of candidates’ names is reordered randomly so it isn’t clear whether other candidates may have been affected as well but the organizers were not able to figure out whether it was an isolated incident. “We are unable to know how many times this might have happened, because we don’t know how long that monitor was in that setting,” a source told Politico. “Because we do not know for certain—and may not ever be able to know for certain—we don’t have confidence to release the poll.”…

In their official statements announcing the decision to nix the poll, the organizers did not mention the font issue, focusing instead on the need to maintain the integrity of the survey. “Today, a respondent raised an issue with the way the survey was administered, which could have compromised the results of the poll. It appears a candidate’s name was omitted in at least one interview in which the respondent was asked to name their preferred candidate,” Register executive editor Carol Hunter said in a statement. “While this appears to be isolated to one surveyor, we cannot confirm that with certainty. Therefore, the partners made the difficult decision to not to move forward with releasing the Iowa Poll.” CNN also issued a statement saying that the decision was made as part of their “aim to uphold the highest standards of survey research.”

This provides some insight into how these polls are conducted. The process can include call centers, randomly ordered names, and a system in place so that the administrators of the poll can feel confident in the results (even as there is always a margin of error). If there is a problem in the system, the opinions of those polled may not match what the data says. Will future processes prevent individual callers from changing the font size?
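A small Python sketch shows why random ordering makes the damage hard to pin down. It assumes, for illustration only, that the enlarged font pushed the last name on the list off the screen; the reports quoted above do not describe the mechanism in that detail. The three names are simply the candidates mentioned in this post, and the function name and seed are my own.

```python
import random

# Names mentioned in this post; a real poll would list the full field.
CANDIDATES = ["Buttigieg", "Klobuchar", "Warren"]

def names_shown(rng, visible_slots):
    """Shuffle the list for one call and return only the names that fit on screen."""
    order = CANDIDATES[:]
    rng.shuffle(order)
    return order[:visible_slots]

rng = random.Random(2020)  # fixed seed so the sketch is repeatable
dropped = set()
for _ in range(200):  # 200 simulated calls with one name cut off each time
    shown = names_shown(rng, visible_slots=len(CANDIDATES) - 1)
    dropped.update(set(CANDIDATES) - set(shown))

print(sorted(dropped))
```

Because the order changes on every call, a cut-off screen would drop a different candidate each time, which is why the organizers could not say who was affected or how often.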

More broadly, a move like this could provide more transparency and ultimately trust regarding political polling. The industry faces a number of challenges. Would revealing this particular issue cause people to wonder how often this happens or reassure them that pollsters are concerned about good data?

At the same time, it appears that the unreported numbers still had an influence:

Indeed, the numbers widely circulating aren’t that different from last month’s edition of the same poll, or some other recent polls. But to other people, both journalists and operatives, milling around the lobby of the Des Moines Marriott Sunday night, the impact had been obvious.

Here are what some reporters told me about how the poll affected their work:

• One reporter for a major newspaper told me they inserted a few paragraphs into a story to anticipate results predicted by the poll.

• A reporter for another major national outlet said they covered an Elizabeth Warren event in part because she looked strong in the secret poll.

• Another outlet had been trying to figure out whether Amy Klobuchar was surging; the poll, which looked similar to other recent polling, steered coverage away from that conclusion.

• “You can’t help it affecting how you’re thinking,” said another reporter.


“Live from Des Moines and Miami”: twin spectacles of our time

At the gym a few days ago, I saw this headline about the temporary location of a morning news show: “Live from Des Moines and Miami.” The Iowa caucuses on Monday and the Super Bowl today in Miami share some characteristics:

1. Weeks and months of hype. The Super Bowl does not get as much lead-up since the participants have only been known for two weeks, but both are highly anticipated events. The Iowa caucuses only happen every four years, so the combination this year is not normal.

2. The media attention paid to both. Even as they come at different parts of their respective processes – the caucuses come after a lot of campaigning and debates and then kick off primary season while the game concludes a popular NFL year – they are great material for news reports, opinion leaders, and everyone else in the media who might not always care about politics or football.

3. Competition and winners and losers. A football game has a clear winner and loser (though more unusual circumstances might cast a doubt on the victors). The caucuses are not so clear as the outcome requires interpretation but everyone will be looking to name the winners and losers once the voting outcome is known.

4. The entertainment value of it all. The football game is more clearly entertainment – it is just a game after all – but politics is in this camp these days as well. Both events are exciting and, at least this year, relatively close. With all this tension building, why not have a morning show broadcast live from Des Moines and Miami?

In sum, these events seem to go together: the largest American sporting event takes place tonight and the fate of the free world/the most important election of our time/the race to beat the incumbent president really takes off tomorrow. For those who will be watching and broadcasting, may they be entertaining and full of high ratings.

Finding data by finding and/or guessing URLs

A California high school student is posting new data from 2020 presidential polls before news organizations because he found patterns in their URLs:

How does Rawal do it? He correctly figures out the URL — the uniform resource locator, or full web address — that a graphic depicting the poll’s results appears at before their official release.

“URL manipulation is what I do,” he said, “and I’ve been able to get really good at it because, with websites like CNN and Fox, all the file names follow a pattern.”

He added, “I’m not going to go into more detail on that.”

He said he had just spoken with The Register’s news director, who expressed interest in his helping the newspaper “keep it under tighter wraps.” He is considering it.

This makes sense on both ends: media organizations need a way to organize their files and sites, and someone who looks at the URLs over time could figure out the pattern. Now to see how media organizations respond so as not to let their stories out before they report them.

I imagine there is a broader application for this. Do many organizations have pages or data online that are not linked to, or whose links are not easily found? I could imagine how such hidden or unlinked data could be used for nefarious or less ethical purposes (imagine scooping news releases about soon-to-be-released economic figures in order to buy or sell stocks) as well as for data collection.
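The general idea of a guessable pattern can be sketched in a few lines of Python. Everything specific here is hypothetical: Rawal declined to describe the actual patterns, so the example.com address, the date-plus-counter template, and the function name are invented for illustration. The sketch only builds strings and does not request any pages.

```python
from datetime import date, timedelta

# Hypothetical naming pattern; the real patterns are not given in the article.
TEMPLATE = "https://example.com/graphics/{d:%Y/%m/%d}/poll-{n}.png"

def candidate_urls(start, days, per_day):
    """List the addresses a predictable date-plus-counter pattern would produce."""
    urls = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        for n in range(1, per_day + 1):
            urls.append(TEMPLATE.format(d=d, n=n))
    return urls

urls = candidate_urls(date(2020, 1, 30), days=3, per_day=2)
for url in urls:
    print(url)
```

If file names really are this regular, the fix on the publisher's side is equally simple: do not upload the file early, or give it a name that cannot be predicted.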

Win the suburbs, win 2020; patterns in news stories that make this argument

More than a year away from the 2020 presidential election, one narrative is firmly established: the path to victory runs through suburban voters. One such story:

Westerville is perhaps best known locally as the place the former Ohio state governor and Republican presidential candidate John Kasich calls home. But it – and suburbs like it – is also, Democrats say, “ground zero” in the battle for the White House in 2020…

In 2018, Democrats won the House majority in a “suburban revolt” led by women and powered by a disgust of Donald Trump’s race-based attacks, hardline policy agenda and chaotic leadership style. From the heartland of Ronald Reagan conservatism in Orange county, California, to a coastal South Carolina district that had not elected a Democrat to the seat in 40 years, Democrats swept once reliably Republican suburban strongholds…

“There is no way Democrats win without doing really well in suburbs,” said Lanae Erickson, a senior vice-president at Third Way, a centrist Democratic thinktank…

“There are short-term political gains for Democrats in winning over suburban voters but that doesn’t necessarily lead to progressive policies,” she said. In her research, Geismer found that many suburban Democrats supported a national liberal agenda while opposing measures that challenged economic inequality in their own neighborhoods.

Four quick thoughts on such news reports:

1. They often emphasize the changing nature of suburbs. This is true: the suburbs are becoming more racially, ethnically, and economically diverse. At the same time, this change is not happening evenly across suburbs.

2. They often use a representative suburb as a case study to try to illustrate broader trends in the suburbs. Here, it is Westerville, Ohio, home to the Tuesday night Democratic debate. Can one suburb illustrate the broader trends in all suburbs? Maybe.

3. They stress that the swing voters are in the suburbs since city residents are more likely to vote for Democrats while rural residents are more likely to vote for Republicans. It will be interesting to see how Democratic candidates continue to tour through urban areas; will they spend more time in denser population areas or branch out to middle suburbs that straddle the line between solid Republican bases further away from the city and solid Democratic bases closer to the city?

4. Even with the claim that the suburbs are key to the next election, this often sheds little light on long-term trends. As an exception, the last paragraph in the quotation above stands out: suburban voters may turn one way nationally but this does not necessarily translate into more local political action or preferences.

Americans consume more media, sit more

A recent study shows Americans are sitting more and connects this to increased media usage:

That’s what Yin Cao and an international group of colleagues wanted to find out in their latest study published in JAMA. While studies on sitting behavior in specific groups of people — such as children or working adults with desk jobs — have recorded how sedentary people are, there is little data on how drastically sitting habits have changed over time. “We don’t know how these patterns have or have not changed in the past 15 years,” says Cao, an assistant professor in public health sciences at the Washington University School of Medicine.

The researchers used data collected from 2001 to 2016 by the National Health and Nutrition Examination Survey (NHANES), which asked a representative sample of Americans ages five and older how many hours they spent watching TV or videos daily in the past month, and how many hours they spent using a computer outside of work or school. The team analyzed responses from nearly 52,000 people and also calculated trends in the total time people spent sitting from 2007 to 2016. Overall, teens and adults in 2016 spent an average of an hour more each day sitting than they did in 2007. And most people devoted that time parked in front of the TV or videos: in 2016, about 62% of children ages five to 11 spent two or more hours watching TV or videos every day, while 59% of teens and 65% of adults did so. Across all age groups, people also spent more time in 2016 using computers when they were not at work or school compared to 2003. This type of screen time increased from 43% to 56% among children, from 53% to 57% among adolescents and from 29% to 50% among adults…

The increase in total sitting time is likely largely driven by the surge in time spent in front of a computer. As eye-opening as the trend data are, they may even underestimate the amount of time Americans spend sedentary, since the questions did not specifically address time spent on smartphones. While some of this time might have been captured by the data on time spent watching TV or videos, most people spend additional time browsing social media and interacting with friends via texts and video chats — much of it while sitting.

Does this mean the Holy Grail of media is screen time that requires standing and/or walking around to avoid sitting too much? Imagine a device that requires some movement to work. This does not have to be a pedal-powered gaming console or smartphone but perhaps just a smartphone that needs to move 100 feet every five minutes to continue. (Then imagine the workarounds, such as riding a motorized scooter while watching a screen a la Wall-E.)

Of course, the answer might be to just consume less media content on screens. This might prove difficult. Nielsen reports American adults consume 11 hours of media a day. Even as critics have assailed television, films, and Internet and social media content, Americans still choose (and are pushed as well) to watch more.