Finding data by finding and/or guessing URLs

A California high school student is posting new data from 2020 presidential polls before news organizations because he found patterns in their URLs:

How does Rawal do it? He correctly figures out the URL — the uniform resource locator, or full web address — that a graphic depicting the poll’s results appears at before their official release.

“URL manipulation is what I do,” he said, “and I’ve been able to get really good at it because, with websites like CNN and Fox, all the file names follow a pattern.”

He added, “I’m not going to go into more detail on that.”

He said he had just spoken with The Register’s news director, who expressed interest in his helping the newspaper “keep it under tighter wraps.” He is considering it.

This makes sense on both ends: media organizations need a way to organize their files and sites, and someone who watches the URLs over time could figure out the pattern. Now to see how media organizations respond so as not to let their stories out before they report them.
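To see why a consistent naming scheme is easy to anticipate, here is a minimal sketch. The template, host, and file names are invented for illustration (the student quoted above declined to describe the real patterns), and the code only builds strings; it does not request anything from a server.

```python
from datetime import date, timedelta

# Hypothetical pattern: the host, path, and file name are invented for
# illustration and do not describe any real news site.
TEMPLATE = "https://example.com/polls/{year}/{month:02d}/poll-{day:02d}.png"


def candidate_urls(start: date, days: int) -> list[str]:
    """Return the URLs the pattern would produce for `days` consecutive dates."""
    return [
        TEMPLATE.format(year=d.year, month=d.month, day=d.day)
        for d in (start + timedelta(days=i) for i in range(days))
    ]


urls = candidate_urls(date(2019, 10, 14), 3)
for url in urls:
    print(url)
```

If every part of a file name can be derived from the date, an outsider needs no inside knowledge to list next week's addresses. A publisher could run the same kind of check against its own naming scheme to see how predictable it is.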

I imagine there is a broader application for this. Do many organizations have web pages or data that are not linked to, or whose links are not easily found? I could imagine how such hidden/unlinked data could be used for nefarious or less ethical purposes (imagine scooping news releases about soon-to-be-released economic figures in order to buy or sell stocks) as well as for data collection.

Win the suburbs, win 2020; patterns in news stories that make this argument

More than a year away from the 2020 presidential election, one narrative is firmly established: the path to victory runs through suburban voters. One such story:

Westerville is perhaps best known locally as the place the former Ohio state governor and Republican presidential candidate John Kasich calls home. But it – and suburbs like it – is also, Democrats say, “ground zero” in the battle for the White House in 2020…

In 2018, Democrats won the House majority in a “suburban revolt” led by women and powered by a disgust of Donald Trump’s race-based attacks, hardline policy agenda and chaotic leadership style. From the heartland of Ronald Reagan conservatism in Orange county, California, to a coastal South Carolina district that had not elected a Democrat to the seat in 40 years, Democrats swept once reliably Republican suburban strongholds…

“There is no way Democrats win without doing really well in suburbs,” said Lanae Erickson, a senior vice-president at Third Way, a centrist Democratic thinktank…

“There are short-term political gains for Democrats in winning over suburban voters but that doesn’t necessarily lead to progressive policies,” she said. In her research, Geismer found that many suburban Democrats supported a national liberal agenda while opposing measures that challenged economic inequality in their own neighborhoods.

Four quick thoughts on such news reports:

1. They often emphasize the changing nature of suburbs. This is true: the suburbs are becoming more racially, ethnically, and economically diverse. At the same time, this does not mean the change is happening evenly across suburbs.

2. They often use a representative suburb as a case study to try to illustrate broader trends in the suburbs. Here, it is Westerville, Ohio, home to the Tuesday night Democratic debate. Can one suburb illustrate the broader trends in all suburbs? Maybe.

3. They stress that the swing voters are in the suburbs since city residents are more likely to vote for Democrats while rural residents are more likely to vote for Republicans. It will be interesting to see how Democratic candidates continue to tour through urban areas; will they spend more time in denser population areas or branch out to middle suburbs that straddle the line between solid Republican bases further away from the city and solid Democratic bases closer to the city?

4. Even with the claim that the suburbs are key to the next election, such stories often shed little light on long-term trends. As an exception, the last paragraph in the quotation above stands out: suburban voters may turn one way nationally but this does not necessarily translate into more local political action or preferences.

Americans consume more media, sit more

A recent study shows Americans are sitting more and connects this to increased media usage:

That’s what Yin Cao and an international group of colleagues wanted to find out in their latest study published in JAMA. While studies on sitting behavior in specific groups of people — such as children or working adults with desk jobs — have recorded how sedentary people are, there is little data on how drastically sitting habits have changed over time. “We don’t know how these patterns have or have not changed in the past 15 years,” says Cao, an assistant professor in public health sciences at the Washington University School of Medicine.

The researchers used data collected from 2001 to 2016 by the National Health and Nutrition Examination Survey (NHANES), which asked a representative sample of Americans ages five and older how many hours they spent watching TV or videos daily in the past month, and how many hours they spent using a computer outside of work or school. The team analyzed responses from nearly 52,000 people and also calculated trends in the total time people spent sitting from 2007 to 2016. Overall, teens and adults in 2016 spent an average of an hour more each day sitting than they did in 2007. And most people devoted that time parked in front of the TV or videos: in 2016, about 62% of children ages five to 11 spent two or more hours watching TV or videos every day, while 59% of teens and 65% of adults did so. Across all age groups, people also spent more time in 2016 using computers when they were not at work or school compared to 2003. This type of screen time increased from 43% to 56% among children, from 53% to 57% among adolescents and from 29% to 50% among adults…

The increase in total sitting time is likely largely driven by the surge in time spent in front of a computer. As eye-opening as the trend data are, they may even underestimate the amount of time Americans spend sedentary, since the questions did not specifically address time spent on smartphones. While some of this time might have been captured by the data on time spent watching TV or videos, most people spend additional time browsing social media and interacting with friends via texts and video chats — much of it while sitting.

Does this mean the Holy Grail of media is screentime that requires standing and/or walking around to avoid sitting too much? Imagine a device that requires some movement to work. This does not have to be a pedal-powered gaming console or smartphone but perhaps just a smartphone that needs to move 100 feet every five minutes to continue. (Then imagine the workarounds, such as a motorized scooter while watching a screen a la Wall-E.)

Of course, the answer might be to just consume less media content on screens. This might prove difficult. Nielsen reports American adults consume 11 hours of media a day. Even as critics have assailed television, films, and Internet and social media content, Americans still choose (and are pushed as well) to watch more.

Yahoo News leads with Chicago murders and then says it is not the murder capital

If the point of a news story/video is to say something is not true, would you lead with the data from the not true side?

[Image: ChicagoMurderCapital121818.png]

Here is the way this seems to work: grab your attention with a publicly available statistic that stands out. Oh my, how could there be so many murders in one city?? But, several sentences later, tell the reader/viewer that multiple other cities have a higher murder rate. And include in the last sentence that the murder number in Chicago has been down in recent years. So, wait: Chicago really isn’t the murder capital?

I’m trying to figure out how this adds to the public discourse. Here are a few possibilities:

  1. It is simply about clicks. Get people’s attention with a statistic and a video, throw in some data. Easy to produce, not much content.
  2. The goal is to highlight the still-high number of murders in Chicago.
  3. The goal is to point out that other cities actually experience more murders per capita.
  4. To give those who teach statistics an example of how data can be twisted and/or used without telling much of a story.
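The gap between a raw count and a per capita rate is easy to show with a small sketch. The two cities and all of their figures below are made up for illustration; they are not real crime data.

```python
# Hypothetical figures for two made-up cities; these are not real crime data.
cities = {
    "City A": {"murders": 500, "population": 2_500_000},
    "City B": {"murders": 150, "population": 300_000},
}


def rate_per_100k(murders: int, population: int) -> float:
    """Murders per 100,000 residents."""
    return murders / population * 100_000


rates = {name: rate_per_100k(**figures) for name, figures in cities.items()}

# The city with the largest count need not have the highest rate.
most_murders = max(cities, key=lambda name: cities[name]["murders"])
highest_rate = max(rates, key=rates.get)

print("Most murders:", most_murders)
print("Highest rate per 100,000:", highest_rate)
```

In this invented case the larger city leads on the count while the smaller city leads on the rate, which is the same shape as a story that opens with a big total and later concedes that other cities have higher rates.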

 

News story suggests 40% is “Almost Half”

A Bloomberg story looks at the rise in births outside of marriage in the United States and has this headline:

Almost Half of U.S. Births Happen Outside Marriage, Signaling Cultural Shift

And then the story quickly gets to the data:

Forty percent of all births in the U.S. now occur outside of wedlock, up from 10 percent in 1970, according to an annual report released on Wednesday by the United Nations Population Fund (UNFPA), the largest international provider of sexual and reproductive health services. That number is even higher in the European Union.

There is no doubt that this is a significant trend over nearly 50 years. One expert sums this up toward the end of the story:

The traditional progression of Western life “has been reversed,” said John Santelli, a professor in population, family health and pediatrics at Columbia’s Mailman School of Public Health. “Cohabiting partners are having children before getting married. That’s a long-term trend across developing nations.”

Yet, the headline oversells the change. A move from 10% of births to 40% of births is large. But, is 40% nearly 50%? When I hear almost half, I would expect a number between 45% and 49.99%. Claiming 40% is nearly half is going a little too far.

I think the reading public would be better served by either using the 40% figure or saying “Two-Fifths.” Or, perhaps the headline might speak to the 30 percentage point jump in nearly 50 years.
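The arithmetic behind these alternatives is short. Here is a sketch that uses only the two figures quoted from the story (10 percent in 1970, 40 percent now); the variable names are my own.

```python
start, end = 10.0, 40.0  # percent of U.S. births outside marriage, 1970 and now

point_change = end - start                     # change in percentage points
relative_change = (end - start) / start * 100  # growth relative to the 1970 level
gap_to_half = 50.0 - end                       # percentage points short of half

print(f"{point_change:.0f} percentage points")
print(f"{relative_change:.0f}% relative increase")
print(f"{gap_to_half:.0f} percentage points short of half")
```

The jump from 10% to 40% is a change in percentage points, not in percent: measured against the 1970 level, the share quadrupled. Either framing is accurate; "almost half" is the one that stretches the data.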

In the grand scheme of things, this is a minor issue. The rest of the story does a nice job presenting the data and discussing what is behind the change. But, this is a headline-dominated age – you have to catch those eyes scrolling quickly on their phones – and this headline goes a bit too far.

If one survey option receives the most votes (18%), can the item with the fewest votes (2%) be declared the least favorite?

The media can have difficulty interpreting survey results. Here is one recent example involving a YouGov survey that asked about the most attractive regional accents in the United States:

Internet-based data analytics and market research firm YouGov released a study earlier this month that asked 1,216 Americans over the age of 18 about their accent preferences. The firm provided nine options, ranging from regions to well-known dialects in cities. Among other questions, YouGov asked, “Which American region/city do you think has the most attractive accent?”

The winner was clear. The Southeastern accent, bless its heart, took the winning spot, with the dialect receiving 18 percent of the vote from the study’s participants. Texas wasn’t too far behind, nabbing the second-most attractive accent at 12 percent of the vote…

The least attractive? Chicago rolls in dead last, with just 2 percent of “da” vote.

John Kass did not like the results and consulted a linguist:

I called on an expert: the eminent theoretical linguist Jerry Sadock, professor emeritus of linguistics from the University of Chicago…

“The YouGov survey that CBS based this slander on does not support the conclusion. The survey asked only what the most attractive dialect was, the winner being — get this — Texan,” Sadock wrote in an email.

“Louie Gohmert? Really? The fact that very few respondents found the Chicago accent the most attractive, does not mean that it is the least attractive,” said Sadock. “I prefer to think that would have been rated as the second most attractive accent, if the survey had asked for rankings.”

In the original YouGov survey, respondents were asked: “Which American region/city do you think has the most attractive accent?” Respondents could select one option. The Chicago accent did receive the fewest selections.

However, Sadock has a point. Respondents could only select one option. If they had the opportunity to rank them, would the Chicago accent move up as a non-favorite but still-liked accent? It could happen.
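A toy example shows how this could happen. The five ballots below are invented and use three made-up accents (A, B, C); they are not YouGov data and say nothing about how real respondents would rank anything.

```python
from collections import Counter

# Hypothetical ballots, invented for illustration; each lists three made-up
# accents from most to least attractive. These are not YouGov data.
ballots = [
    ["A", "C", "B"],
    ["A", "C", "B"],
    ["B", "C", "A"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

# Single-choice question: count only each respondent's top pick.
first_choices = Counter(ballot[0] for ballot in ballots)

# Ranking question: average position across all ballots (1 = most attractive).
average_rank = {
    accent: sum(ballot.index(accent) + 1 for ballot in ballots) / len(ballots)
    for accent in "ABC"
}

print("First-choice counts:", dict(first_choices))
print("Average rank:", average_rank)
```

In this made-up case, accent C collects the fewest first-choice votes yet has the best average rank because nearly everyone places it second. A question that asks only for the single most attractive option cannot distinguish that outcome from one in which C is widely disliked.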

Additionally, the responses were fairly diverse across the respondents. The original “winner” Southeastern accent was only selected by 18% of those surveyed. This means that over 80% of the respondents did not select the leading response. Is it fair to call this the favorite accent of Americans when fewer than one-fifth of respondents selected it?
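To put rough numbers on this, the sketch below converts the reported shares into approximate respondent counts and adds a textbook margin of error. Two assumptions are mine: the counts are back-calculated from rounded percentages, and the margin of error formula treats the sample as a simple random sample, which an online panel such as YouGov's is not.

```python
import math

n = 1216          # respondents, as reported
top_share = 0.18  # Southeastern accent
low_share = 0.02  # Chicago accent

# Approximate counts, back-calculated from rounded percentages.
top_count = round(top_share * n)
low_count = round(low_share * n)
not_top_share = 1 - top_share


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, assuming simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


moe_top = margin_of_error(top_share, n)

print(f"Chose the leading accent: about {top_count} of {n}")
print(f"Chose the Chicago accent: about {low_count} of {n}")
print(f"Did not choose the leader: {not_top_share:.0%}")
print(f"Margin of error on the leader: +/- {moe_top * 100:.1f} points")
```

Even under these generous assumptions, the "winner" is the first choice of only a couple hundred people out of more than 1,200.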

Communicating the nuances of survey results can be difficult. Yet, journalists and others should resist the urge to immediately identify “favorites” and “losers” in situations where the data do not show an overwhelming favorite and respondents did not have the opportunity to rate all of the possible responses.

Can a list of the most beautiful homes in Dallas include McMansions?

An earlier article I published suggested McMansions are not viewed as negatively in Dallas as they are in New York City. The list of “the hand-down 10 most beautiful homes in Dallas” from D Magazine includes two references to McMansions:

Each year of the last decade, the editors of D Home have canvassed the city to bring you a list of “10 Most Beautiful Homes” that hopefully appeal to every taste. While on the road, we’ve spilled endless Diet Cokes due to sudden stops, exposed ourselves to the occasional McMansion, and risked looking like embarrassingly low-tech private investigators snapping photos with our iPhones. We do it all for you!…

We once named Tokalon Drive the most beautiful street in Dallas, which we suppose makes this 4236-square-foot dwelling the most beautiful home on the most beautiful street in Dallas. Plus, it reminds us why turrets are actually totally cool and not just something that just gets thrown on a McMansion. All that’s missing is a moat.

Yet, the list of 10 homes includes no McMansions. While these are large and expensive homes, all were constructed prior to World War II and have an architectural coherence that many McMansions lack. However, the lists from previous years did include newer homes, and I would guess some of these 2017 selections have had major work done to them, which might also negate some of their old-image charm.

Even in Dallas, such lists may not be able to select or trumpet McMansions as beautiful homes. If you run in certain circles – particularly when your readers are educated and wealthy – McMansion is a dirty word. A magazine like this that considers itself “a member of the original generation of city magazines: New York Magazine, Washingtonian, Philadelphia, Boston and Chicago” could likely not support such a crass consumer item as the McMansion.