Claim: we see more information today so we see more “improbable” events

Are more rare events happening in the world or are we just more aware of what is going on?

In other words, the more data you have, the greater the likelihood you’ll see wildly improbable phenomena. And that’s particularly relevant in this era of unlimited information. “Because of the Internet, we have access to billions of events around the world,” says Len Stefanski, who teaches statistics at North Carolina State University. “So yeah, it feels like the world’s going crazy. But if you think about it logically, there are so many possibilities for something unusual to happen. We’re just seeing more of them.” Science says that uncovering and accessing more data will help us make sense of the world. But it’s also true that more data exposes how random the world really is.
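To make the point concrete, here is a minimal back-of-the-envelope sketch in Python. The 1-in-10,000 probability and the trial counts are invented for illustration, not drawn from the article.

```python
# Back-of-the-envelope illustration: a 1-in-10,000 event is individually rare,
# but observe enough independent trials and seeing at least one becomes nearly certain.
p = 1 / 10_000  # assumed probability of the rare event on any single trial

for n in (1, 1_000, 100_000, 1_000_000):  # number of trials we get to observe
    at_least_one = 1 - (1 - p) ** n       # P(at least one occurrence in n trials)
    print(f"n = {n:>9,}: P(at least one rare event) = {at_least_one:.4f}")
```

With one trial the rare event is a curiosity; with a million trials (roughly, a world's worth of news feeds) it is a near certainty.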

Here is an alternative explanation for why all these rare events seem to be happening: we are bumping up against our limited ability to predict the full complexity of the world.

All of this, though, ignores a more fundamental and unsettling possibility: that the models were simply wrong. That the Falcons were never 99.6 percent favorites to win. That Trump’s odds never fell as low as the polling suggested. That the mathematicians and statisticians missed something in painting their numerical portrait of the universe, and that our ability to make predictions was thus inherently flawed. It’s this feeling—that our mental models have somehow failed us—that haunted so many of us during the Super Bowl. It’s a feeling that the Trump administration exploits every time it makes the argument that the mainstream media, in failing to predict Trump’s victory, betrayed a deep misunderstanding about the country and the world and therefore can’t be trusted.

And it may not be easy to reconcile these two explanations:

So: Which is it? Does the Super Bowl, and the election before it, represent an improbable but ultimately-not-confidence-shattering freak event? Or does it indicate that our models are broken, that—when it comes down to it—our understanding of the world is deeply incomplete or mistaken? We can’t know. It’s the nature of probability that it can never be disproven, unless you can replicate the exact same football game or hold the same election thousands of times simultaneously. (You can’t.) That’s not to say that models aren’t valuable, or that you should ignore them entirely; that would suggest that data is meaningless, that there’s no possibility of accurately representing the world through math, and we know that’s not true. And perhaps at some point, the world will revert to the mean, and behave in a more predictable fashion. But you have to ask yourself: What are the odds?
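The replication point can also be made with a toy simulation. This is not any actual forecasting model; the win probabilities and replay counts are invented to show why a single upset cannot distinguish a well-calibrated forecast from a badly wrong one.

```python
import random

# Toy simulation: how often does the "favorite" lose under two different claimed
# win probabilities? Only across thousands of replays do the models separate.
random.seed(0)

def count_upsets(win_probability: float, n_replays: int = 10_000) -> int:
    """Count simulated replays in which the favorite loses."""
    return sum(random.random() > win_probability for _ in range(n_replays))

for claimed in (0.996, 0.75):
    print(f"claimed win probability {claimed}: favorite loses "
          f"{count_upsets(claimed)} times in 10,000 replays")
```

A single observed loss is consistent with either line of output, which is exactly the sense in which a probabilistic forecast cannot be falsified by one game or one election.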

I know there is a lot of celebration of having so much information available today, but it isn't necessarily easy to adjust to the change. Taking it all in requires effort on its own; the harder work is in interpreting it and knowing what to do with it all.

Perhaps a class in statistics – in addition to existing efforts involving digital or media literacy – could help many people better understand all of this.

Richard Florida: we lack systematic data to compare cities

As he considers Jane Jacobs’ impact, Richard Florida suggests we need more data about cities:

MCP: Some of the research around the built environment is pretty skimpy and not very scientific, in a lot of cases.

RF: Right. And it's done by architects who are terrific, but are basically looking at it from the building level. We need a whole research agenda. A century or so ago Johns Hopkins University invented the teaching hospital and modern medicine. They said medicine could be advanced by underpinning the way doctors treat people and develop clinical methodologies with a solid scientific research base. Think of it as a system that runs from laboratory to bedside. We don't have that for cities and urbanism. But at the same time we know that the city is the key economic and social unit of our time. Billions of people across the world are pouring into cities and we are spending trillions upon trillions of dollars building new cities and rebuilding, expanding and upgrading existing ones. We're doing it with little in the way of systematic research. We lack even the most basic data we need to compare and assess cities around the world. There's no comparable grand challenge that we have so terribly underfunded as cities and urbanism. We need to develop everything from the underlying science to better understand cities and their evolution, to the systematic data to assess them, to the educational and clinical protocols for building better, more prosperous and inclusive cities. Right now, mayors are out there winging it. Economic developers are out there winging it. There's no clinical training program. There are some, actually, but they're scattered about and they're not having much impact. It's going to take a big commitment. But we need to build the equivalent of the medical research infrastructure, with the equivalent of "teaching hospitals" for our cities. When you think of it, cities are our greatest laboratories for advancing our understanding of the intersection of natural, physical, social and human environments—they're our most complex organisms. This is going to be my next big research project: I'm calling it the Urban Genome Project. It's what I hope to devote the rest of my career to.

The cities-as-laboratories language echoes that of the Chicago School. But much of the sociological literature suggests a basic tension in this area: how much are cities alike, and how much are they different? Are there common processes across most or all cities that we can highlight and work with, or do their unique contexts limit how much generalizing can be done? Hence, we have a range of studies, from analyses of large sets of cities or of processes said to operate across all cities (as Florida argues in The Rise of the Creative Class) to studies of particular neighborhoods and cities aimed at uncovering their idiosyncratic patterns.

Of course, we could just look at cities like a physicist might and argue there are power laws underlying cities…
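For what that physicist-style view looks like, here is a rough sketch of the rank-size (Zipf) relationship often claimed for city populations. The figures below are round illustrative placeholders, not official counts, and the exponent of 1 is the textbook simplification.

```python
# Rank-size (Zipf) rule of thumb: the k-th largest city has population of roughly P1 / k.
# Populations here are round illustrative numbers, not official census figures.
populations = sorted([8_400_000, 3_900_000, 2_700_000, 2_300_000, 1_600_000],
                     reverse=True)

largest = populations[0]
for rank, population in enumerate(populations, start=1):
    zipf_prediction = largest / rank  # Zipf prediction with exponent 1
    print(f"rank {rank}: actual {population:>10,}  predicted {zipf_prediction:>12,.0f}")
```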

Social science assumes “human living is not random”

Noted sociologist of religion Grace Davie gives a brief description of her work:

My work, like that of all social scientists, rests on the assumption that human living is not random. Why is it, for example, that Christian churches in the West are disproportionately attended by women? That requires an explanation.

This is a good starting point for describing the social sciences. There are patterns to human social life, and we can't rely on anecdotes or personal impressions to tell us whether those patterns exist or how to understand them. We want to apply a scientific perspective to these patterns and explain why they, and not others, exist. From there, we might delve deeper into levels of analysis, theoretical assumptions, and techniques of data collection and analysis – three areas where the various social science disciplines differ.

Researchers fact-checking their own ethnographic data

Toward the end of a long profile of sociologist Matthew Desmond is an interesting section regarding ethnographic methods:

Desmond has done an especially good job spelling out precisely how he went about his research and verified his findings, says Klinenberg. At the start of Evicted, an author’s note states that most of the events in the book took place between May 2008 and December 2009. Except where it says otherwise in the notes, Desmond writes, all events that happened between those dates were observed firsthand. Every quotation was “captured by a digital recorder or copied from official documents,” he adds. He also hired a fact-checker who corroborated the book by combing public records, conducting some 30 interviews, and asking him to produce field notes that verified a randomly selected 10 percent of its pages.

Desmond has been equally fastidious about taking himself out of the text. Unlike many ethnographic studies, including Goffman’s, his avoids the first person. He wants readers to react directly to the people in Evicted. “Ethnography often provokes very strong feelings,” he says. “So I wanted the book to do that. But not about me.”

Ethnographers should be more skeptical about their data, Desmond believes. In his fieldwork, for example, he saw women getting evicted at higher rates than men. But when he crunched the data, analyzing hundreds of thousands of court records, it turned out that was only the case in predominantly black and Latino neighborhoods. Women in white neighborhoods were not evicted at higher rates than men. The field had told him a half-truth.

Still, beyond acknowledging that the reception of Goffman’s book shaped his fact-checking, he will say nothing about the controversy. Even an old journalism trick — letting a silence linger, in the hope that an interviewee will fill it — fails to wring a quote from him. “This is such a good technique,” he says after a few seconds, “where you just kind of let the person talk.” Then he sips his Diet Coke, waiting for the next question.
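Returning to the fact-checking described above: purely as an illustration (not the fact-checker's actual workflow, and with a hypothetical page count), drawing a random 10 percent of pages for field-note verification takes only a few lines.

```python
import random

# Illustrative audit sample: pick a random 10 percent of a book's pages and ask
# the author to produce field notes backing each one. The page count is made up.
total_pages = 350
sample_size = max(1, total_pages // 10)
pages_to_verify = sorted(random.sample(range(1, total_pages + 1), sample_size))
print(pages_to_verify)
```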

This gets at some basic questions about what ethnography is. Should it be participant observation with a reflexive and involved researcher? Letting the research subjects speak for themselves with minimal interpretation? Should it involve fact-checking and verifying data? Each of these approaches has its merits, and sociologists pursue different ones. Contrasting the last two, for example, how people describe their own circumstances and understanding could be very important even if what they report is not strictly true. On the other hand, more and more ethnographies include reflexive commentary from the researcher on how their presence and personal characteristics influenced data collection and interpretation.

It sounds to me like Desmond is doing mixed-methods work: starting with ethnographic data he directly observes and then using secondary analysis (in the example above, of official court records) to better understand both the micro level he observed and the broader patterns. This means more work for each study but also more comprehensive data.
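As a hypothetical sketch of that kind of cross-check (the column names and numbers below are invented, not Desmond's data), comparing eviction rates by gender within neighborhood types might look like this:

```python
import pandas as pd

# Invented data illustrating the check: eviction rates by gender, split by
# neighborhood type, to see whether a pattern observed in the field holds up.
records = pd.DataFrame({
    "neighborhood": ["majority_black", "majority_black",
                     "majority_white", "majority_white"],
    "gender":       ["female", "male", "female", "male"],
    "evictions":    [1_200, 800, 300, 310],
    "renters":      [10_000, 9_500, 8_000, 8_200],
})

records["eviction_rate"] = records["evictions"] / records["renters"]
print(records.pivot(index="neighborhood", columns="gender", values="eviction_rate"))
```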

Census 2020 to go digital and online

The Census Bureau is developing plans to go digital in 2020:

The bureau’s goal is that 55% of the U.S. population will respond online using computers, mobile phones or other devices. It will mark the first time (apart from a small share of households in 2000) that any Americans will file their own census responses online. This shift toward online response is one of a number of technological innovations planned for the 2020 census, according to the agency’s recently released operational plan. The plan reflects the results of testing so far, but it could be changed based on future research, congressional reaction or other developments…

The Census Bureau innovations are driven by the same forces afflicting all organizations that do survey research. People are increasingly reluctant to answer surveys, and the cost of collecting their data is rising. From 1970 to 2010, the bureau’s cost to count each household quintupled, to $98 per household in 2010 dollars, according to the GAO. The Census Bureau estimates that its innovations would save $5.2 billion compared with repeating the 2010 census design, so the 2020 census would cost a total of $12.5 billion, close to 2010’s $12.3 billion price tag (both in projected 2020 dollars)…

The only households receiving paper forms under the bureau’s plan would be those in neighborhoods with low internet usage and large older-adult populations, as well as those that do not respond online.

To maximize online participation, the Census Bureau is promoting the idea that answering the census is quick and easy. The 2010 census was advertised as “10 questions, 10 minutes.” In 2020, bureau officials will encourage Americans to respond anytime and anywhere – for example, on a mobile device while watching TV or waiting for a bus. Respondents wouldn’t even need their unique security codes at hand, just their addresses and personal data. The bureau would then match most addresses to valid security codes while the respondent is online and match the rest later, though it has left the door open to restrict use of this option or require follow-up contact with a census taker if concerns of fraud arise.

Perhaps the marketing slogan could be: “Do the Census online to save your own taxpayer dollars!”

It will be interesting to see how this plays out. I'm sure there will be plenty of tests to make sure that (1) the people responding are matched correctly to their addresses (and that fraud can't be committed); (2) the data collected is as accurate as what door-to-door visits and mailed forms produce; and (3) the technological infrastructure is there to handle all the traffic. Even after going digital, the costs will be high, and I'm guessing more people will ask why all the expense is necessary. Internet response rates to surveys are notoriously low, so it may take a lot of marketing and reminders to get a significant percentage of respondents online.
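On the first point, here is a toy sketch of what matching a submitted address to a pre-assigned security code might involve. This is my own illustration with invented addresses and codes, not the Census Bureau's actual system.

```python
# Toy illustration (not the Census Bureau's system): match a submitted address to a
# pre-assigned code when possible, otherwise queue the response for later matching,
# which is where fraud checks or a census-taker follow-up could come in.
ASSIGNED_CODES = {                      # hypothetical master address file
    "123 Main St, Springfield": "A1B2C3",
    "45 Oak Ave, Centerville":  "D4E5F6",
}
needs_followup = []

def record_response(address: str, answers: dict) -> str:
    code = ASSIGNED_CODES.get(address)
    if code is None:
        needs_followup.append((address, answers))
        return "queued for later matching or follow-up"
    return f"matched immediately to code {code}"

print(record_response("123 Main St, Springfield", {"household_size": 3}))
print(record_response("9 Elm Rd, Lakeview", {"household_size": 2}))
```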

But, if the Census Bureau can pull this off, it could represent a significant change for the Census as well as other survey organizations.

(The full 192-page PDF of the plan is here.)

Using smartphones to collect important economic data

Virginia Postrel describes a new app used in a number of countries to gather economic data on the ground:

Founded in 2012, the San Francisco-based startup Premise began by looking for a way to supplement official price indices with a quick-turnaround measure of inflation and relative currency values. It needed “a scalable, cost-effective way to collect a lot of price data,” chief executive David Soloff said in an interview. The answer was an Android app and more than 30,000 smart-phone-wielding contractors in 32 countries.

The contractors, who are paid by the usable photo and average about $100 a month, take pictures aimed at answering specific economic questions: How do the prices in government-run stores compare to those in private shops? Which brands of cigarette packages in which locations carry the required tax stamp? How many houses are hooked into power lines? What’s happening to food prices? Whatever the question, the data needed to answer it must be something a camera can capture…

The result is a collection of price indices updated much more frequently and with less time lag — although also fewer indicative items — than monthly government statistics. For Bloomberg terminal subscribers, Premise tracks food and beverage prices in the U.S., China, India, Brazil and Argentina, using indices mirroring government statistics. It gets new information daily; Bloomberg publishes new data twice a week. Premise tracks a similar index in Nigeria for Standard Chartered bank, which has made the aggregate data public. (Premise clients can drill down to see differences across products, types of retailers, or regions.) While more volatile than official statistics, the figures generally anticipate them, serving as an early-warning system for economic trends…

Premise has government clients, and it carefully positions its work as a complement to official statistics, as well as to the academic Billion Prices Project, which scrapes massive amounts of price data from online sources but can’t say what cooking oil sells for in a corner shop. Make no mistake, however: Its methods also provide valuable competition to the official data. The point, after all, is to find out what’s actually happening, not what government reports will say in a few weeks.

This is an innovative way to get data more quickly. It would be interesting to see how reliable this data is, and it remains to be seen how markets, governments, and others will use more up-to-date information.
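For a sense of the underlying arithmetic, here is a simplified sketch of turning crowdsourced price observations into an index. This is a generic fixed-basket calculation with invented prices, not Premise's actual methodology.

```python
from collections import defaultdict

# Invented prices: a base-period basket and a batch of new photo-derived observations.
base_prices = {"rice_1kg": 1.00, "cooking_oil_1l": 2.50, "bread_loaf": 0.80}
new_observations = [("rice_1kg", 1.10), ("rice_1kg", 1.05),
                    ("cooking_oil_1l", 2.70),
                    ("bread_loaf", 0.85), ("bread_loaf", 0.82)]

# Average each item's observed price in the current period.
totals, counts = defaultdict(float), defaultdict(int)
for item, price in new_observations:
    totals[item] += price
    counts[item] += 1
current_prices = {item: totals[item] / counts[item] for item in totals}

# Fixed-basket index: cost of the basket now versus the base period (base = 100).
index = 100 * sum(current_prices[item] for item in base_prices) / sum(base_prices.values())
print(f"food price index (base period = 100): {index:.1f}")
```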

More broadly, smartphones could be used to collect all sorts of data. See previous posts on using the microphone and the use of additional apps such as Twitter and Waze.

“The most misleading charts of 2015, fixed”

Here are fixed versions of some of the misleading charts put forward by politicians, advocacy groups, and the media in 2015.

I’m not sure exactly how they picked “the most misleading charts” (is there bias in this selection?) but it is interesting that several involve a misleading y-axis. I’m not sure that I would count the last example as a misleading chart since it involves a definition issue before getting to the chart.

And what is the purpose of the original, poorly done graphics? Changing the presentation of the data provides evidence for a particular viewpoint. Change the graphic depiction of the data and another story could be told. Unfortunately, it is actions like these that tend to cast doubt on the use of data for making public arguments – the data is simply too easy to manipulate so why rely on data at all? Of course, that assumes people look closely at the chart and the data source and know what questions to ask…
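To see how much a single axis choice changes the story, here is a small example with made-up numbers that plots the same two values with a truncated y-axis and with a zero baseline.

```python
import matplotlib.pyplot as plt

# Same made-up data, two axis choices: a truncated y-axis makes a roughly 3 percent
# difference look dramatic; a zero baseline keeps it in proportion.
labels, values = ["Before", "After"], [6.0, 6.2]

fig, (ax_truncated, ax_full) = plt.subplots(1, 2, figsize=(8, 3))

ax_truncated.bar(labels, values)
ax_truncated.set_ylim(5.9, 6.3)
ax_truncated.set_title("Truncated y-axis (misleading)")

ax_full.bar(labels, values)
ax_full.set_ylim(0, 7)
ax_full.set_title("Zero baseline (in proportion)")

plt.tight_layout()
plt.show()
```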

Quick Review: The Third Coast: When Chicago Built the American Dream

Thomas Dyja has a provocative argument in The Third Coast: while New York and LA are widely viewed as America’s cultural centers, Chicago of the mid-1900s contributed more than people think to American culture. My quick review of the book:

  1. The fact that the book is built on impressionistic vignettes is both its greatest strength and its greatest weakness. Dyja tells a number of interesting stories about cultural figures in Chicago, from author Nelson Algren to Bauhaus member László Moholy-Nagy to University of Chicago president Robert Hutchins to puppeteer and TV show creator Burr Tillstrom to magazine creator Hugh Hefner. The characters he profiles have highs and lows, but they are all marked by a sort of middle-America creativity based on hard work, connecting with audiences, and not being flashy.
  2. Yet stringing together a set of characters doesn't help him make his larger argument that Chicago was influential. We get pieces of evidence – an important contribution to television here, the importance of Chess Records, a clear contribution to architecture there – but no comparative element. Where Dyja gives little attention, he implicitly suggests Chicago didn't contribute much – art is one such area, with no vibrant modern art scene (though the museum that TripAdvisor ratings call the world's #1 gets little space). Just how much did these activities in Chicago change the broader American culture? What was going on in New York and LA at the same time? The evidence is anecdotal and difficult to judge.
  3. A few of the more interesting pieces of the book: Dyja suggests Chicago contributed more to the Civil Rights Movement than many people remember (particularly through the Emmett Till case); Chicago music, especially that of Muddy Waters and Howlin' Wolf, was particularly influential elsewhere; and Mayor Richard J. Daley was supportive of the arts but only in a functional sense, while the arts scene slowly died away into the early 1960s as creative types went elsewhere.

Ultimately, it is hard to know whether these contributions from Chicago really mattered or not. The one that gets the most attention – architecture through former members of the Bauhaus and then the International Style – probably was a major contribution to both American and global cities. But even there, the focus of the book is on the people, not necessarily on their buildings, on how ordinary Chicagoans experienced those structures, or on how the changes fit within the larger social, political, and economic scene in Chicago.

Debate over data on the mental fragility of college students

A recent study suggests there is a need for more data to claim that today’s college students are more fragile:

The point, overall, is that given the dizzying array of possible factors at work here, it’s much too pat a story to say that kids are getting more “fragile” as a result of some cultural bugaboo. “I think it’s not only an oversimplification, I think it’s unfair to the kids, many of whom are very hardworking and tremendously diligent, and working in systems that are often very competitive,” said Schwartz. “Many of the kids are doing extraordinarily well, and I think it’s unfair to portray this whole group of people as being somehow weakhearted or weak-minded in some sense, when there’s no evidence to really support it.”

It hasn’t gone unnoticed among those who study college mental health that there’s an interesting divide at work here: College counselors are so convinced kids’ mental health is getting worse that it’s become dogma in some quarters, and yet it’s been tricky to find any solid, rigorous evidence of this. Some researchers have tried to dig into counseling-center data in an attempt to explain this discrepancy. One recent effort, published in the October issue of the Journal of College Student Psychopathology, comes from Allan J. Schwartz, a psychiatry professor at the University of Rochester who has devoted a chunk of his career to studying college suicide. Schwartz examined data from “4,755 clients spanning a 15-year period from 1992-2007” at one university, poring over the records to determine whether students who came in contact with that school’s counseling services had, over that period, exhibited increasing levels of distress in the form of suicidality, anxiety and phobic disorders, overall signs of serious mental illness, and other measures. (The same caveat I mentioned above applies here — such a study can only tell us about rates of pathology among kids who go to counseling centers. But it can at least help determine whether counselors are right that among the kids they see every day, things are getting worse.)

Schwartz found no evidence to support the pessimistic view. With the exception of suicidality, where he noted a “significant decline” over the years, every other measure he looked at held stable over the study’s 15-year span. In his paper, Schwartz rightly notes that there are limitations to what we can extrapolate from a study of a single campus. But he goes on to explain that four other, similar studies, published between 1996 and 2007, also sought to track changes in pathology over time in single-university settings, and they too found no empirical evidence that things have been getting worse. This doesn’t definitively prove that kids who seek counseling aren’t getting sicker, of course. But statistically, Schwartz argues, it’s unlikely that five studies looking at different schools would all come up with null findings if, in fact, there was a widespread increase in student pathology overall.
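Schwartz's point can be made concrete with a back-of-the-envelope calculation. The 80 percent power figure below is my assumption for illustration, not a number from his paper.

```python
# If each of five independent studies had an assumed 80 percent chance of detecting
# a real, widespread increase in pathology, the chance that all five would still
# come up empty is tiny.
assumed_power = 0.8
p_all_five_null = (1 - assumed_power) ** 5
print(f"P(all five studies miss a real effect) = {p_all_five_null:.5f}")  # about 0.00032
```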

I don't know this area of research well, but it sounds like there is room for disagreement and/or a need for more definitive data about what is going on among college students.

A broader observation: claims about cultural zeitgeists are not always backed by data. On one hand, perhaps the change is happening so quickly or so far under the radar (it takes time for scientists and others to measure things) that the data simply isn't available yet. On the other hand, claims about trends are often based on anecdotes and particular points of view that break down pretty quickly when compared to the data that is available.

The FBI doesn’t collect every piece of data about crime

The FBI released the 2014 Uniform Crime Report on Monday, but it doesn't include every piece of information we might wish to have:

As I noted in May, much statistical information about the U.S. criminal-justice system simply isn't collected. The number of people kept in solitary confinement in the U.S., for example, is unknown. (A recent estimate suggested that it might be between 80,000 and 100,000 people.) Basic data on prison conditions is rarely gathered; even federal statistics about prison rape are generally unreliable. Statistics from prosecutors' offices on plea bargains, sentencing rates, or racial disparities, for example, are virtually nonexistent.

Without reliable data on crime and justice, anecdotal evidence dominates the conversation. There may be no better example than the so-called “Ferguson effect,” first proposed by the Manhattan Institute’s Heather MacDonald in May. She suggested a rise in urban violence in recent months could be attributed to the Black Lives Matter movement and police-reform advocates…

Gathering even this basic data on homicides—the least malleable crime statistic—in major U.S. cities was an uphill task. Bialik called police departments individually and combed local media reports to find the raw numbers because no reliable, centralized data was available. The UCR is released on a one-year delay, so official numbers on crime in 2015 won’t be available until most of 2016 is over.

These delays, gaps, and weaknesses seem exclusive to federal criminal-justice statistics. The U.S. Department of Labor produces monthly unemployment reports with relative ease. NASA has battalions of satellites devoted to tracking climate change and global temperature variations. The U.S. Department of Transportation even monitors how often airlines are on time. But if you want to know how many people were murdered in American cities last month, good luck.

There could be several issues at play, including:

  1. A lack of measurement ability. Perhaps we have some major disagreements about how to count certain things.
  2. Local law enforcement jurisdictions want some flexibility in working with the data.
  3. A lack of political will to get all this information.

My guess is that the most important issue is #3. If we wanted this data, we could get it. Yet it may take concerted efforts by individuals or groups to make these gaps enough of a recognized social problem that we commit to collecting good data. In other words, the government and/or the public needs a compelling enough reason to insist on uniformity in measurement and consistency in reporting.

How about this reason: consistent and timely reporting of such data would help cut down on anecdotes and keep the American public accurately up to date, allowing people to make more informed political and civic choices. Right now, many Americans don't quite know what is happening with crime rates because their primary sources are anecdotes or mass media reports (which can be quite sensationalistic).