Earning more yearly from the growing value of your home than from a minimum wage job?

Zillow suggests that in about half of the United States’ largest cities, the past year’s growth in home values was worth more than the earnings from a full-time minimum wage job:

The typical U.S. home appreciated 7.6 percent over the past year, from a median value of $195,400 in February 2017 to $210,200 at the end of February 2018. That $14,800 bump in value translates to a gain in home equity of $7.09 for every hour the typical U.S. homeowner was at the office last year (assuming a standard 40-hour work week),[1] a shade less than the federal minimum wage of $7.25 per hour.

Overall, owners of the median-valued home in 24 of the nation’s 50 largest cities earned more in equity per hour over the past year than their local minimum wage.[2] But homeowners in a handful of U.S. cities made out a lot better than that – in some cases much, much better.

The median U.S. household earned roughly $60,000 in 2017 ($58,978 to be exact),[3] or a little more than $28 per hour. But in six U.S. cities – New York, San Diego, San Jose, San Francisco, Seattle and Oakland – owners of the median-valued local home gained more than that in home equity alone. And if earning a six-figure annual salary represents a certain amount of privilege, homeowners in San Francisco, San Jose and Seattle all made comfortably more than that simply by virtue of owning a local home…

A home is often a person’s biggest financial investment, and according to the 2017 Zillow Group Consumer Housing Trends Report, the typical American homeowner has 40 percent of their wealth tied up in their home. A recent Zillow survey found that 70 percent of Americans[4] view their home as a positive long-term investment.

This is both an interesting and weird comparison. For the interesting part: most people understand the abstract idea of working a minimum wage job. They should know that a full year of work at that rate does not generate much money. The reader is supposed to be surprised that simply owning a home could be a more profitable activity than working.
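To make the per-hour comparison concrete, the quoted numbers can be roughly reproduced with a few lines of arithmetic (a sketch in Python; Zillow’s footnoted hours-per-year assumption may differ slightly from the 52 weeks times 40 hours used here):

```python
# Rough reproduction of the quoted per-hour figures. The hours assumption
# (52 weeks x 40 hours) is mine; Zillow's footnoted assumption may differ
# slightly, which is why this lands near, not exactly on, the $7.09 figure.

equity_gain = 210_200 - 195_400      # the $14,800 rise in the median home value
hours_per_year = 52 * 40             # "standard 40-hour work week"

print(f"Equity 'earned' per hour: ${equity_gain / hours_per_year:.2f}")     # ~$7.12
print(f"Median household income per hour: ${58_978 / hours_per_year:.2f}")  # ~$28.36
print("Federal minimum wage for comparison: $7.25")
```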

But, there are a number of weird features of this comparison. Here are four:

First, not all that many Americans work full-time minimum wage jobs. People understand the idea but tend to overestimate how many people work just for minimum wage.

Second, in roughly half the cities on this list, home equity gains did not outpace the local minimum wage. Without comparisons over time, it is hard to know whether the 24-out-of-50 figure is noteworthy or not.

Third, the comparison hints that a homeowner could choose not to work and instead reap the benefits of their home’s rising value. This question is posed in the first paragraph: “Why work a 9-5 slog, when you can sit back and collect substantial hourly home equity ‘earnings’ instead?” Oddly, only after the data is presented does a disclaimer section at the end explain the difference between working a job and realizing money by selling a home.

Fourth, to purchase a home, particularly in the hottest markets cited, someone has to start with a good amount of capital. In other words, the people working full-time minimum wage jobs for a full year are not likely to be the ones benefiting from growth in home equity. It takes a certain amount of wealth to own a home at all, and even more to profit simply from owning one.

Overall, I would give Zillow some credit for trying to compare the growth in home values to a known entity (a minimum wage job), but the comparison falls apart pretty quickly once one gets past the headline.

The double-edged sword of record home prices in many American metro areas

The housing bubble of the late 2000s may be long gone as housing prices continue to rise:

Prices for single-family homes, which climbed 5.3 percent from a year earlier nationally, reached a peak in 64 percent of metropolitan areas measured, the National Association of Realtors said Tuesday. Of the 177 regions in the group’s survey, 15 percent had double-digit price growth, up from 11 percent in the third quarter.

Home values have grown steadily as the improving job market drives demand for a scarcity of properties on the market. While prices jumped 48 percent since 2011, incomes have climbed only 15 percent, putting purchases out of reach for many would-be buyers.

The consistent price gains “have certainly been great news for homeowners, and especially for those who were at one time in a negative equity situation,” Lawrence Yun, the Realtors group’s chief economist, said in a statement. “However, the shortage of new homes being built over the past decade is really burdening local markets and making homebuying less affordable.”
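A quick back-of-the-envelope check, using only the two cumulative growth rates quoted above, shows why rising prices can be bad news for buyers (a sketch; it ignores mortgage rates, regional variation, and everything else):

```python
# If prices are up 48 percent since 2011 and incomes are up 15 percent,
# the price-to-income ratio has worsened by roughly 29 percent.
price_growth = 1.48
income_growth = 1.15

change_in_ratio = price_growth / income_growth - 1
print(f"Price-to-income ratio up about {change_in_ratio:.0%}")   # ~29%
```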

Having read a number of stories like this, I wonder if there is a better way to distinguish between economic indicators that are good all around and ones like this that appear good (home values are going up!) but mask significant issues (values are rising while many would-be buyers are priced out). The news story includes this information, but I suspect many readers will just see the headline and assume things are good. Another example that has appeared in a lot of partisan commentary in recent years, with supporters of both sides making the point when their party did not hold the presidency: the unemployment rate is down, but it does not account for people who have stopped looking for work.

In the long run, we need (1) better measures that can encompass more dimensions of particular issues, (2) better reporting on economic indicators, and (3) a better understanding among the general populace about what these statistics are and what they mean.

Multiple measures and small trends: American birthrates down, births per woman up

A new Pew report explains this statistical oddity: the annual birthrate in the US is down but women are having more children.

How can fertility be down even as the number of women who are having children is going up? There are complex statistical reasons for this, but the main cause of this confusing discrepancy is the age at which women are having children. Women are having children later in life — the median age for having a first baby is 26 now, up from 23 in 1994 — and this delay causes annual birth rates to go down, even as the cumulative number of babies per woman has risen…

Another factor, Livingston said, is the drop in teen birth rates, with black women seeing the biggest drop in that category.

See the Pew report here. An additional part of the explanation is that there are multiple measures at play here. A Pew report from earlier in 2018 explains:

But aside from this debate, the question remains: Is this really a record low? The short answer is: It’s complicated.

That’s because there are different ways to measure fertility. Three of the most commonly used indicators of fertility are the general fertility rate (GFR); completed fertility; and the total fertility rate (TFR). All three reflect fertility behavior in slightly different ways – respectively, in terms of the annual rate at which women are presently having kids; the number of kids they ultimately have; or the hypothetical number they would likely have based on present fertility patterns.

None of these indicators is “right” or “wrong,” but each tells a different story about when fertility bottomed out.

Measurement matters and the different measures can fit different social and political views.
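A toy simulation helps show how the timing shift described in the Vox piece can pull the annual measures down even when completed fertility never changes (a sketch with made-up cohorts and birth ages, not Pew’s data):

```python
# Every cohort of women has exactly two children (completed fertility = 2.0),
# but cohorts born from 1980 on have them three years later. The ages and the
# cutoff year are invented for illustration.

OLD_AGES = (25, 28)   # birth ages for cohorts born before 1980
NEW_AGES = (28, 31)   # birth ages for cohorts born in 1980 or later

def birth_ages(cohort):
    return OLD_AGES if cohort < 1980 else NEW_AGES

def period_rate(year, ages=range(15, 45)):
    # With one woman per cohort, the period total fertility rate is just the
    # number of ages at which someone gives birth in this calendar year.
    return sum(1 for age in ages if age in birth_ages(year - age))

for year in range(2000, 2015):
    print(year, period_rate(year))
# The annual rate drops from 2.0 to 1.0 around 2005-2010 and then recovers,
# even though no cohort ever has fewer children: delay alone depresses the
# period measure.
```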

I wonder if part of the issue is also that there is a clear drop in births from the earlier era (roughly 1950 to 1970, the years we often associate with Baby Boomers) while the last three-plus decades have been relatively flat. This plateau of recent decades means researchers and commentators may be more prone to jump on small changes in the data. Many people would love to predict the next significant rise or fall in the numbers, but such a change may not be there, particularly when looking at multiple measures.

Reading into a decreasing poverty rate, increasing median household income

Here are a few notable trends in the new data that shows the poverty rate is down in the United States and median household incomes are up:

Regionally, economic growth was uneven.
The median household income in the Midwest grew just 0.9 percent from last year, which is not a statistically significant amount. In the South, by contrast, the median income grew 3.9 percent; in the West, it grew 3.3 percent. “The Midwest is the place where we should have the greatest worry in part because we didn’t see any significant growth,” said Mary Coleman, the senior vice president of Economic Mobility Pathways, a national nonprofit that tries to move people out of poverty. Median household income was also stagnant in rural areas, growing 13 percent, to $45,830. In contrast, it jumped significantly inside cities, by 5.4 percent, to $54,834, showing that cities are continuing to pull away from the rest of the country in terms of economic success…

African Americans and Hispanics experienced significant gains in income, but still trail far behind whites and Asians.
All ethnic groups saw incomes rise between 2015 and 2016, the second such annual increase in a row. The median income of black families jumped 5.7 percent between 2015 and 2016, to $39,490. Hispanic residents also saw growth in incomes, of 4.3 percent, to $47,675. Asians had the highest median household income in 2016, at $81,431. Whites saw a less significant increase than African Americans and Hispanics, of 1.6 percent, but their earnings are still far higher, at $61,858.

The poverty rate for black residents also decreased last year, falling to 22 percent, from 24.1 percent the previous year. The poverty rate of Hispanics decreased to 19.4 percent, from 21.4 percent in 2015. In comparison, 8.8 percent of whites, or 17.3 million people, were in poverty in 2016, which was not a statistically significant change from the previous year, and 10.1 percent of Asians, or 1.9 million people, were in poverty, which was also similar to 2015…

Income inequality isn’t disappearing anytime soon.
Despite the improvements in poverty and income across ethnic groups, the American economy is still characterized by significant income inequality; while the poor are finally finding more stable footing following the recession, the rich have been doing well for quite some time now. The average household income of the top 20 percent of Americans grew $13,749 from a decade ago, while the average household income of the bottom 20 percent of Americans fell $571 over the same time period. The top 20 percent of earners made 51.5 percent of all income in the U.S. last year, while the bottom 20 percent made just 3.5 percent. Around 13 percent of households made more than $150,000 last year; a decade ago, by comparison, 8.5 percent did. While that’s something to cheer, without a solid middle class, it’s not indicative of an economy that is healthy and stable more broadly.

Both of these figures – the poverty rate and median household income – are important indicators of American social and economic life. It is good, then, that both are trending in the right direction.

Yet, we also have the impulse these days to (1) dig deeper into the data and (2) highlight how these trends may not last, particularly in the era of Trump. The trends noted above (and there are others discussed in the article) can be viewed as troubling because the gains made by some either were not shared by others or do not erase large gaps between groups. Our understandings of these income and poverty figures can also change over time as measurements change and perceptions of what matters shift. For example, a rising median household income could suggest that more Americans have more income, or we may now care less about absolute incomes and pay more attention to relative incomes (particularly the gap between those at the top and the bottom).

In other words, interpreting data is influenced by a variety of social forces. Numbers do not interpret themselves, and our lenses are constantly changing. Two reasonable people could disagree on whether the latest data is good for America or suggests there are enduring issues that still need to be addressed.

Mutant stat: 4.2% of American kids witnessed a shooting last year

Here is how a mutant statistic about the exposure of children to shootings came to be:

It all started in 2015, when University of New Hampshire sociology professor David Finkelhor and two colleagues published a study called “Prevalence of Childhood Exposure to Violence, Crime, and Abuse.” They gathered data by conducting phone interviews with parents and kids around the country.

The Finkelhor study included a table showing the percentage of kids “witnessing or having indirect exposure” to different kinds of violence in the past year. The figure under “exposure to shooting” was 4 percent.

The findings were then reinterpreted:

Earlier this month, researchers from the CDC and the University of Texas published a nationwide study of gun violence in the journal Pediatrics. They reported that, on average, 7,100 children under 18 were shot each year from 2012 to 2014, and that about 1,300 a year died. No one has questioned those stats.

The CDC-UT researchers also quoted the “exposure to shooting” statistic from the Finkelhor study, changing the wording — and, for some reason, the stat — just slightly:

“Recent evidence from the National Survey of Children’s Exposure to Violence indicates that 4.2 percent of children aged 0 to 17 in the United States have witnessed a shooting in the past year.”

The reinterpreted findings were picked up by the media:

The Dallas Morning News picked up a version of the Washington Post story.

When the Dallas Morning News figured out something was up (due to a question raised by a reader) and asked about the origins of the statistic, they uncovered some confusion:

According to Finkelhor, the actual question the researchers asked was, “At any time in (your child’s/your) life, (was your child/were you) in any place in real life where (he/she/you) could see or hear people being shot, bombs going off, or street riots?”

So the question was about much more than just shootings. But you never would have known from looking at the table.

This appears to be a classic example of a mutant statistic as described by sociologist Joel Best in Damned Lies and Statistics. As Best explains, it does not take much for a number to be unintentionally twisted so that it becomes nonsensical yet interesting to the public because it seems shocking. And while the Dallas Morning News might deserve some credit for catching the issue and trying to set the record straight, the incorrect statistic is now out in public and can easily be found.

Claim: we see more information today so we see more “improbable” events

Are more rare events happening in the world or are we just more aware of what is going on?

In other words, the more data you have, the greater the likelihood you’ll see wildly improbable phenomena. And that’s particularly relevant in this era of unlimited information. “Because of the Internet, we have access to billions of events around the world,” says Len Stefanski, who teaches statistics at North Carolina State University. “So yeah, it feels like the world’s going crazy. But if you think about it logically, there are so many possibilities for something unusual to happen. We’re just seeing more of them.” Science says that uncovering and accessing more data will help us make sense of the world. But it’s also true that more data exposes how random the world really is.
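The quoted logic is easy to check with a little arithmetic: if each event has some tiny chance of looking “wildly improbable,” the chance that at least one such event shows up grows quickly with the number of events observed (a sketch with made-up probabilities):

```python
# Probability of seeing at least one "one-in-a-million" event as the number
# of independent events observed grows. The 1-in-a-million figure is an
# arbitrary stand-in.
p = 1e-6

for n in (1_000, 1_000_000, 1_000_000_000):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>13,} events -> P(at least one freak event) = {at_least_one:.4f}")
# At a billion events the probability is essentially 1: some freak
# occurrence is the expected outcome, not the surprise.
```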

Here is an alternative explanation for why all these rare events seem to be happening: we are bumping up against our limited ability to predict all the complexity of the world.

All of this, though, ignores a more fundamental and unsettling possibility: that the models were simply wrong. That the Falcons were never 99.6 percent favorites to win. That Trump’s odds never fell as low as the polling suggested. That the mathematicians and statisticians missed something in painting their numerical portrait of the universe, and that our ability to make predictions was thus inherently flawed. It’s this feeling—that our mental models have somehow failed us—that haunted so many of us during the Super Bowl. It’s a feeling that the Trump administration exploits every time it makes the argument that the mainstream media, in failing to predict Trump’s victory, betrayed a deep misunderstanding about the country and the world and therefore can’t be trusted.

And maybe it isn’t very easy to reconcile these two explanations:

So: Which is it? Does the Super Bowl, and the election before it, represent an improbable but ultimately-not-confidence-shattering freak event? Or does it indicate that our models are broken, that—when it comes down to it—our understanding of the world is deeply incomplete or mistaken? We can’t know. It’s the nature of probability that it can never be disproven, unless you can replicate the exact same football game or hold the same election thousands of times simultaneously. (You can’t.) That’s not to say that models aren’t valuable, or that you should ignore them entirely; that would suggest that data is meaningless, that there’s no possibility of accurately representing the world through math, and we know that’s not true. And perhaps at some point, the world will revert to the mean, and behave in a more predictable fashion. But you have to ask yourself: What are the odds?

I know there is a lot of celebration of having so much information available today, but adjusting to the change is not necessarily easy. Taking it all in requires effort on its own; the harder work is in interpreting it and knowing what to do with it all.

Perhaps a class in statistics – in addition to existing efforts involving digital or media literacy – could help many people better understand all of this.

A better interpretation of crime statistics for Chicago suburbs

The Daily Herald looks at recent crime figures in Chicago area suburbs. How should we interpret such numbers?

Violent crimes increased last year in half of 80 suburbs, says a new report by the FBI we’ve been analyzing.

Property crimes increased in more than 40 percent of the suburbs.

The Uniform Crime Reporting Program’s 2015 report shows Rosemont had a 94 percent increase in violent crimes, from 18 in 2014 to 35 in 2015. Most are assaults, but the category also includes rape, homicide and robbery. The village had a 29 percent increase in property crimes, which include arson, burglary and vehicle theft.

Other more populous suburbs had larger numbers of violent crimes in 2015, including 650 in Aurora, 261 in Elgin and 128 in Naperville.

Violent crimes remained largely flat in Palatine, with 36; Des Plaines, with 50; and Arlington Heights, with 42; while some communities saw crimes decrease across the board. Buffalo Grove saw an 80 percent decrease in violent crimes, to 2, and an 18 percent decrease in property crimes, to 234, while Prospect Heights saw a 33 percent decrease in violent crimes, to 14, and a 29 percent decrease in property crimes, to 112.

What I would take away:

  1. Looking across communities, there was not much change as half of the suburbs did not experience a rise in violent crimes and property crimes increased in less than half of the suburbs.
  2. It is interesting to note larger jumps in crime in certain communities. However, these should be interpreted in light of #1, and it would be more helpful to look at crime rates in these larger suburbs rather than relying on raw counts alone (see the sketch after this list).
  3. The last paragraph notes some major changes in other suburbs. But some of these suburbs are smaller, and a large decrease (the 80% drop in Buffalo Grove means going from 10 violent crimes to 2) or increase may be more a function of small numbers overall than an indication of a larger trend.
  4. There is little indication of crime figures or rates over time, which would help put the 2015 figures in better perspective.
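A small sketch to illustrate points 2 and 3 above (the population figures are rough assumptions for illustration, not the suburbs’ actual populations):

```python
# Point 2: raw counts vs. rates. A bigger suburb posts bigger counts even if
# the underlying risk per resident is similar or lower. Populations below are
# illustrative assumptions only.
def rate_per_100k(crimes, population):
    return crimes / population * 100_000

print(rate_per_100k(650, 200_000))   # hypothetical large suburb: 325 per 100k
print(rate_per_100k(35, 4_000))      # hypothetical small suburb: 875 per 100k

# Point 3: percent changes on small bases swing wildly.
def pct_change(old, new):
    return (new - old) / old * 100

print(pct_change(10, 2))    # Buffalo Grove-style drop: -80% on just 8 incidents
print(pct_change(18, 35))   # Rosemont-style jump: +94% on 17 extra crimes
```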

Altogether, the headline “40 suburbs see spike in violent crimes in 2015” is not the most accurate. It may catch the attention of readers, but neither the headline nor the article sufficiently discusses the statistics.

To see the recent spike in murders in big cities, you have to see the decline before it

New data suggests murders are up in some major American cities. Yet, to see this spike, you have to acknowledge the steady decline in previous years:

Baltimore, Chicago, Milwaukee, New Orleans, New York City, St. Louis and Washington, D.C., among others, have all seen significant increases in their murder rates through the first half of 2015.

Homicides in St. Louis, for example, are up almost 60% from last year while robberies are up 40%. In Washington, D.C., 73 people have been killed so far this year, up from 62 last year, an 18% jump. In Milwaukee, murders have doubled since last year, while in nearby Chicago homicides have jumped almost 20%…

Criminologists warn that the recent spikes could merely be an anomaly, a sort of reversion to the mean after years of declining crime rates. But there could be something else going on, what some officials have called a “Ferguson effect,” in which criminals who are angry over police-involved shootings like that of Michael Brown, an unarmed black teenager who was shot and killed by a white police officer in August, have felt emboldened to commit increased acts of violence.

It is hard to have it both ways: complaining about high crime rates before this year and now complaining about a spike. Crime rates had been falling for nearly two decades in most major cities prior to this year, yet this was not the perception. Thus, we might see this spike as “crime rates were high and now they are even higher!” or as “crime rates declined for a long period and now there is a spike.” These are two different stories.

Two other quick thoughts:

1. This story is unclear about whether this is true across the board in major American cities or just in the places cited here.

2. It is hard to know what this spike is about as it is happening. What will happen in a few months or in the next few years?

Argument: scientists need help in handling big data

Collecting, analyzing, and interpreting big data may be a job that requires more than scientists alone:

For projects like NEON, interpreting the data is a complicated business. Early on, the team realized that its data, while mid-size compared with the largest physics and biology projects, would be big in complexity. “NEON’s contribution to big data is not in its volume,” said Steve Berukoff, the project’s assistant director for data products. “It’s in the heterogeneity and spatial and temporal distribution of data.”

Unlike the roughly 20 critical measurements in climate science or the vast but relatively structured data in particle physics, NEON will have more than 500 quantities to keep track of, from temperature, soil and water measurements to insect, bird, mammal and microbial samples to remote sensing and aerial imaging. Much of the data is highly unstructured and difficult to parse — for example, taxonomic names and behavioral observations, which are sometimes subject to debate and revision.

And, as daunting as the looming data crush appears from a technical perspective, some of the greatest challenges are wholly nontechnical. Many researchers say the big science projects and analytical tools of the future can succeed only with the right mix of science, statistics, computer science, pure mathematics and deft leadership. In the big data age of distributed computing — in which enormously complex tasks are divided across a network of computers — the question remains: How should distributed science be conducted across a network of researchers?

Two quick thoughts:

1. There is a lot of potential here for crossing disciplinary boundaries to tackle big data projects. This isn’t just about parceling out individual pieces of the project; bringing multiple perspectives together could lead to an improved final outcome.

2. I wonder if sociologists aren’t particularly well-suited for this kind of big data work. With our training in both theory and methods, we attend to the big picture as well as to how to effectively collect, analyze, and interpret data. Sociology students could step into such projects and provide needed insights.

Using algorithms to analyze the literary canon

A new book describes efforts to use algorithms to discover what is in and out of the literary canon:

There’s no single term that captures the range of new, large-scale work currently underway in the literary academy, and that’s probably as it should be. More than a decade ago, the Stanford scholar of world literature Franco Moretti dubbed his quantitative approach to capturing the features and trends of global literary production “distant reading,” a practice that paid particular attention to counting books themselves and owed much to bibliographic and book historical methods. In earlier decades, so-called “humanities computing” joined practitioners of stylometry and authorship attribution, who attempted to quantify the low-level differences between individual texts and writers. More recently, the catchall term “digital humanities” has been used to describe everything from online publishing and new media theory to statistical genre discrimination. In each of these cases, however, the shared recognition — like the impulse behind the earlier turn to cultural theory, albeit with a distinctly quantitative emphasis — has been that there are big gains to be had from looking at literature first as an interlinked, expressive system rather than as something that individual books do well, badly, or typically. At the same time, the gains themselves have as yet been thin on the ground, as much suggestions of future progress as transformative results in their own right. Skeptics could be forgiven for wondering how long the data-driven revolution can remain just around the corner.

Into this uncertain scene comes an important new volume by Matthew Jockers, offering yet another headword (“macroanalysis,” by analogy to macroeconomics) and a range of quantitative studies of 19th-century fiction. Jockers is one of the senior figures in the field, a scholar who has been developing novel ways of digesting large bodies of text for nearly two decades. Despite Jockers’s stature, Macroanalysis is his first book, one that aims to summarize and unify much of his previous research. As such, it covers a lot of ground with varying degrees of technical sophistication. There are chapters devoted to methods as simple as counting the annual number of books published by Irish-American authors and as complex as computational network analysis of literary influence. Aware of this range, Jockers is at pains to draw his material together under the dual headings of literary history and critical method, which is to say that the book aims both to advance a specific argument about the contours of 19th-century literature and to provide a brief in favor of the computational methods that it uses to support such an argument. For some readers, the second half of that pairing — a detailed look into what can be done today with new techniques — will be enough. For others, the book’s success will likely depend on how far they’re persuaded that the literary argument is an important one that can’t be had in the absence of computation…

More practically interesting and ambitious are Jockers’s studies of themes and influence in a larger set of novels from the same period (3,346 of them, to be exact, or about five to 10 percent of those published during the 19th century). These are the only chapters of the book that focus on what we usually understand by the intellectual content of the texts in question, seeking to identify and trace the literary use of meaningful clusters of subject-oriented terms across the corpus. The computational method involved is one known as topic modeling, a statistical approach to identifying such clusters (the topics) in the absence of outside input or training data. What’s exciting about topic modeling is that it can be run quickly over huge swaths of text about which we initially know very little. So instead of developing a hunch about the thematic importance of urban poverty or domestic space or Native Americans in 19th-century fiction and then looking for words that might be associated with those themes — that is, instead of searching Google Books more or less at random on the basis of limited and biased close reading — topic models tell us what groups of words tend to co-occur in statistically improbable ways. These computationally derived word lists are for the most part surprisingly coherent and highly interpretable. Specifically in Jockers’s case, they’re both predictable enough to inspire confidence in the method (there are topics “about” poverty, domesticity, Native Americans, Ireland, sea faring, servants, farming, etc.) and unexpected enough to be worth examining in detail…

The notoriously difficult problem of literary influence finally unites many of the methods in Macroanalysis. The book’s last substantive chapter presents an approach to finding the most central texts among the 3,346 included in the study. To assess the relative influence of any book, Jockers first combines the frequency measures of the roughly 100 most common words used previously for stylistic analysis with the more than 450 topic frequencies used to assess thematic interest. This process generates a broad measure of each book’s position in a very high-dimensional space, allowing him to calculate the “distance” between every pair of books in the corpus. Pairs that are separated by smaller distances are more similar to each other, assuming we’re okay with a definition of similarity that says two books are alike when they use high-frequency words at the same rates and when they consist of equivalent proportions of topic-modeled terms. The most influential books are then the ones — roughly speaking and skipping some mathematical details — that show the shortest average distance to the other texts in the collection. It’s a nifty approach that produces a fascinatingly opaque result: Tristram Shandy, Laurence Sterne’s famously odd 18th-century bildungsroman, is judged to be the most influential member of the collection, followed by George Gissing’s unremarkable The Whirlpool (1897) and Benjamin Disraeli’s decidedly minor romance Venetia (1837). If you can make sense of this result, you’re ahead of Jockers himself, who more or less throws up his hands and ends both the chapter and the analytical portion of the book a paragraph later. It might help if we knew what else of Gissing’s or Disraeli’s was included in the corpus, but that information is provided in neither Macroanalysis nor its online addenda.
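To make the last chapter’s method a bit more concrete, here is a minimal sketch of the distance-based centrality idea the review describes, using random stand-in data in place of Jockers’s word frequencies and topic proportions (an illustration of the general technique, not his actual pipeline or metric):

```python
import numpy as np

rng = np.random.default_rng(0)
n_books, n_word_feats, n_topics = 20, 100, 450

# Stand-ins for the two feature sets described above: rates of ~100 common
# words and proportions of ~450 topics (here, random numbers).
word_rates = rng.random((n_books, n_word_feats))
topic_props = rng.dirichlet(np.ones(n_topics), n_books)
features = np.hstack([word_rates, topic_props])

# "Distance" between every pair of books in the combined feature space
# (Euclidean here; Jockers's exact choice may differ).
dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

# The most "influential" books, roughly speaking, are the ones with the
# smallest average distance to everything else in the corpus.
mean_dist = dists.sum(axis=1) / (n_books - 1)
print("Most central book indices:", np.argsort(mean_dist)[:3])
```

With real texts, the word rates and topic proportions would come from the corpus and a topic model rather than a random number generator.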

Sounds interesting. I wonder if there isn’t a great opening here for mixed-methods analysis: Jockers’s approach provides the big picture, but you also need more intimate and deep knowledge of smaller groups of texts, or of individual texts, to interpret what the results mean. So, if the data suggest three books are the most influential, you would have to know those books and their context to make sense of what the data say. Additionally, you still want to use theories and hypotheses to guide the analysis rather than simply looking for patterns.

This reminds me of the work sociologist Wendy Griswold has done in analyzing whether American novels shared common traits (she argues copyright law was quite influential) or how a reading culture might emerge in a developing nation. Her approach is somewhere between the interpretation of texts and the algorithms described above, relying on more traditional methods in sociology like analyzing samples and conducting interviews.