Home value algorithms show consumers data with outliers, mortgage companies take the outliers out

A homeowner can look online to get an estimate of the value of their home, but that number may not match what a lender computes:

Different AVMs are designed to deliver different types of valuations. And therein lies confusion.

Consumers don’t realize that there’s an AVM for nearly any purpose, which explains why different algorithms serve up different results, said Ann Regan, an executive product manager with real estate analytic firm CoreLogic. “The scores presented to consumers are not the same version that is being used by lenders to make decisions,” she said. “The consumer-facing AVMs are designed for consumer marketing purposes.”

For instance, more accurate models used by lenders do not include outliers — properties that sold for extremely high or low prices and that consequently would skew the averages and the comparable sales for a particular house, like yours. But models used by consumer websites, such as brokers’ sites and national listing sites, scoop in as much “sold” data as possible when concocting a valuation, because then they can claim to include all available data. That’s true, said Regan, but it’s more accurate to weed out misleading data.

AVMs used by lenders send along “confidence scores” that indicate how firm the estimate is. That is a factor typically not included alongside consumer AVMs, she added.

This is an interesting trade-off. The assumption is that consumers want to see that all the data is accounted for, which makes the estimate seem more worthwhile: more data = more accuracy. On the other hand, those who work with data know that measures of central tendency and variability can be thrown off by unusual cases, often known as outliers. If a home sold for an unusually high or low price, and there are many reasons why this could happen, it can distort the estimates for the surrounding properties. When there are significant outliers, more data does not equal more accuracy.
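To make the outlier point concrete, here is a minimal Python sketch with made-up comparable-sale prices (none of these figures come from the article): a single unusual sale drags the mean well above the median, which is roughly the problem a consumer-facing model that scoops in every sale would face.

```python
# Made-up comparable-sale prices, for illustration only.
comps = [310_000, 325_000, 330_000, 340_000, 355_000]
comps_with_outlier = comps + [1_200_000]  # one extreme sale included in the "comps"

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(mean(comps), median(comps))                            # 332000.0  330000
print(mean(comps_with_outlier), median(comps_with_outlier))  # ~476667   335000.0
```

Dropping or down-weighting the extreme sale, as the lender-facing models reportedly do, keeps the estimate close to the typical comparable; including it pulls the average far away from what most nearby homes actually sold for.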

Since this knowledge is out there (at least printed in a major newspaper), does this mean consumers will be informed of these algorithm features when they look at websites like Zillow? I imagine it could be tricky to explain why removing some of the housing comparison data is actually a good thing, but if the long-term goal is better numeracy for the public, this could be a worthwhile addition to such websites.

Yahoo News leads with Chicago murders and then says it is not the murder capital

If the point of a news story or video is to say something is not true, would you lead with the data that supports the untrue claim?


Here is the way this seems to work: grab your attention with a publicly available statistic that stands out. Oh my, how could there be so many murders in one city? But several sentences later, tell the reader/viewer that multiple other cities have a higher murder rate. And include in the last sentence that the number of murders in Chicago has declined in recent years. So, wait: Chicago really isn’t the murder capital?

I’m trying to figure out how this adds to the public discourse. Here are a few possibilities:

  1. It is simply about clicks. Get people’s attention with a statistic and a video, throw in some data. Easy to produce, not much content.
  2. The goal is to highlight the still-high number of murders in Chicago.
  3. The goal is to point out that other cities actually experience more murders per capita (see the sketch after this list).
  4. To give those who teach statistics an example of how data can be twisted and/or used without telling much of a story.
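On possibility 3, a tiny sketch with rough, illustrative figures (not exact counts for any real city) shows how a city with more total murders can still have a lower murder rate than a smaller city:

```python
# Rough, illustrative figures only: a city with more total murders can still
# have a lower murder *rate* than a much smaller city.
cities = {
    "Big City":   {"murders": 550, "population": 2_700_000},
    "Small City": {"murders": 190, "population": 300_000},
}

for name, d in cities.items():
    rate_per_100k = d["murders"] / d["population"] * 100_000
    print(f"{name}: {d['murders']} murders, {rate_per_100k:.1f} per 100,000")

# Big City:   550 murders, ~20.4 per 100,000
# Small City: 190 murders, ~63.3 per 100,000
```

Leading with the raw count tells one story; the per-capita rate, which is what "murder capital" claims usually rest on, tells a different one.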

 

Bad argument: “I turned out fine”

An Australian parenting expert details why making an “I turned out fine” argument does not work:

It’s what’s known as an anecdotal fallacy. This fallacy, in simple terms, states that “I’m not negatively affected (as far as I can tell), so it must be O.K. for everyone.” As an example: “I wasn’t vaccinated, and I turned out fine. Therefore, vaccination is unnecessary.” We are relying on a sample size of one. Ourselves, or someone we know. And we are applying that result to everyone.

It relies on a decision-making shortcut known as the availability heuristic. Related to the anecdotal fallacy, it’s where we draw on information that is immediately available to us when we make a judgment call. In this case, autobiographical information is easily accessible — it’s already in your head. We were smacked as kids and turned out fine, so smacking doesn’t hurt anyone. But studies show that the availability heuristic is a cognitive bias that can cloud us from making accurate decisions utilizing all the information available. It blinds us to our own prejudices.

It dismisses well-substantiated, scientific evidence. To say “I turned out fine” is an arrogant dismissal of an alternative evidence-based view. It requires no perspective and no engagement with an alternative perspective. The statement closes off discourse and promotes a single perspective that is oblivious to alternatives that may be more enlightened. Anecdotal evidence often undermines scientific results, to our detriment.

It leads to entrenched attitudes. When views inconsistent with our own are shared we make an assumption that whoever holds those views is not fine, refusing to engage, explore or grow. Perhaps an inability to engage with views that run counter to our own suggests that we did not turn out quite so “fine.”

One data point does not make for a broad understanding of how the world works. A single case can illustrate larger trends – but it does not necessarily describe all that happens.

I wonder if one of the issues with the health patterns discussed here is that many people do indeed turn out fine even when there is clear evidence that a certain behavior leads to bad outcomes. Take the example of not wearing a seat belt while riding in a car. Even though more than 30,000 Americans die each year in accidents, the majority of people do not die and most driving goes by without incident. Accidents are common in the aggregate, but they may not be regular for any individual. Many people could indeed say they turned out fine even though the thing they experienced is still bad for people overall.
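A back-of-envelope sketch makes the point. The death count below roughly matches the 2016 figure quoted later in this post; the population total and years of driving are assumed round numbers for illustration, and the calculation crudely ignores differences in exposure and behavior.

```python
# Crude illustration: population-level risk vs. any one person's experience.
# Death count roughly matches the 2016 figure quoted later in this post;
# population and years of driving are assumed round numbers.
annual_traffic_deaths = 37_000
us_population = 320_000_000
years_of_driving = 50

annual_risk = annual_traffic_deaths / us_population
chance_of_turning_out_fine = (1 - annual_risk) ** years_of_driving

print(f"Annual risk: about 1 in {round(1 / annual_risk):,}")   # ~1 in 8,600
print(f"Chance of 'turning out fine' over {years_of_driving} years: "
      f"{chance_of_turning_out_fine:.1%}")                     # ~99.4%
```

The point is not that the risk is trivial; it is that a sample of one almost never observes the harm, which is exactly why "I turned out fine" feels persuasive and proves nothing.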

Speculating on why sociology is less relevant to the media and public than economics

In calling for more sociological insight into economics, a journalist who attended the recent ASA meetings in Philadelphia provides two reasons why sociology lags behind economics in public attention:

Economists, you see, put draft versions of their papers online seemingly as soon as they’ve finished typing. Attend their big annual meeting, as I have several times, and virtually every paper discussed is available beforehand for download and perusal. In fact, they’re available even if you don’t go to the meeting. I wrote a column two years ago arguing that this openness had given economists a big leg up over the other social sciences in media attention and political influence, and noting that a few sociologists agreed and were trying to nudge their discipline — which disseminates its research mainly through paywalled academic journals and university-press books — in that direction with a new open repository for papers called SocArxiv. Now that I’ve experienced the ASA annual meeting for the first time, I can report that (1) things haven’t progressed much since 2016, and (2) I have a bit more sympathy for sociologists’ reticence to act like economists, although I continue to think it’s holding them back.

SocArxiv’s collection of open-access papers is growing steadily if not spectacularly, and Sociological Science, an open-access journal founded in 2014, is carving out a respected role as, among other things, a place to quickly publish articles of public interest. “Unions and Nonunion Pay in the United States, 1977-2015” by Patrick Denice of the University of Western Ontario and Jake Rosenfeld of Washington University in St. Louis, for example, was submitted June 12, accepted July 10 and published on Wednesday, the day after it was presented at the ASA meeting. These dissemination tools are used by only a small minority of sociologists, though, and the most sparsely attended session I attended in three-plus days at their annual meeting was the one on “Open Scholarship in Sociology” organized by the University of Maryland’s Philip Cohen, the founder of SocArxiv and one of the discipline’s most prominent social-media voices. This despite the fact that it was great, featuring compelling presentations by Cohen, Sociological Review deputy editor Kim Weeden of Cornell University and higher-education expert Elizabeth Popp Berman of the State University of New York at Albany, and free SocArxiv pens for all.

As I made the rounds of other sessions, I did come to a better understanding of why sociologists might be more reticent than economists to put their drafts online. The ASA welcomes journalists to its annual meeting and says they can attend all sessions where research is presented, but few reporters show up and it’s clear that most of those presenting research don’t consider themselves to be speaking in public. The most dramatic example of this in Philadelphia came about halfway through a presentation involving a particular corporation. The speaker paused, then asked the 50-plus people in the room not to mention the name of said corporation to anybody because she was about to return to an undercover job there. That was a bit ridiculous, given that there were sociologists live-tweeting some of the sessions. But there was something charming and probably healthy about the willingness of the sociologists at the ASA meeting to discuss still-far-from-complete work with their peers. When a paper is presented at an economics conference, many of the discussant’s comments and audience questions are attempts to poke holes in the reasoning or methodology. At the ASA meeting, it was usually, “This is great. Have you thought about adding …?” Also charming and probably healthy was the high number of graduate students presenting research alongside the professors, which you don’t see so much at the economists’ equivalent gathering.

All in all — and I’m sure there are sociological terms to describe this, but I’m not familiar with them — sociology seems more focused on internal cohesion than economics is. This may be partly because it’s what Popp Berman calls a “low-consensus discipline,” with lots of different methodological approaches and greatly varying standards of quality and rigor. Economists can be mean to each other in public yet still present a semi-united face to the world because they use a widely shared set of tools to arrive at answers. Sociologists may feel that they don’t have that luxury.

Disciplinary differences can be mystifying at times.

I wonder about a third possible difference in addition to the two provided: different conceptions in sociology and economics of what constitutes good arguments and data (hinted at above with the idea of “lots of different methodological approaches and greatly varying standards of quality and rigor”). Both disciplines aspire to the idea of social science, where empirical data is used to test hypotheses about how human behavior, usually in collectives, works. But this is tricky to do, as there are numerous pitfalls along the way. For example, accurate measurement is difficult even when a researcher has clearly identified a concept. Additionally, it is my sense that sociologists as a whole may be more open to both qualitative and quantitative data (even with occasional flare-ups between researchers studying the same topic who fall into different methodological camps). With these methodological questions, sociologists may feel they need more time to connect their methods to a convincing causal and scientific argument.

A fourth possible reason behind the differences (also hinted at above with the idea of economists having a “semi-united face” to present): sociology has a reputation as a more left-leaning discipline. Some researchers may prefer to have all their ducks in a row before they expose their work to full public scrutiny. The work of economists is more generally accepted by the public and some leaders, while sociology regularly has to work against some backlash. (As an example, see conservative leaders complaining that sociology excuses poor behavior when the job of the discipline is to explain human behavior.) Why expose your work to a less welcoming public earlier when you could take a little more time to polish the argument?

Supercommuters up roughly 16%, or about 0.9 million workers, between 2005 and 2016

A small but rising number of Americans commute more than ninety minutes to work each day:

While super commuters still represent a small share of the overall workforce, their long commutes have become increasingly common over the past decade. In 2005, there were about 3.1 million super commuters, roughly 2.4 percent of all commuters. By 2016, that share had increased by 15.9 percent to 2.8 percent of all commuters, or about 4 million workers. In some parts of the country the problem is much worse; in Stockton, where James lives, 10 percent of commuters travel more than 90 minutes to work each day.

The rising number of super commuters underscores a general trend towards longer commutes. The share of commuters traveling 24 minutes or less to work each day has decreased to 55 percent of all commuters in 2016 from 59 percent in 2005. Meanwhile, the share of commuters traveling 25 minutes or more has increased to 45 percent in 2016, compared to 41 percent in 2005. The share of commuters traveling an hour or more to work each day increased 16.1 percent to 9.2 percent in 2016 from 7.9 percent in 2005.

I understand that this article is geared toward showing differences in commuting over time. And the data can back that up: supercommuting is up and more Americans have longer commutes.

At the same time, this may be overselling the data:

  1. The changes over 11 years are relatively small. The article talks about percentage changes, but the absolute numbers are modest. There is a difference between saying supercommuting is up roughly 16 percent and saying it is up by less than a million workers (see the sketch after this list).
  2. Given that this data is based on samples of the US population, is a 0.4 percentage point change in the share of commuters statistically significant? Is an increase from roughly 3.1 million to 4 million supercommuters substantively significant?
  3. What happened between 2005 and 2016? Both of these measurement points fall in years with a relatively robust economy. Driving was down after the housing bubble burst; was supercommuting affected by this? Is the trend line steadily upward over the 11 years, or does it move up and down?
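Here is the sketch referenced in point 1. It uses only the figures from the quoted passage and separates a percentage-point change, a relative percent change, and the absolute change in workers, since all three describe the same shift.

```python
# Share of commuters who are supercommuters, from the quoted figures.
share_2005 = 2.4   # percent of all commuters in 2005
share_2016 = 2.8   # percent of all commuters in 2016

absolute_change_points = share_2016 - share_2005                     # 0.4 percentage points
relative_change_pct = (share_2016 - share_2005) / share_2005 * 100   # ~16.7 percent

workers_2005 = 3.1e6   # about 3.1 million supercommuters in 2005
workers_2016 = 4.0e6   # about 4 million supercommuters in 2016
added_workers = workers_2016 - workers_2005                          # ~0.9 million more

print(f"{absolute_change_points:.1f} percentage points, "
      f"{relative_change_pct:.1f}% relative change, "
      f"{added_workers / 1e6:.1f} million more supercommuters")
```

"Up 16 percent" sounds dramatic; "up 0.4 percentage points of the workforce" sounds small. Both are accurate, which is why the framing matters.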

From a broader view, this is not that much change. (There may still be shock value in reminding the public that 2.8% of all commuters are really willing to go far each day.)

Traffic deaths increased in 2016

Explaining the rise in traffic deaths over the last two years may be difficult:

Cars may be safer than ever, but 37,461 people died on American roads that year, a 5.6 percent hike over 2015. While fatalities have dramatically declined in recent decades, this is the second straight year the number has risen. It’s too early to say why, exactly, this is happening. Researchers will need much more time with the data to figure that out. But here’s a hypothesis: It’s the economy, (crash) dummy.

“People drive more in a good economy,” says Chuck Farmer, who oversees research at the Insurance Institute for Highway Safety. “They drive to different places and for different reasons. There’s a difference between going out to a party in the middle of the night in an unfamiliar area and driving to work—that nighttime driving to a party is more risky.”…

Researchers have long known that driving deaths rise and dive with the economy and income growth. People with jobs have more reason to be on the road than the unemployed. But this increase can’t be pinned on the fact of more driving, the stats indicate. Even adjusted for miles traveled, fatalities have ticked up by 2.6 percent over 2015. You can still blame the economy, because people aren’t just driving more. They’re driving differently. Better economic conditions give them the flexibility to drive for social reasons. There might be more bar visits (and drinking) and trips along unfamiliar roads (with extra time spent looking at a map on a phone).

The DOT numbers seem to confirm that drivers involved in traffic deaths were doing different things behind the wheel last year. The feds say the number of people who died while not wearing seat belts climbed 4.6 percent, and that drunk driving fatalities rose 1.7 percent. Contrary to what you might expect, the numbers show distracted driving deaths dropped slightly, but experts caution against putting too much faith in such info. The numbers are based on police reports. They’re reflections of what cops are seeing at crash sites, but also of what’s in the zeitgeist at the time. It could be that first responders weren’t, for example, looking out for distracted driving last year because it wasn’t in the news as often.

Official statistics do not provide all the information we might want. In this case, the figure of interest to many will simply be the total number of deaths. Is an increase over two years enough to prompt rapid action? If so, I would imagine the regulatory structures regarding driverless cars might attract some attention. Or, do car deaths continue to be the costs we pay for having lifestyles built around driving?
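The quoted piece notes that fatalities rose even “adjusted for miles traveled.” A minimal sketch of that adjustment is below: the death count comes from the article, but the vehicle-miles-traveled total is an assumed placeholder, so the resulting rate is illustrative only.

```python
# Mileage-adjusted fatality rate: deaths per 100 million vehicle miles traveled (VMT).
# Death count is from the quoted article; the VMT total is an assumed placeholder.
deaths_2016 = 37_461
assumed_vmt_2016 = 3.2e12   # roughly 3 trillion miles driven, illustrative

rate = deaths_2016 / assumed_vmt_2016 * 100_000_000
print(f"{rate:.2f} deaths per 100 million VMT")   # ~1.17 with these inputs
```

A rising mileage-adjusted rate is what lets the article argue the increase is not simply "more driving" but something about how people are driving.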

Summarizing data visualization errors

Check out this quick overview of visualization errors; here are a few good moments:

Everything is relative. You can’t say a town is more dangerous than another because the first one had two robberies and the other only had one. What if the first town has 1,000 times the population of the second? It is often more useful to think in terms of percentages and rates rather than absolutes and totals…

It’s easy to cherrypick dates and timeframes to fit a specific narrative. So consider history, what usually happens, and proper baselines to compare against…

When you see a three-dimensional chart that is three dimensions for no good reason, question the data, the chart, the maker, and everything based on the chart.
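The cherrypicking point above is easy to demonstrate with made-up yearly counts: the same final year looks like a sharp rise or a roughly flat trend depending on which baseline year you choose.

```python
# Made-up yearly counts, for illustration only: the chosen start year changes the story.
counts = {2010: 480, 2011: 455, 2012: 430, 2013: 420, 2014: 440, 2015: 470, 2016: 500}

def pct_change(start_year, end_year):
    return (counts[end_year] - counts[start_year]) / counts[start_year] * 100

print(f"2013-2016: {pct_change(2013, 2016):+.0f}%")   # +19% -- looks like a sharp rise
print(f"2010-2016: {pct_change(2010, 2016):+.0f}%")   # +4%  -- looks roughly flat
```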

In summary: data visualizations can be very useful for highlighting a particular pattern, but they can also be altered to advance an incorrect point. I always wonder with these examples of misleading visualizations whether the maker intentionally made the change to advance their point or whether they simply lacked knowledge about how to do good data analysis. Of course, this issue could arise with any data analysis, as there are right and wrong ways to interpret and present data.