It might not matter how wrong Zillow’s price estimates are

When you see a Zestimate on Zillow, how accurate is it?


Just how accurate are those numbers, though? Until the house actually trades hands, it’s impossible to say. Zillow’s own explanation of the methodology, and its outcomes, can be misleading. The model, the company says, is based on thousands of data points from public sources like county records, tax documents, and multiple listing services — local databases used by real-estate agents where most homes are advertised for sale. Zillow’s formula also incorporates user-submitted info: If you get a fancy new kitchen, for example, your Zestimate might see a nice bump if you let the company know. Zillow makes sure to note that the Zestimate can’t replace an actual appraisal, but articles on its website also hail the tool as a “powerful starting point in determining a home’s value” and “generally quite accurate.” The median error rate for on-market homes is just 2.4%, per the company’s website, while the median error rate for off-market homes is 7.49%. Not bad, you might think.

But that’s where things get sticky. By definition, half of homes sell within the median error rate, e.g., within 2.4% of the Zestimate in either direction for on-market homes. But the other half don’t, and Zillow doesn’t offer many details on how bad those misses are. And while the Zestimate is appealing because it attempts to measure what a house is worth even when it’s not for sale, it becomes much more accurate when a house actually hits the market. That’s because it’s leaning on actual humans, not computers, to do a lot of the grunt work. When somebody lists their house for sale, the Zestimate will adjust to include all the new seller-provided info: new photos, details on recent renovations, and, most importantly, the list price. The Zestimate keeps adjusting until the house actually sells. At that point, the difference between the sale price and the latest Zestimate is used to calculate the on-market error rate, which, again, is pretty good: In Austin, for instance, a little more than 94% of on-market homes end up selling for within 10% of the last Zestimate before the deal goes through. But Zillow also keeps a second Zestimate humming in the background, one that never sees the light of day. This version doesn’t factor in the list price — it’s carrying on as if the house never went up for sale at all. Instead, it’s used to calculate the “off-market” error rate. When the house sells, the difference between the final price and this shadow algorithm reveals an error rate that’s much less satisfactory: In Austin, only about 66% of these “off-market Zestimates” come within 10% of the actual sale price. In Atlanta, it’s 65%; Chicago, 58%; Nashville, 63%; Seattle, 69%. At today’s median home price of $420,000, a 10% error would mean a difference of more than $40,000.

Without sellers spoonfeeding Zillow the most crucial piece of information — the list price — the Zestimate is hamstrung. It’s a lot easier to estimate what a home will sell for once the sellers broadcast, “Hey, this is the price we’re trying to sell for.” Because the vast majority of sellers work with an agent, the list price is also usually based on that agent’s knowledge of the local market, the finer details of the house, and comparable sales in the area. This September, per Zillow’s own data, the typical home sold for 99.8% of the list price — almost exactly spot on. That may not always be the case, but the list price is generally a good indicator of the sale figure down the line. For a computer model of home prices, it’s basically the prized data point. In the world of AVMs, models that achieve success by fitting their results to list prices are deemed “springy” or “bouncy” — like a ball tethered to a string, they won’t stray too far. Several people I talked to for this story say they’ve seen this in action with Zillow’s model: A seller lists a home and asks for a number significantly different from the Zestimate, and then watches as the Zestimate moves within a respectable distance of that list price anyway. Zillow itself makes no secret of the fact that it leans on the list price to arrive at its own estimate…

So the Zestimate isn’t exactly unique, and it’s far from the best. But to the average internet surfer, no AVM carries the weight, or swagger, of the original. To someone like Jonathan Miller, the president and CEO of the appraisal and consulting company Miller Samuel, the enduring appeal of the Zestimate is maddening. “When you think of the Zestimate, for many, it gives a false anchor for what the value actually is,” Miller says.

Multiple factors are at play here. Who has what information about housing and housing values? How is the value calculated? And how are the gaps between estimated values and actual sale prices distributed? Some of this involves data, some involves algorithms.
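To make that distribution question concrete, here is a minimal sketch with invented numbers (not Zillow data): two sets of percentage errors can share roughly the same median while behaving very differently in the tails, which is exactly what a single median error rate hides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented percentage errors: |estimate - sale price| / sale price.
# Both sets are built to have roughly the same median (~2.4%, echoing the
# on-market figure above) but very different tails.
tight_errors = np.abs(rng.normal(0, 0.035, 10_000))                   # misses stay small
heavy_tail_errors = 0.03 * np.abs(rng.standard_t(df=2, size=10_000))  # occasional huge misses

for name, errs in [("tight", tight_errors), ("heavy-tailed", heavy_tail_errors)]:
    print(f"{name:>12}: median = {np.median(errs):.1%}, "
          f"within 10% = {np.mean(errs <= 0.10):.1%}, "
          f"95th percentile = {np.percentile(errs, 95):.1%}")
```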

It also sounds like part of the story is that Zillow has built one of the more effective brands in this space. Even if the estimates are not exactly right, people are drawn to Zillow. What would happen if competitors advertised that they are more accurate? Would this be enough to move people away from Zillow?

Given all of this, whoever can produce the most accurate number might not be the “winner.” Is the goal to best model the housing market or is the goal to attract users? These two goals might go together but they might not.
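And on the “springy” behavior the excerpt describes, here is a toy sketch of how blending a model’s own prior estimate with the seller’s list price pulls the published number toward the asking price; the blend weight is a hypothetical knob, not anything Zillow has disclosed.

```python
from typing import Optional

def anchored_estimate(prior_estimate: float, list_price: Optional[float],
                      anchor_weight: float = 0.7) -> float:
    """Blend a model's prior estimate with the list price, if one exists.

    anchor_weight is a made-up parameter: 0 ignores the list price entirely
    (an 'off-market' style estimate); 1 simply repeats the asking price.
    """
    if list_price is None:   # off-market: nothing to anchor to
        return prior_estimate
    return (1 - anchor_weight) * prior_estimate + anchor_weight * list_price

# The model thinks $400,000; the seller lists at $450,000.
print(anchored_estimate(400_000, None))      # 400000.0 (off-market estimate)
print(anchored_estimate(400_000, 450_000))   # 435000.0 (moves most of the way to the ask)
```

Which is one way to see why the on-market error rate looks so much better than the off-market one.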

Cultural gatekeepers vs. algorithms

Have algorithms rendered cultural critics pointless?


Part of the fixation on cultural algorithms is a product of the insecure position in which cultural gatekeepers find themselves. Traditionally, critics have played the dual role of doorman and amplifier, deciding which literature or music or film (to name just a few media) is worthwhile, then augmenting the experience by giving audiences more context. But to a certain extent, they’ve been marginalized by user-driven communities such as BookTok and by AI-generated music playlists that provide recommendations without the complications of critical thinking. Not all that long ago, you might have paged through a music magazine’s reviews or asked a record-store owner for their suggestions; now you just press “Play” on your Spotify daylist, and let the algorithm take the wheel.

If many culture industries struggle to know what will become popular – which single, film, book, show, or product will become wildly successful and make a lot of money? – critics can be one way to try to figure this out. What will the influential critics like? Will they champion particular works (and dislike others)?

Might we get to some point where we see algorithms as critics, acting with judgment and discernment? Right now the recommendation algorithms are a “black box” that users blindly follow. But what if the algorithms “explained” their next step: “You like this song, and based on this plus your past choices, I now recommend this”? Or what if you could have a “conversation” back and forth with the algorithm, explaining your interests as it leads you in particular directions? Or what if the algorithm mimicked the idiosyncrasies a human critic would have?

I wonder about the role of friends and social contacts in what they recommend or introduce people to. At their height, could cultural critics move people away from the choices of family and friends around them? In today’s world of recommendation algorithms, how often do the patterns of friends and acquaintances move people in different directions?

Rents set by algorithm and how housing prices are set

New tools allow landlords to set rental prices and this has led to lawsuits:


Instead of getting together with your rivals and agreeing not to compete on price, you can all independently rely on a third party to set your prices for you. Property owners feed RealPage’s “property management software” their data, including unit prices and vacancy rates, and the algorithm—which also knows what competitors are charging—spits out a rent recommendation. If enough landlords use it, the result could look the same as a traditional price-fixing cartel: lockstep price increases instead of price competition, no secret handshake or clandestine meeting needed…

According to the lawsuits, RealPage’s clients act more like collaborators than competitors. Landlords hand over highly confidential information to RealPage, and many of them recruit their rivals to use the service. “Those kinds of behaviors raise a big red flag,” Maurice Stucke, a law professor at the University of Tennessee and a former antitrust attorney at the Department of Justice, told me. When companies are operating in a highly competitive market, he said, they typically go to great lengths to protect any sensitive information that could give their rivals an edge.

The lawsuits also argue that RealPage pressures landlords to comply with its pricing suggestions—something that would make no sense if the company were merely being paid to offer individualized advice. In an interview with ProPublica, Jeffrey Roper, who helped develop one of RealPage’s main software tools, acknowledged that one of the greatest threats to a landlord’s profits is when nearby properties set prices too low. “If you have idiots undervaluing, it costs the whole system,” he said. RealPage thus makes it hard for customers to override its recommendations, according to the lawsuits, allegedly even requiring a written justification and explicit approval from RealPage staff. Former employees have said that failure to comply with the company’s recommendations could result in clients being kicked off the service. “This, to me, is the biggest giveaway,” Lee Hepner, an antitrust lawyer at the American Economic Liberties Project, an anti-monopoly organization, told me. “Enforced compliance is the hallmark feature of any cartel.”

The company disputes this description, claiming that it simply offers “bespoke pricing recommendations” and lacks “any power” to set prices. “RealPage customers make their own pricing decisions, and acceptance rates of RealPage’s pricing recommendations have been greatly exaggerated,” the company says.

It will be interesting to see how the courts decide in this area.

I would be curious to hear how this process differs from the way housing prices are determined. The “correct price” does not just emerge. A set of actors – such as realtors, appraisers, and websites – contributes. There are local histories that inform current and future prices. The housing market follows particular patterns, and I recommend reading sociologist Elizabeth Korver-Glenn’s 2021 book Race Brokers: Housing Markets and Segregation in 21st Century Urban America on this topic.

Is the primary difference that there is not a centralized tech source for housing prices? (But maybe there is – how much have Zillow and its Zestimate changed the game?) Or are the new actors viewed with more suspicion than others (tech sector versus realtors)? Or are we in a particular social moment where high costs of housing prompt more questions and thoughts about alternatives?

Audio algorithms and how we watch (and read) TV

More people use subtitles with TV shows because algorithms for audio have changed:


Specifically, it has everything to do with LKFS, which stands for “Loudness, K-weighted, relative to full scale” and which, for the sake of simplicity, is a unit for measuring loudness. Traditionally it’s been anchored to the dialogue. For years, going back to the golden age of broadcast television and into the pay-cable era, audio engineers had to deliver sound levels within an industry-standard LKFS, or their work would get kicked back to them. That all changed when streaming companies seized control of the industry, a period of time that rather neatly matches Game of Thrones’ run on HBO. According to Blank, Game of Thrones sounded fantastic for years, and she’s got the Emmys to prove it. Then, in 2018, just prior to the show’s final season, AT&T bought HBO’s parent company and overlaid its own uniform loudness spec, which was flatter and simpler to scale across a large library of content. But it was also, crucially, un-anchored to the dialogue.

“So instead of this algorithm analyzing the loudness of the dialogue coming out of people’s mouths,” Blank explained to me, “it analyzes the whole show as loudness. So if you have a loud music cue, that’s gonna be your loud point. And then, when the dialogue comes, you can’t hear it.” Blank remembers noticing the difference from the moment AT&T took the reins at Time Warner; overnight, she said, HBO’s sound went from best-in-class to worst. During the last season of Game of Thrones, she said, “we had to beg [AT&T] to keep our old spec every single time we delivered an episode.” (Because AT&T spun off HBO’s parent company in 2022, a spokesperson for AT&T said they weren’t able to comment on the matter.)

Netflix still uses a dialogue-anchor spec, she said, which is why shows on Netflix sound (to her) noticeably crisper and clearer: “If you watch a Netflix show now and then immediately you turn on an HBO show, you’re gonna have to raise your volume.” Amazon Prime Video’s spec, meanwhile, “is pretty gnarly.” But what really galls her about Amazon is its new “dialogue boost” function, which viewers can select to “increase the volume of dialogue relative to background music and effects.” In other words, she said, it purports to fix a problem of Amazon’s own creation. Instead, she suggested, “why don’t you just air it the way we mixed it?”

This change in how television audio works contributes to viewers needing subtitles to understand what is being said.
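Here is a rough sketch of the difference Blank describes, using plain RMS levels rather than a real K-weighted LKFS measurement (ITU-R BS.1770) and made-up signals: when the loudness measurement covers the whole program, a loud music cue drags the measured level up and normalization pushes the dialogue down; when the measurement is anchored to the dialogue, speech stays at the delivery target.

```python
import numpy as np

def rms_db(x: np.ndarray) -> float:
    """Crude RMS level in dB. Real LKFS uses K-weighting and gating (ITU-R BS.1770);
    this is only a stand-in to show the anchoring idea."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

rng = np.random.default_rng(1)
sr = 1000                                        # toy "sample rate"
dialogue = 0.05 * rng.standard_normal(5 * sr)    # quiet speech
music_cue = 0.5 * rng.standard_normal(2 * sr)    # loud score
program = np.concatenate([dialogue, music_cue])

target_db = -24.0                                # hypothetical delivery spec

# Anchor to the whole program: the loud cue drags the measured level up,
# so normalization turns everything (including dialogue) down.
gain_full = target_db - rms_db(program)

# Anchor to dialogue only: normalization is set so speech sits at the target,
# regardless of how loud the music cue is.
gain_dialogue = target_db - rms_db(dialogue)

print(f"dialogue level after full-program anchoring: {rms_db(dialogue) + gain_full:.1f} dB")
print(f"dialogue level after dialogue anchoring:     {rms_db(dialogue) + gain_dialogue:.1f} dB")
```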

I wonder if the bigger question is whether this significantly changes how people consume and are affected by television. If we are reading more dialogue and descriptions, does this focus our attention on certain aspects of shows and not others? Could this be good for reading overall? Does it limit viewers’ ability to multitask if they need to keep up with the words on the screen? Do subtitles help engage the attention of viewers? Do I understand new things I did not notice before, when I watched with fewer subtitles? Does a story or scene stick with me longer because I was reading the dialogue?

Does this also mean that as Americans have been able to buy bigger and bigger TVs for cheaper prices, they are getting a worse audio experience?

Zillow sought pricing predictability in the supposedly predictable market of Phoenix

With Zillow stopping its iBuyer initiative, here are more details about how the Phoenix housing market was key to the plan:


Tech firms chose the Phoenix area because of its preponderance of cookie-cutter homes. Unlike Boston or New York, the identikit streets make pricing properties easier. iBuyers’ market share in Phoenix grew from around 1 percent in 2015—when tech companies first entered the market—to 6 percent in 2018, says Tomasz Piskorski of Columbia Business School, who is also a member of the National Bureau of Economic Research. Piskorski believes iBuyers—Zillow included—have grown their share since, but are still involved in less than 10 percent of all transactions in the city…

Barton told analysts that the premise of Zillow’s iBuying business was being able to forecast the price of homes accurately three to six months in advance. That reflected the time to fix and sell homes Zillow had bought…

In Phoenix, the problem was particularly acute. Nine in 10 homes Zillow bought were put up for sale at a lower price than the company originally bought them, according to an October 2021 analysis by Insider. If each of those homes sold for Zillow’s asking price, the company would lose $6.3 million. “Put simply, our observed error rate has been far more volatile than we ever expected possible,” Barton admitted. “And makes us look far more like a leveraged housing trader than the market maker we set out to be.”…

To make the iBuying program profitable, however, Zillow believed its estimates had to be more precise, within just a few thousand dollars. Throw in the changes brought in by the pandemic, and the iBuying program was losing money. One such factor: In Phoenix and elsewhere, a shortage of contractors made it hard for Zillow to flip its homes as quickly as it hoped.

It sounds like the rapid sprawling growth of Phoenix in recent decades made it attractive for trying to estimate and predict prices. The story above highlights cookie-cutter subdivisions and homes – they are newer and similar to each other – and I imagine this is helpful for models compared to older cities where there is more variation within and across neighborhoods. Take that, critics of suburban ticky-tacky houses and conformity!

But when conditions changed – COVID-19 hit, which then changed the behavior of buyers and sellers, contractors and the building trades, and other actors in the housing industry – that uniformity in housing was not enough to profit easily.

As the end of the article suggests, the algorithms could be changed or improved, and other institutional buyers are also interested. Is this just a matter of having more data and/or better modeling? Could it all work for these companies outside of really unusual times? Or perhaps there really are housing markets, in the US or around the globe, that are more predictable than others?

If suburban areas and communities are the places where this really takes off, the historical patterns of people making money off what are often regarded as havens for families and the American Dream may continue. Sure, homeowners may profit as their housing values increase over time but the bigger actors including developers, lenders, and real estate tech companies may be the ones who really benefit.

Claim: Facebook wants to curate the news through an algorithm

Insiders have revealed how Facebook is selecting its trending news stories:

Launched in January 2014, Facebook’s trending news section occupies some of the most precious real estate in all of the internet, filling the top-right hand corner of the site with a list of topics people are talking about and links out to different news articles about them. The dozen or so journalists paid to run that section are contractors who work out of the basement of the company’s New York office…

The trending news section is run by people in their 20s and early 30s, most of whom graduated from Ivy League and private East Coast schools like Columbia University and NYU. They’ve previously worked at outlets like the New York Daily News, Bloomberg, MSNBC, and the Guardian. Some former curators have left Facebook for jobs at organizations including the New Yorker, Mashable, and Sky Sports.

According to former team members interviewed by Gizmodo, this small group has the power to choose what stories make it onto the trending bar and, more importantly, what news sites each topic links out to. “We choose what’s trending,” said one. “There was no real standard for measuring what qualified as news and what didn’t. It was up to the news curator to decide.”…

That said, many former employees suspect that Facebook’s eventual goal is to replace its human curators with a robotic one. The former curators Gizmodo interviewed started to feel like they were training a machine, one that would eventually take their jobs. Managers began referring to a “more streamlined process” in meetings. As one former contractor put it: “We felt like we were part of an experiment that, as the algorithm got better, there was a sense that at some point the humans would be replaced.”

The angle here seems to be that (1) the journalists who participated did not feel they were treated well and (2) journalists may not be part of the future process because an algorithm will take over. I don’t know about the first but is the second a major surprise? The trending news will still require content to be generated, presumably created by journalists and news sources all across the Internet. Do journalists want to retain the privilege to not just write the news but also to choose what gets reported? In other words, the gatekeeper role of journalism may slowly disappear if algorithms guide what people see.

Imagine the news algorithms that people might have available to them in the future: one that doesn’t report any violent crime (it is overreported anyway); one that only includes celebrity news (this might include politics, it might not); one that reports on all forms of government corruption; and so on. I’m guessing, however, that Facebook’s algorithm would be proprietary and would probably try to push people toward certain behaviors (whether that is sharing more on their profiles or pursuing particular civic or political actions).

Zillow off a median of 8% on home prices; is this a big problem?

Zillow’s CEO recently discussed the error rate of his company’s estimates for home values:

Back to the question posed by O’Donnell: Are Zestimates accurate? And if they’re off the mark, how far off? Zillow CEO Spencer Rascoff answered that they’re “a good starting point” but that nationwide Zestimates have a “median error rate” of about 8%.

Whoa. That sounds high. On a $500,000 house, that would be a $40,000 disparity — a lot of money on the table — and could create problems. But here’s something Rascoff was not asked about: Localized median error rates on Zestimates sometimes far exceed the national median, which raises the odds that sellers and buyers will have conflicts over pricing. Though it’s not prominently featured on the website, at the bottom of Zillow’s home page in small type is the word “Zestimates.” This section provides helpful background information along with valuation error rates by state and county — some of which are stunners.

For example, in New York County — Manhattan — the median valuation error rate is 19.9%. In Brooklyn, it’s 12.9%. In Somerset County, Md., the rate is an astounding 42%. In some rural counties in California, error rates range as high as 26%. In San Francisco it’s 11.6%. With a median home value of $1,000,800 in San Francisco, according to Zillow estimates as of December, a median error rate at this level translates into a price disparity of $116,093.

Thinking from a probabilistic perspective, 8% does not sound bad at all. A median error rate of 8% means half of all Zestimates land within 8% of the eventual sale price, and half miss by more than that. As the article notes, this error rate differs across regions, but each region has different conditions, including more or fewer sales and different kinds of housing. Thus, in dynamic real estate markets with lots of moving parts, including comparables as well as the actions of homeowners and homebuyers, 8% sounds reasonable.

Perhaps the bigger issue is what people do with estimates; they are not 100% guarantees:

So what do you do now that you’ve got the scoop on Zestimate accuracy? Most important, take Rascoff’s advice: Look at them as no more than starting points in pricing discussions with the real authorities on local real estate values — experienced agents and appraisers. Zestimates are hardly gospel — often far from it.

Zillow can be a useful tool but it is based on algorithms using available data.

Facebook as the new gatekeeper of journalism

Facebook’s algorithms now go a long way in dictating what news users see:

“We try to explicitly view ourselves as not editors,” he said. “We don’t want to have editorial judgment over the content that’s in your feed. You’ve made your friends, you’ve connected to the pages that you want to connect to and you’re the best decider for the things that you care about.”…

Roughly once a week, he and his team of about 16 adjust the complex computer code that decides what to show a user when he or she first logs on to Facebook. The code is based on “thousands and thousands” of metrics, Mr. Marra said, including what device a user is on, how many comments or likes a story has received and how long readers spend on an article…

If Facebook’s algorithm smiles on a publisher, the rewards, in terms of traffic, can be enormous. If Mr. Marra and his team decide that users do not enjoy certain things, such as teaser headlines that lure readers to click through to get all the information, it can mean ruin. When Facebook made changes to its algorithm in December 2013 to emphasize higher-quality content, several so-called viral sites that had thrived there, including Upworthy, Distractify and Elite Daily, saw large declines in their traffic.

Facebook executives frame the company’s relationship with publishers as mutually beneficial: when publishers promote their content on Facebook, its users have more engaging material to read, and the publishers get increased traffic driven to their sites. Numerous publications, including The New York Times, have met with Facebook officials to discuss how to improve their referral traffic.
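As a purely illustrative sketch of what a feature-weighted ranking like this could look like (the features, weights, and example publishers are invented; the article only says the real system uses “thousands and thousands” of metrics), note how a single penalty for teaser headlines can reorder whose stories get seen:

```python
from dataclasses import dataclass

@dataclass
class Story:
    publisher: str
    likes: int
    comments: int
    avg_seconds_spent: float
    is_teaser_headline: bool

def rank_score(story: Story) -> float:
    """Toy feature-weighted ranking score with invented weights."""
    score = (
        0.5 * story.likes
        + 1.5 * story.comments
        + 2.0 * story.avg_seconds_spent
    )
    if story.is_teaser_headline:   # the kind of signal the December 2013 change punished
        score *= 0.3
    return score

stories = [
    Story("ViralSiteA", likes=900, comments=50, avg_seconds_spent=8.0, is_teaser_headline=True),
    Story("NewsOutletB", likes=300, comments=40, avg_seconds_spent=95.0, is_teaser_headline=False),
]
for s in sorted(stories, key=rank_score, reverse=True):
    print(s.publisher, round(rank_score(s), 1))
```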

Is Facebook a better gatekeeper than news outlets, editors, and the large corporations that often run them? I see three key differences:

1. Facebook’s methods are based on social networks and what your friends and others in your feed like. This may not be much different from checking sites yourself – especially since people often go to the same sites or to ones that tend to agree with them – but the results are out of your hands.

2. Ultimately, Facebook wants to connect you to other people using news, not necessarily give you news for other purposes like being an informed citizen or spurring you to action. This is a different process than seeking out news sites that primarily produce news (even if that is now often a lot of celebrity or entertainment info).

3. The news is interspersed with new pieces of information about the lives of others. This likely catches people’s attention and doesn’t provide an overwhelming amount of news or information that is abstracted from the user/reader.

Using social media data to predict traits about users

Here is a summary of research that uses algorithms and “concepts from psychology and sociology” to uncover traits of social media users through what they make available:

One study in this space, published in 2013 by researchers at the University of Cambridge and their colleagues, gathered data from 60,000 Facebook users and, with their Facebook “likes” alone, predicted a wide range of personal traits. The researchers could predict attributes like a person’s gender, religion, sexual orientation, and substance use (drugs, alcohol, smoking)…

How could liking curly fries be predictive? The reasoning relies on a few insights from sociology. Imagine one of the first people to like the page happened to be smart. Once she liked it, her friends saw it. A social science concept called homophily tells us that people tend to be friends with people like themselves. Smart people tend to be friends with smart people. Liberals are friends with other liberals. Rich people hang out with other rich people…

On the first site, YouAreWhatYouLike, the algorithms will tell you about your personality. This includes openness to new ideas, extraversion and introversion, your emotional stability, your warmth or competitiveness, and your organizational levels.

The second site, Apply Magic Sauce, predicts your politics, relationship status, sexual orientation, gender, and more. You can try it on yourself, but be forewarned that the data is in a machine-readable format. You’ll be able to figure it out, but it’s not as pretty as YouAreWhatYouLike.

These aren’t the only tools that do this. AnalyzeWords leverages linguistics to discover the personality you portray on Twitter. It does not look at the topics you discuss in your tweets, but rather at things like how often you say “I” vs. “we,” how frequently you curse, and how many anxiety-related words you use. The interesting thing about this tool is that you can analyze anyone, not just yourself.

The author then goes on to say that she purges her social media accounts of much of their old content so third parties can’t use the information against her. That is one response. However, before I go do this, I would want to know a few things:

1. Just how good are these predictions? It is one thing to suggest they are 60% accurate but another to say they are 90% accurate.

2. How much data do these algorithms need to make good predictions?

3. How are social media companies responding to such moves? While I’m sure they are doing some of this themselves, what are they planning to do if someone wants to use this data in a harmful way (say, affecting people’s credit score)? Why not set limits for this now rather than after the fact?
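On the first two questions, here is a minimal sketch of how such a likes-based predictor might be built and evaluated. The data is synthetic, not the Cambridge study’s; the point is only that accuracy is measurable and depends heavily on how many likes the model sees.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the study's setup: each row is a user, each column a
# page ("like" = 1). The trait and like patterns are invented so the example runs.
n_users, n_pages = 2000, 300
trait = rng.integers(0, 2, n_users)            # hidden binary trait to predict
base_rate = rng.uniform(0.02, 0.10, n_pages)   # how popular each page is
lift = rng.normal(0, 1.0, n_pages)             # how much the trait shifts each page's odds
probs = 1 / (1 + np.exp(-(np.log(base_rate / (1 - base_rate)) + lift * trait[:, None])))
likes = rng.binomial(1, probs)

X_train, X_test, y_train, y_test = train_test_split(likes, trait, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.1%}")

# Question 2 above: accuracy as a function of how many pages the model sees.
for k in (10, 50, 300):
    m = LogisticRegression(max_iter=1000).fit(X_train[:, :k], y_train)
    print(f"{k:>3} pages: {m.score(X_test[:, :k], y_test):.1%}")
```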

Analyzing Netflix’s thousands of movie genres

Alexis Madrigal decided to look into the movie genres of Netflix – and found lots of interesting data:

As the hours ticked by, the Netflix grammar—how it pieced together the words to form comprehensible genres—began to become apparent as well.

If a movie was both romantic and Oscar-winning, Oscar-winning always went to the left: Oscar-winning Romantic Dramas. Time periods always went at the end of the genre: Oscar-winning Romantic Dramas from the 1950s

In fact, there was a hierarchy for each category of descriptor. Generally speaking, a genre would be formed out of a subset of these components:

Region + Adjectives + Noun Genre + Based On… + Set In… + From the… + About… + For Age X to Y

Yellin said that the genres were limited by three main factors: 1) they only want to display 50 characters for various UI reasons, which eliminates most long genres; 2) there had to be a “critical mass” of content that fit the description of the genre, at least in Netflix’s extended DVD catalog; and 3) they only wanted genres that made syntactic sense.
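To see how quickly a small vocabulary multiplies into a long genre list under this grammar, here is a toy generator; the component lists are invented, not Netflix’s actual vocabulary, and only the 50-character constraint from the quote is applied.

```python
from itertools import product

# Toy components following the hierarchy quoted above:
# Region + Adjectives + Noun Genre + From the... (a hypothetical subset)
regions = ["", "British ", "Japanese "]
adjectives = ["", "Critically-acclaimed ", "Oscar-winning Romantic "]
noun_genres = ["Dramas", "Thrillers"]
periods = ["", " from the 1950s", " from the 1980s"]

genres = []
for region, adj, noun, period in product(regions, adjectives, noun_genres, periods):
    genre = f"{region}{adj}{noun}{period}"
    if len(genre) <= 50:          # constraint 1: the 50-character UI limit
        genres.append(genre)

print(len(genres), "genres, e.g.:")
for g in genres[:5]:
    print(" -", g)
```

Even this tiny vocabulary yields dozens of labels; with fuller lists of regions, adjectives, descriptors, and time periods, the combinations multiply into the thousands of genres the heading mentions.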

And the conclusion is that there are so many genres that they don’t necessarily make sense to humans. This strikes me as a uniquely modern problem: we know how to find patterns via algorithm and then we have to decide whether we want to know why the patterns exist. We might call this the Freakonomics problem: we can collect reams of data, data mine it, and then have to develop explanations. This, of course, is the reverse of the typical scientific process that starts with theories and then goes about testing them. The Netflix “reverse engineering” can be quite useful but wouldn’t it be nice to know why Perry Mason and a few other less celebrated actors show up so often?

At the least, I bet Hollywood would like access to such explanations. This also reminds me of the Music Genome Project that underlies Pandora. Unlock the genres and there is money to be made.