Zillow sought pricing predictability in the supposedly predictable market of Phoenix

With Zillow stopping its iBuyer initiative, here are more details about how the Phoenix housing market was key to the plan:


Tech firms chose the Phoenix area because of its preponderance of cookie-cutter homes. Unlike Boston or New York, the identikit streets make pricing properties easier. iBuyers’ market share in Phoenix grew from around 1 percent in 2015—when tech companies first entered the market—to 6 percent in 2018, says Tomasz Piskorski of Columbia Business School, who is also a member of the National Bureau of Economic Research. Piskorski believes iBuyers—Zillow included—have grown their share since, but are still involved in less than 10 percent of all transactions in the city…

Barton told analysts that the premise of Zillow’s iBuying business was being able to forecast the price of homes accurately three to six months in advance. That reflected the time to fix and sell homes Zillow had bought…

In Phoenix, the problem was particularly acute. Nine in 10 homes Zillow bought were put up for sale at a lower price than the company originally bought them, according to an October 2021 analysis by Insider. If each of those homes sold for Zillow’s asking price, the company would lose $6.3 million. “Put simply, our observed error rate has been far more volatile than we ever expected possible,” Barton admitted. “And makes us look far more like a leveraged housing trader than the market maker we set out to be.”…

To make the iBuying program profitable, however, Zillow believed its estimates had to be more precise, within just a few thousand dollars. Throw in the changes brought in by the pandemic, and the iBuying program was losing money. One such factor: In Phoenix and elsewhere, a shortage of contractors made it hard for Zillow to flip its homes as quickly as it hoped.

It sounds like the rapid sprawling growth of Phoenix in recent decades made it attractive for trying to estimate and predict prices. The story above highlights cookie-cutter subdivisions and homes – they are newer and similar to each other – and I imagine this is helpful for models compared to older cities where there is more variation within and across neighborhoods. Take that critics of suburban ticky-tacky houses and conformity!
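To make the intuition concrete, here is a minimal sketch in Python (simulated data, not Zillow’s model) of why a uniform housing stock is easier to price: two markets share the same observable features, but one has far more unobserved home-to-home variation, and the same simple model ends up with a much larger median error there.

```python
# Toy illustration, not Zillow's model: prediction error grows with the
# variation a pricing model cannot see.
import numpy as np

rng = np.random.default_rng(0)

def simulate_market(n_homes, idiosyncratic_sd):
    sqft = rng.uniform(1200, 2400, n_homes)           # observable feature
    beds = rng.integers(2, 5, n_homes)                # observable feature
    noise = rng.normal(0, idiosyncratic_sd, n_homes)  # unobserved home-to-home variation
    price = 100_000 + 150 * sqft + 20_000 * beds + noise
    X = np.column_stack([np.ones(n_homes), sqft, beds])
    return X, price

def median_pct_error(X, price):
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)  # fit price ~ sqft + beds
    predictions = X @ coef
    return np.median(np.abs(predictions - price) / price) * 100

X_u, p_u = simulate_market(2000, idiosyncratic_sd=10_000)  # cookie-cutter subdivision
X_v, p_v = simulate_market(2000, idiosyncratic_sd=80_000)  # older, more varied housing stock

print(f"median error, uniform market: {median_pct_error(X_u, p_u):.1f}%")
print(f"median error, varied market:  {median_pct_error(X_v, p_v):.1f}%")
```

The specific numbers are arbitrary; the point is only that the less homes vary in ways the model cannot observe, the tighter its estimates can be.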

But when conditions changed – COVID-19 hit, which then changed the behavior of buyers and sellers, contractors and the building trades, and other actors in the housing industry – that uniformity in housing was not enough to profit easily.

As the end of the article suggests, the algorithms could be changed or improved, and other institutional buyers are also interested. Is this just a matter of having more data and/or better modeling? Could it all work for these companies outside of really unusual times? Or perhaps there really are housing markets, in the US or around the globe, that are more predictable than others?

If suburban areas and communities are the places where this really takes off, the historical patterns of people making money off what are often regarded as havens for families and the American Dream may continue. Sure, homeowners may profit as their housing values increase over time but the bigger actors including developers, lenders, and real estate tech companies may be the ones who really benefit.

Claim: Facebook wants to curate the news through an algorithm

Insiders have revealed how Facebook is selecting its trending news stories:

Launched in January 2014, Facebook’s trending news section occupies some of the most precious real estate in all of the internet, filling the top-right hand corner of the site with a list of topics people are talking about and links out to different news articles about them. The dozen or so journalists paid to run that section are contractors who work out of the basement of the company’s New York office…

The trending news section is run by people in their 20s and early 30s, most of whom graduated from Ivy League and private East Coast schools like Columbia University and NYU. They’ve previously worked at outlets like the New York Daily News, Bloomberg, MSNBC, and the Guardian. Some former curators have left Facebook for jobs at organizations including the New Yorker, Mashable, and Sky Sports.

According to former team members interviewed by Gizmodo, this small group has the power to choose what stories make it onto the trending bar and, more importantly, what news sites each topic links out to. “We choose what’s trending,” said one. “There was no real standard for measuring what qualified as news and what didn’t. It was up to the news curator to decide.”…

That said, many former employees suspect that Facebook’s eventual goal is to replace its human curators with a robotic one. The former curators Gizmodo interviewed started to feel like they were training a machine, one that would eventually take their jobs. Managers began referring to a “more streamlined process” in meetings. As one former contractor put it: “We felt like we were part of an experiment that, as the algorithm got better, there was a sense that at some point the humans would be replaced.”

The angle here seems to be that (1) the journalists who participated did not feel they were treated well and (2) journalists may not be part of the future process because an algorithm will take over. I don’t know about the first, but is the second a major surprise? The trending news section will still require content, presumably created by journalists and news sources all across the Internet. Do journalists want to retain the privilege of not just writing the news but also choosing what gets reported? In other words, the gatekeeper role of journalism may slowly disappear if algorithms guide what people see.

Imagine the news algorithms that people might have available to them in the future: one that doesn’t report any violent crime (it is overreported anyway); one that only includes celebrity news (this might include politics, it might not); one that reports on all forms of government corruption; and so on. I’m guessing, however, that Facebook’s algorithm would remain proprietary and would probably try to push people toward certain behaviors (whether that is sharing more on their profiles or pursuing particular civic or political actions).

Zillow off a median of 8% on home prices; is this a big problem?

Zillow’s CEO recently discussed the error rate of his company’s estimates for home values:

Back to the question posed by O’Donnell: Are Zestimates accurate? And if they’re off the mark, how far off? Zillow CEO Spencer Rascoff answered that they’re “a good starting point” but that nationwide Zestimates have a “median error rate” of about 8%.

Whoa. That sounds high. On a $500,000 house, that would be a $40,000 disparity — a lot of money on the table — and could create problems. But here’s something Rascoff was not asked about: Localized median error rates on Zestimates sometimes far exceed the national median, which raises the odds that sellers and buyers will have conflicts over pricing. Though it’s not prominently featured on the website, at the bottom of Zillow’s home page in small type is the word “Zestimates.” This section provides helpful background information along with valuation error rates by state and county — some of which are stunners.

For example, in New York County — Manhattan — the median valuation error rate is 19.9%. In Brooklyn, it’s 12.9%. In Somerset County, Md., the rate is an astounding 42%. In some rural counties in California, error rates range as high as 26%. In San Francisco it’s 11.6%. With a median home value of $1,000,800 in San Francisco, according to Zillow estimates as of December, a median error rate at this level translates into a price disparity of $116,093.

Thinking from a probabilistic perspective, 8% does not sound bad at all. Consider that the typical scientific study works with a 5% error rate. A median error rate of 8% means half of all Zestimates come within 8% of the eventual sale price. As the article notes, this error rate differs across regions, but each of those regions has different conditions, including more or fewer sales and different kinds of housing. Thus, in dynamic real estate markets with lots of moving parts, including comparables as well as the actions of homeowners and homebuyers, 8% sounds good.
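For anyone who wants the arithmetic spelled out, here is a short sketch with made-up sale prices and estimates showing what a median error rate is and how a published rate translates into the dollar figures quoted above.

```python
# Hypothetical sale prices and estimates for five homes; not Zillow data.
import numpy as np

sale_prices = np.array([350_000, 500_000, 425_000, 610_000, 275_000])
estimates   = np.array([370_000, 455_000, 440_000, 640_000, 260_000])

pct_errors = np.abs(estimates - sale_prices) / sale_prices
print(f"median error rate: {np.median(pct_errors):.1%}")  # half the estimates are closer than this, half farther

# Translating a published rate into a dollar disparity, as the article does:
for rate, price in [(0.08, 500_000), (0.116, 1_000_800)]:
    print(f"{rate:.1%} of ${price:,} is about ${rate * price:,.0f}")
```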

Perhaps the bigger issue is what people do with estimates; they are not 100% guarantees:

So what do you do now that you’ve got the scoop on Zestimate accuracy? Most important, take Rascoff’s advice: Look at them as no more than starting points in pricing discussions with the real authorities on local real estate values — experienced agents and appraisers. Zestimates are hardly gospel — often far from it.

Zillow can be a useful tool but it is based on algorithms using available data.

Facebook as the new gatekeeper of journalism

Facebook’s algorithms now go a long way in dictating what news users see:

“We try to explicitly view ourselves as not editors,” he said. “We don’t want to have editorial judgment over the content that’s in your feed. You’ve made your friends, you’ve connected to the pages that you want to connect to and you’re the best decider for the things that you care about.”…

Roughly once a week, he and his team of about 16 adjust the complex computer code that decides what to show a user when he or she first logs on to Facebook. The code is based on “thousands and thousands” of metrics, Mr. Marra said, including what device a user is on, how many comments or likes a story has received and how long readers spend on an article…

If Facebook’s algorithm smiles on a publisher, the rewards, in terms of traffic, can be enormous. If Mr. Marra and his team decide that users do not enjoy certain things, such as teaser headlines that lure readers to click through to get all the information, it can mean ruin. When Facebook made changes to its algorithm in December 2013 to emphasize higher-quality content, several so-called viral sites that had thrived there, including Upworthy, Distractify and Elite Daily, saw large declines in their traffic.

Facebook executives frame the company’s relationship with publishers as mutually beneficial: when publishers promote their content on Facebook, its users have more engaging material to read, and the publishers get increased traffic driven to their sites. Numerous publications, including The New York Times, have met with Facebook officials to discuss how to improve their referral traffic.
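As a purely illustrative sketch, and not Facebook’s actual algorithm or weights, here is what a toy version of “code that decides what to show a user” could look like, combining a few of the signals the article mentions into a single score and sorting stories by it.

```python
# Toy feed ranker with hand-picked, hypothetical weights; a real system
# would use vastly more signals and learn the weights from engagement data.
from dataclasses import dataclass

@dataclass
class Story:
    title: str
    likes: int
    comments: int
    avg_read_seconds: float
    from_close_friend: bool

def feed_score(s: Story) -> float:
    score = 0.2 * s.likes + 2.0 * s.comments + 1.0 * s.avg_read_seconds
    if s.from_close_friend:
        score *= 1.5  # boost content from people the user interacts with
    return score

stories = [
    Story("Local election results", likes=40, comments=12, avg_read_seconds=95, from_close_friend=False),
    Story("Friend's vacation photos", likes=15, comments=8, avg_read_seconds=20, from_close_friend=True),
    Story("Teaser headline listicle", likes=200, comments=3, avg_read_seconds=6, from_close_friend=False),
]

for s in sorted(stories, key=feed_score, reverse=True):  # highest score shown first
    print(f"{feed_score(s):7.1f}  {s.title}")
```

Weighting time spent over raw clicks is one way a ranking change like the December 2013 one could demote teaser headlines; the weights here are invented to show that mechanic, nothing more.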

Is Facebook a better gatekeeper than news outlets, editors, and the large corporations that often run them? I see three key differences:

1. Facebook’s methods are based on social networks and what your friends and others in your feed like. This may not be too much different from checking sites yourself – especially since people often go to the same sites or to ones that tend to agree with them – but the results are out of your hands.

2. Ultimately, Facebook wants to connect you to other people using news, not necessarily give you news for other purposes like being an informed citizen or spurring you to action. This is a different process than seeking out news sites that primarily produce news (even if that is now often a lot of celebrity or entertainment info).

3. The news is interspersed with new pieces of information about the lives of others. This likely catches people’s attention and doesn’t provide an overwhelming amount of news or information that is abstracted from the user/reader.

Using social media data to predict traits about users

Here is a summary of research that uses algorithms and “concepts from psychology and sociology” to uncover traits of social media users through what they make available:

One study in this space, published in 2013 by researchers at the University of Cambridge and their colleagues, gathered data from 60,000 Facebook users and, with their Facebook “likes” alone, predicted a wide range of personal traits. The researchers could predict attributes like a person’s gender, religion, sexual orientation, and substance use (drugs, alcohol, smoking)…

How could liking curly fries be predictive? The reasoning relies on a few insights from sociology. Imagine one of the first people to like the page happened to be smart. Once she liked it, her friends saw it. A social science concept called homophily tells us that people tend to be friends with people like themselves. Smart people tend to be friends with smart people. Liberals are friends with other liberals. Rich people hang out with other rich people…

On the first site, YouAreWhatYouLike, the algorithms will tell you about your personality. This includes openness to new ideas, extraversion and introversion, your emotional stability, your warmth or competitiveness, and your organizational levels.

The second site, Apply Magic Sauce, predicts your politics, relationship status, sexual orientation, gender, and more. You can try it on yourself, but be forewarned that the data is in a machine-readable format. You’ll be able to figure it out, but it’s not as pretty as YouAreWhatYouLike.

These aren’t the only tools that do this. AnalyzeWords leverages linguistics to discover the personality you portray on Twitter. It does not look at the topics you discuss in your tweets, but rather at things like how often you say “I” vs. “we,” how frequently you curse, and how many anxiety-related words you use. The interesting thing about this tool is that you can analyze anyone, not just yourself.

The author then goes on to say that she purges her social media accounts so they do not include much old content and third parties can’t use the information against her. That is one response. However, before I go do this, I would want to know a few things:

1. Just how good are these predictions? It is one thing to suggest they are 60% accurate but another to say they are 90% accurate.

2. How much data do these algorithms need to make good predictions?

3. How are social media companies responding to such moves? While I’m sure they are doing some of this themselves, what are they planning to do if someone wants to use this data in a harmful way (say, affecting people’s credit score)? Why not set limits for this now rather than after the fact?
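On the first two questions, it can help to see the bare bones of how such predictions are made. Below is a minimal sketch with hypothetical pages, users, and labels, not the Cambridge study’s data or pipeline: each user is represented as a binary vector of page likes, and a logistic regression predicts one trait from those likes. How accurate this gets in practice depends heavily on how many users and likes the model sees, which is exactly what questions 1 and 2 ask about.

```python
# Toy likes-to-trait prediction; pages, users, and labels are all made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

pages = ["curly fries", "thunderstorms", "a political page", "a TV show"]

# Rows are users, columns are pages (1 = liked). Homophily is the reason
# likes carry a signal about traits at all.
likes = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])
trait = np.array([1, 1, 0, 0, 1, 0])  # e.g., 1 = reports the trait in question

model = LogisticRegression().fit(likes, trait)

new_user = np.array([[1, 0, 1, 0]])  # likes curly fries and the political page
print("predicted probability of trait:", model.predict_proba(new_user)[0, 1])
```

The published study of course worked at a vastly larger scale; this is only the skeleton of the idea.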

Analyzing Netflix’s thousands of movie genres

Alexis Madrigal decided to look into the movie genres of Netflix – and found lots of interesting data:

As the hours ticked by, the Netflix grammar—how it pieced together the words to form comprehensible genres—began to become apparent as well.

If a movie was both romantic and Oscar-winning, Oscar-winning always went to the left: Oscar-winning Romantic Dramas. Time periods always went at the end of the genre: Oscar-winning Romantic Dramas from the 1950s

In fact, there was a hierarchy for each category of descriptor. Generally speaking, a genre would be formed out of a subset of these components:

Region + Adjectives + Noun Genre + Based On… + Set In… + From the… + About… + For Age X to Y

Yellin said that the genres were limited by three main factors: 1) they only want to display 50 characters for various UI reasons, which eliminates most long genres; 2) there had to be a “critical mass” of content that fit the description of the genre, at least in Netflix’s extended DVD catalog; and 3) they only wanted genres that made syntactic sense.
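A toy sketch of what that grammar amounts to in code, using made-up component lists, only a subset of the slots (Region + Adjective + Noun Genre + time period), and the 50-character limit Yellin mentions:

```python
# Compose candidate genre labels from components in a fixed order and drop
# any that exceed the 50-character display limit. Components are invented.
from itertools import product

regions    = ["", "British", "Scandinavian"]
adjectives = ["", "Oscar-winning", "Romantic", "Gritty"]
genres     = ["Dramas", "Thrillers", "Documentaries"]
periods    = ["", "from the 1950s", "from the 1980s"]

def compose(region, adjective, genre, period):
    words = [w for w in (region, adjective, genre, period) if w]
    return " ".join(words)

candidates = (compose(*combo) for combo in product(regions, adjectives, genres, periods))
genre_labels = [g for g in candidates if len(g) <= 50]

print(len(genre_labels), "candidate genres, for example:")
for label in genre_labels[:5]:
    print(" -", label)
```

The real vocabulary is far larger, which is how a handful of slots multiplies into the thousands of genres in the post’s title.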

And the conclusion is that there are so many genres that they don’t necessarily make sense to humans. This strikes me as a uniquely modern problem: we know how to find patterns via algorithm and then we have to decide whether we want to know why the patterns exist. We might call this the Freakonomics problem: we can collect reams of data, data mine it, and then have to develop explanations. This, of course, is the reverse of the typical scientific process that starts with theories and then goes about testing them. The Netflix “reverse engineering” can be quite useful but wouldn’t it be nice to know why Perry Mason and a few other less celebrated actors show up so often?

At the least, I bet Hollywood would like access to such explanations. This also reminds me of the Music Genome Project that underlies Pandora. Unlock the genres and there is money to be made.

Using algorithms to analyze the literary canon

A new book describes efforts to use algorithms to discover what is in and out of the literary canon:

There’s no single term that captures the range of new, large-scale work currently underway in the literary academy, and that’s probably as it should be. More than a decade ago, the Stanford scholar of world literature Franco Moretti dubbed his quantitative approach to capturing the features and trends of global literary production “distant reading,” a practice that paid particular attention to counting books themselves and owed much to bibliographic and book historical methods. In earlier decades, so-called “humanities computing” joined practitioners of stylometry and authorship attribution, who attempted to quantify the low-level differences between individual texts and writers. More recently, the catchall term “digital humanities” has been used to describe everything from online publishing and new media theory to statistical genre discrimination. In each of these cases, however, the shared recognition — like the impulse behind the earlier turn to cultural theory, albeit with a distinctly quantitative emphasis — has been that there are big gains to be had from looking at literature first as an interlinked, expressive system rather than as something that individual books do well, badly, or typically. At the same time, the gains themselves have as yet been thin on the ground, as much suggestions of future progress as transformative results in their own right. Skeptics could be forgiven for wondering how long the data-driven revolution can remain just around the corner.

Into this uncertain scene comes an important new volume by Matthew Jockers, offering yet another headword (“macroanalysis,” by analogy to macroeconomics) and a range of quantitative studies of 19th-century fiction. Jockers is one of the senior figures in the field, a scholar who has been developing novel ways of digesting large bodies of text for nearly two decades. Despite Jockers’s stature, Macroanalysis is his first book, one that aims to summarize and unify much of his previous research. As such, it covers a lot of ground with varying degrees of technical sophistication. There are chapters devoted to methods as simple as counting the annual number of books published by Irish-American authors and as complex as computational network analysis of literary influence. Aware of this range, Jockers is at pains to draw his material together under the dual headings of literary history and critical method, which is to say that the book aims both to advance a specific argument about the contours of 19th-century literature and to provide a brief in favor of the computational methods that it uses to support such an argument. For some readers, the second half of that pairing — a detailed look into what can be done today with new techniques — will be enough. For others, the book’s success will likely depend on how far they’re persuaded that the literary argument is an important one that can’t be had in the absence of computation…

More practically interesting and ambitious are Jockers’s studies of themes and influence in a larger set of novels from the same period (3,346 of them, to be exact, or about five to 10 percent of those published during the 19th century). These are the only chapters of the book that focus on what we usually understand by the intellectual content of the texts in question, seeking to identify and trace the literary use of meaningful clusters of subject-oriented terms across the corpus. The computational method involved is one known as topic modeling, a statistical approach to identifying such clusters (the topics) in the absence of outside input or training data. What’s exciting about topic modeling is that it can be run quickly over huge swaths of text about which we initially know very little. So instead of developing a hunch about the thematic importance of urban poverty or domestic space or Native Americans in 19th-century fiction and then looking for words that might be associated with those themes — that is, instead of searching Google Books more or less at random on the basis of limited and biased close reading — topic models tell us what groups of words tend to co-occur in statistically improbable ways. These computationally derived word lists are for the most part surprisingly coherent and highly interpretable. Specifically in Jockers’s case, they’re both predictable enough to inspire confidence in the method (there are topics “about” poverty, domesticity, Native Americans, Ireland, sea faring, servants, farming, etc.) and unexpected enough to be worth examining in detail…

The notoriously difficult problem of literary influence finally unites many of the methods in Macroanalysis. The book’s last substantive chapter presents an approach to finding the most central texts among the 3,346 included in the study. To assess the relative influence of any book, Jockers first combines the frequency measures of the roughly 100 most common words used previously for stylistic analysis with the more than 450 topic frequencies used to assess thematic interest. This process generates a broad measure of each book’s position in a very high-dimensional space, allowing him to calculate the “distance” between every pair of books in the corpus. Pairs that are separated by smaller distances are more similar to each other, assuming we’re okay with a definition of similarity that says two books are alike when they use high-frequency words at the same rates and when they consist of equivalent proportions of topic-modeled terms. The most influential books are then the ones — roughly speaking and skipping some mathematical details — that show the shortest average distance to the other texts in the collection. It’s a nifty approach that produces a fascinatingly opaque result: Tristram Shandy, Laurence Sterne’s famously odd 18th-century bildungsroman, is judged to be the most influential member of the collection, followed by George Gissing’s unremarkable The Whirlpool (1897) and Benjamin Disraeli’s decidedly minor romance Venetia (1837). If you can make sense of this result, you’re ahead of Jockers himself, who more or less throws up his hands and ends both the chapter and the analytical portion of the book a paragraph later. It might help if we knew what else of Gissing’s or Disraeli’s was included in the corpus, but that information is provided in neither Macroanalysis nor its online addenda.
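Skipping the same mathematical details the reviewer does, here is a minimal sketch of the “shortest average distance” idea, using random stand-in vectors rather than Jockers’s actual word and topic frequencies:

```python
# Each book is a vector of word-frequency and topic-proportion features;
# the most "central" book has the smallest mean distance to the others.
# All numbers here are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
titles = ["Book A", "Book B", "Book C", "Book D", "Book E"]

word_freqs  = rng.dirichlet(np.ones(100), size=len(titles))  # stand-in for ~100 common-word rates
topic_props = rng.dirichlet(np.ones(450), size=len(titles))  # stand-in for 450+ topic proportions
features = np.hstack([word_freqs, topic_props])

# Pairwise Euclidean distances between books in this high-dimensional space.
diffs = features[:, None, :] - features[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))

mean_dist = dists.sum(axis=1) / (len(titles) - 1)  # average distance to the *other* books
for title, d in sorted(zip(titles, mean_dist), key=lambda pair: pair[1]):
    print(f"{title}: mean distance to the rest of the corpus = {d:.3f}")
```

Whether “smallest average distance” is a good proxy for influence is, of course, exactly the question the reviewer raises about Tristram Shandy, Gissing, and Disraeli.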

Sounds interesting. I wonder if there isn’t a great spot for mixed-methods analysis: Jockers’s analysis provides the big picture, but you also need more intimate and deep knowledge of smaller groups of texts or of individual texts to interpret what the results mean. So, if the data suggests three books are the most influential, you would have to know those books and their context to make sense of what the data says. Additionally, you still want to utilize theories and hypotheses to guide the analysis rather than simply looking for patterns.

This reminds me of the work sociologist Wendy Griswold has done in analyzing whether American novels shared common traits (she argues copyright law was quite influential) or how a reading culture might emerge in a developing nation. Her approach is somewhere between the interpretation of texts and the algorithms described above, relying on more traditional methods in sociology like analyzing samples and conducting interviews.

Using algorithms to judge cultural works

Imagine the money that could be made or the status acquired if algorithms could correctly predict the merit of cultural works:

The budget for the film was $180m and, Meaney says, “it was breathtaking that it was under serious consideration”. There were dinosaurs and tigers. It existed in a fantasy prehistory—with a fantasy language. “Preposterous things were happening, without rhyme or reason.” Meaney, who will not reveal the film’s title because he “can’t afford to piss these people off”, told the studio that his program concurred with his own view: it was a stinker.

The difference is the program puts a value on it. Technically a neural network, with a structure modelled on that of our brain, it gradually learns from experience and then applies what it has learnt to new situations. Using this analysis, and comparing it with data on 12 years of American box-office takings, it predicted that the film in question would make $30m. With changes, Meaney reckoned they could increase the take—but not to $180m. On the day the studio rejected the film, another one took it up. They made some changes, but not enough—and it earned $100m. “Next time we saw our studio,” Meaney says, “they brought in the board to greet us. The chairman said, ‘This is Nick—he’s just saved us $80m.’”…

But providing a service that adapts to individual humans is not the same as becoming like a human, let alone producing art like humans. This is why the rise of algorithms is not necessarily relentless. Their strength is that they can take in that information in ways we cannot quickly understand. But the fact that we cannot understand it is also a weakness. It is worth noting that trading algorithms in America now account for 10% fewer trades than they did in 2009.

Those who are most sanguine are those who use them every day. Nick Meaney is used to answering questions about whether computers can—or should—judge art. His answer is: that’s not what they’re doing. “This isn’t about good, or bad. It is about numbers. These data represent the law of absolute numbers, the cinema-going audience. We have a process which tries to quantify them, and provide information to a client who tries to make educated decisions.”…

Equally, his is not a formula for the perfect film. “If you take a rich woman and a poor man and crash them into an iceberg, will that film always make money?” No, he says. No algorithm has the ability to write a script; it can judge one—but only in monetary terms. What Epagogix does is a considerably more sophisticated version, but still a version, of noting, say, that a film that contains nudity will gain a restricted rating, and thereby have a more limited market.
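As a rough illustration of the kind of system being described, and emphatically not Epagogix’s actual model, features, or data, here is a toy sketch that trains a small neural network on hypothetical (script features, box office) pairs and then puts a dollar value on a new script.

```python
# Toy box-office valuation: invented features, invented past films.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Hypothetical features for 200 past films: [star power, clarity of premise,
# franchise tie-in], each scored 0-1 by a (human) script reader.
X_past = rng.uniform(0, 1, size=(200, 3))
# Hypothetical box-office takings in $m, loosely driven by those features.
y_past = 20 + 120 * X_past[:, 0] + 60 * X_past[:, 1] + 80 * X_past[:, 2] + rng.normal(0, 15, 200)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
model.fit(X_past, y_past)

new_script = np.array([[0.2, 0.1, 0.0]])  # weak premise, no stars, no franchise
print(f"predicted box office: ${model.predict(new_script)[0]:.0f}m")
```

The hidden layer is what makes it nominally a neural network, as the article says, but the point is only the shape of the pipeline: score the script, train on past takings, predict a number.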

The larger article suggests algorithms can do better at predicting some human behaviors, such as purchasing consumer items, but not as well in other areas, like critical evaluations of cultural works. There are two ways this might go in the future. On one hand, some will argue this is just about collecting the right data or enough data. Perhaps we simply aren’t looking at the right things to correctly judge cultural products. On the other hand, some will argue that the value of an object may be too difficult for an algorithm to ever figure out. And, even if a formula starts hinting at good or bad art, humans can change their minds and opinions – see all the various cultural, art, and music movements just in the last few hundred years.

There is a lot of money that could be made here. This might be the bigger issue with cultural works in the future: whether algorithms can evaluate them or not, does it matter if they are all commoditized?

Using algorithms for better realignment in the NHL?

The NHL recently announced realignment plans. However, a group of West Point mathematicians developed an algorithm they argue provides a better realignment:

Well, a team of mathematicians at West Point set out to find an algorithm that could solve some of these problems. In their article posted on the arXiv titled Realignment in the NHL, MLB, the NFL, and the NBA, they explore how to easily construct different team divisions. For example, with the relatively recent move of Atlanta’s hockey team to Winnipeg, the current team alignment is pretty weird (below left), and the NHL has proposed a new 4-division configuration (below right):

Here’s how it works. First, they use a rough approximation for distance traveled by each team (which is correlated with actual travel distances), and then examine all the different ways to divide the cities in a league into geographic halves. You then can subdivide those portions until you get the division sizes you want. However, only certain types of divisions will work, such as not wanting to make teams travel too laterally, due to time zone differences…

Anyway, using this method, here are two ways of dividing the NHL into six different divisions that are found to be optimal:
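Here is a toy version of that splitting procedure with eight hypothetical teams and made-up coordinates (the West Point paper’s actual travel approximation and constraints, such as the time-zone rule, are not reproduced here): brute-force every way to cut the league into two equal halves, keep the cut with the lowest within-group travel proxy, then subdivide each half the same way.

```python
# Recursive geographic splitting on invented city coordinates.
from itertools import combinations
from math import dist

cities = {
    "Team A": (0, 0),  "Team B": (1, 0),  "Team C": (0, 1),  "Team D": (1, 1),
    "Team E": (10, 0), "Team F": (11, 0), "Team G": (10, 1), "Team H": (11, 1),
}

def within_group_travel(group):
    # Sum of pairwise distances as a rough stand-in for travel burden.
    return sum(dist(cities[a], cities[b]) for a, b in combinations(group, 2))

def best_split(teams):
    teams = list(teams)
    half = len(teams) // 2
    splits = ((set(g), set(teams) - set(g)) for g in combinations(teams, half))
    return min(splits, key=lambda s: within_group_travel(s[0]) + within_group_travel(s[1]))

east, west = best_split(cities)  # first cut: two halves of the league
divisions = [d for conf in (east, west) for d in best_split(conf)]  # subdivide each half

for i, division in enumerate(divisions, 1):
    print(f"Division {i}: {sorted(division)}")
```

With 30 real teams the brute-force step gets expensive and the time-zone constraint matters, which is presumably where the paper’s more careful method comes in.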

My first thought when looking at the algorithm’s realignment plans is that they are based less on time zones and more on regions like the Southwest, Northwest, Central, Southeast, North, and Northeast.

But here is where I think the demands of the NHL don’t quite line up with the algorithm’s goal of minimizing travel. The grouping of sports teams is often dependent on historic patterns, rivalries, and when teams entered the league. For example, the NHL realignment plans generated a lot of discussion in Chicago because they meant that the long rivalry between the Chicago Blackhawks and the Detroit Red Wings would end. In other words, there is cultural baggage to realignment that can’t be solved with statistics alone. Data loses out to narratives.

Another way an algorithm could redraw the boundaries: spread out the winning teams across the league. Which teams are really good tends to be cyclical, but occasionally leagues end up with multiple good teams in a single division or an imbalance of power between conferences. Why not spread out teams by record, which would give the best teams a better chance to meet in the finals and give other teams in those stacked divisions or conferences a chance to make the playoffs?
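For what it is worth, that alternative is easy to sketch: a “snake” assignment over records spreads the strongest teams across divisions. The records below are made up.

```python
# Snake-draft assignment of teams to divisions by record (hypothetical points).
teams_by_record = [
    ("Team A", 110), ("Team B", 104), ("Team C", 101), ("Team D", 98),
    ("Team E", 95),  ("Team F", 92),  ("Team G", 88),  ("Team H", 80),
]  # best record first

n_divisions = 4
divisions = [[] for _ in range(n_divisions)]

for i, (team, points) in enumerate(teams_by_record):
    round_num, pos = divmod(i, n_divisions)
    idx = pos if round_num % 2 == 0 else n_divisions - 1 - pos  # snake back and forth
    divisions[idx].append(team)

for i, division in enumerate(divisions, 1):
    print(f"Division {i}: {division}")
```

Whether fans and owners would trade geography and rivalries for that kind of balance is the same cultural problem as above.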