Analyzing Netflix’s thousands of movie genres

Alexis Madrigal decided to look into the movie genres of Netflix – and found lots of interesting data:

As the hours ticked by, the Netflix grammar—how it pieced together the words to form comprehensible genres—began to become apparent as well.

If a movie was both romantic and Oscar-winning, Oscar-winning always went to the left: Oscar-winning Romantic Dramas. Time periods always went at the end of the genre: Oscar-winning Romantic Dramas from the 1950s.

In fact, there was a hierarchy for each category of descriptor. Generally speaking, a genre would be formed out of a subset of these components:

Region + Adjectives + Noun Genre + Based On… + Set In… + From the… + About… + For Age X to Y

Yellin said that the genres were limited by three main factors: 1) they only want to display 50 characters for various UI reasons, which eliminates most long genres; 2) there had to be a “critical mass” of content that fit the description of the genre, at least in Netflix’s extended DVD catalog; and 3) they only wanted genres that made syntactic sense.
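To make the template concrete, here is a minimal sketch of how such a genre name could be assembled. It is not Netflix's code: the slot names, the `build_genre` helper, and the choice to return `None` for rejected names are all assumptions made for illustration. Only the slot order and the 50-character limit come from the passage above. The "critical mass" and "syntactic sense" checks are not modeled, and the order of adjectives within their slot is left to the caller.

```python
# Slot order follows the quoted template; the slot names are illustrative.
SLOT_ORDER = [
    "region", "adjectives", "noun_genre", "based_on",
    "set_in", "from_the", "about", "for_age",
]
MAX_CHARS = 50  # display limit mentioned in the article


def build_genre(parts, max_chars=MAX_CHARS):
    """Join the filled slots in template order.

    Returns the genre name, or None if no slot is filled or the
    name exceeds the character limit (hypothetical convention).
    """
    pieces = []
    for slot in SLOT_ORDER:
        value = parts.get(slot)
        if not value:
            continue  # unfilled slots are simply skipped
        if isinstance(value, (list, tuple)):
            pieces.extend(value)  # e.g. several adjectives
        else:
            pieces.append(value)
    name = " ".join(pieces)
    return name if pieces and len(name) <= max_chars else None


example = build_genre({
    "from_the": "from the 1950s",
    "noun_genre": "Dramas",
    "adjectives": ["Oscar-winning", "Romantic"],
})
print(example)
```

Because the slots are read in a fixed order, the time period ends up at the end of the name no matter how the input is arranged, which matches the behavior described in the article.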

And the conclusion is that there are so many genres that they don’t necessarily make sense to humans. This strikes me as a uniquely modern problem: we know how to find patterns via algorithm, and then we have to decide whether we want to know why the patterns exist. We might call this the Freakonomics problem: we can collect reams of data, mine it, and only then have to develop explanations. This, of course, is the reverse of the typical scientific process, which starts with theories and then goes about testing them. The Netflix “reverse engineering” can be quite useful, but wouldn’t it be nice to know why Perry Mason and a few other less celebrated actors show up so often?

At the least, I bet Hollywood would like access to such explanations. This also reminds me of the Music Genome Project that underlies Pandora. Unlock the genres and there is money to be made.

Pandora’s (copyright) box

It’s no secret that copyright law is ridiculously complicated and badly in need of reform. In case anyone needed reminding, paidContent covered a recent speech by Pandora CEO Joe Kennedy at the NARM music conference in San Francisco. The article’s headings say it all:

  • “The complexity of international copyright limits Pandora’s business.”
  • “How huge damages in copyright law have skewed business relationships.”
  • “Our definition of ‘copies’ might need to change for the digital age.”

That’s a pretty good summary of precisely where copyright law has gone wrong. Be sure to check out the full article.

Oh Canada

I’ve made the point here before that the music industry inexplicably declines perfectly good revenue sources simply because they bring in “less” than the industry expects. At the risk of Monday-morning-quarterbacking its business model, here’s more proof from north of the border, courtesy of Michael Geist:

Pandora, the popular U.S. online music service, filed for an initial public offering last week, providing new insight into a hugely popular company that spends millions of dollars in copyright royalties. Pandora users listened to a billion hours of music in the last three months of 2010. Given U.S. laws, the Pandora prospectus notes that it paid for the privilege of having its users do so, with the company spending just over half of its revenue on copyright fees – $45 million in the first nine months of 2010.

The numbers are striking since they point to a growing source of revenue that is largely being missed in Canada. Millions of dollars are now generated from online streaming royalties in the U.S., yet many companies are avoiding the Canadian market. The reason, as Pandora explained last year, is the royalty demands of the major record labels. As Tim Westergren stated last fall, “as long as rights societies take this approach, they will prevent Pandora from launching to Canadian users.” While CRIA tried to claim that the decision to avoid the market was a function of Canadian copyright law, Pandora indicated that it is the fee demands, not the laws, that are the stumbling block. With millions now being paid for streaming music in the U.S., it is notable that Canadian interests would seemingly prefer to receive nothing rather than the millions that could potentially be on the table.