Google had plans to scan every book, but the project hit some legal bumps along the way, and now the company has “a database containing 25-million books and nobody is allowed to read them”:
Google thought that creating a card catalog was protected by “fair use,” the same doctrine of copyright law that lets a scholar excerpt someone else’s work in order to talk about it. “A key part of the line between what’s fair use and what’s not is transformation,” Google’s lawyer, David Drummond, has said. “Yes, we’re making a copy when we digitize. But surely the ability to find something because a term appears in a book is not the same thing as reading the book. That’s why Google Books is a different product from the book itself.”…
It’s been estimated that about half the books published between 1923 and 1963 are actually in the public domain—it’s just that no one knows which half. Copyrights back then had to be renewed, and often the rightsholder wouldn’t bother filing the paperwork; if they did, the paperwork could be lost. The cost of figuring out who owns the rights to a given book can end up being greater than the market value of the book itself. “To have people go and research each one of these titles,” Sarnoff said to me, “it’s not just Sisyphean—it’s an impossible task economically.” Most out-of-print books are therefore locked up, if not by copyright then by inconvenience…
What became known as the Google Books Search Amended Settlement Agreement came to 165 pages and more than a dozen appendices. It took two and a half years to hammer out the details. Sarnoff described the negotiations as “four-dimensional chess” between the authors, publishers, libraries, and Google. “Everyone involved,” he said to me, “and I mean everyone—on all sides of this issue—thought that if we were going to get this through, this would be the single most important thing they did in their careers.” Ultimately the deal put Google on the hook for about $125 million, including a one-time $45 million payout to the copyright holders of books it had scanned—something like $60 per book—along with $15.5 million in legal fees to the publishers, $30 million to the authors, and $34.5 million toward creating the Registry….
This objection got the attention of the Justice Department, in particular the Antitrust Division, which began investigating the settlement. In a statement filed with the court, the DOJ argued that the settlement would give Google a de facto monopoly on out-of-print books. That’s because for Google’s competitors to get the same rights to those books, they’d basically have to go through the exact same bizarre process: scan them en masse, get sued in a class action, and try to settle. “Even if there were reason to think history could repeat itself in this unlikely fashion,” the DOJ wrote, “it would scarcely be sound policy to encourage deliberate copyright violations and additional litigation.”
So out-of-print books with uncertain copyright status are enough to scuttle what could be one of the great treasure troves of information? This suggests we still have a long way to go before our legal structures can handle an online realm that is both information-rich and easily accessible. And if a deal could eventually be worked out for books, what about older music, art, and other cultural works?
A related thought: having all those books available could change the academic enterprise in at least two ways. First, scholars could easily access far more sources of data. Second, they could cite many more sources in their own work.