Searching for a safe harbor

TorrentFreak reported a few days ago that Google has filed an amicus brief in the appeals case against torrent search engine isoHunt:

Google has been keeping an eye on the legal battle between the MPAA and isoHunt as last week, out of nowhere, the company unexpectedly got involved in the motion for summary judgment appeal. The search giant, which has always stayed far away from these types of cases, filed an amicus curiae brief (third party testimony) at the Appeal Court.

“This case raises issues about the interpretation and application of the safe-harbor provisions of the Digital Millennium Copyright Act, 17 U.S.C. § 512 et seq. (“DMCA”) and common-law rules governing claims for secondary copyright infringement. Google has a strong interest in both issues,” Google’s counsel writes.

Talk about understatement.  You can read Google’s 39-page brief for yourself over on Scribd — thanks to PaidContent for posting.

TechDirt posted additional commentary late yesterday suggesting that Google’s stance in the isoHunt appeal is mostly about its own ongoing litigation with Viacom:

Google argues that as long as YouTube took down any content it received a takedown notice on, it was in compliance and protected by safe harbors. Viacom leaned heavily on the IsoHunt ruling, to claim that the DMCA doesn’t just cover takedown notice responses, but also requires a response to “red flag” infringement.

However, Google knows that the IsoHunt ruling is basically the only legal precedent out there that reads the DMCA in this manner. So, from Google’s perspective, dumping that reasoning is key. So its amicus brief still argues that IsoHunt is guilty of contributory infringement, a la the Grokster standard, but not because of red flag infringement.

The last thing Google wants is to be liable for copyright infringement under the DMCA every time there is a “red flag” that infringement is taking place; that would be the end of Internet search engines as we know them.

Of course, Google’s business strategy isn’t merely to file amicus briefs and hope for the best; the search giant has also recently taken proactive steps to reduce its liability, including turning off autocomplete results for torrent-related searches.  I guess this is what the Intellectual Property Enforcement Coordinator (IPEC) meant by “dialogue”, as detailed in her recent report:

the IPEC has facilitated and encouraged dialogue among the different private sector Internet intermediaries that contribute to the dynamic nature and functioning of the Internet, including payment processors, search engines and domain name registrars and registries. These entities are uniquely positioned to enhance efforts of rightholders and law enforcement to combat infringing activity and help reduce the distribution of infringing content in a manner consistent with our commitment to the principles of fair process, freedom of expression and other important public policy concerns. We believe that most companies share the view that providing services to infringing sites is inconsistent with good corporate business practice and we are beginning to see several companies take the lead in pursuing voluntary cooperative action.

I’m not sure how “voluntary” this really is — or whether “fair process” and “freedom of expression” accurately describe a “dialogue” written under a Damoclesian sword of statutory copyright damages and domain name seizures.  But I will agree that ruinous lawsuits and seizures are “inconsistent with good corporate business practice”.

Hat tip to Keith Lowery for sending me the link to the original TorrentFreak story.

Is search engine optimization key to Huffington Post’s success?

This article suggests the Huffington Post’s value (reflected in its recent sale to AOL) rests more on search engine optimization than on news or citizen journalism:

In addition to writing articles based on trending Google searches, The Huffington Post writes headlines like a popular one this week, “Watch: Christina Aguilera Totally Messes Up National Anthem.” It amasses often-searched phrases at the top of articles, like the 18 at the top of the one about Ms. Aguilera, including “Christina Aguilera National Anthem” and “Christina Aguilera Super Bowl.”

As a result of techniques like these, 35 percent of The Huffington Post’s visits in January came from search engines, compared to 20 percent for CNN.com, according to Hitwise, a Web analysis firm.

Mario Ruiz, a spokesman for The Huffington Post, said search engine optimization played a role on the site but declined to discuss how it was used.

Though traditional print journalists might roll their eyes at picking topics based on Google searches, the articles can actually be useful for readers. The problem, analysts say, is when Web sites publish articles just to get clicks, without offering any real payoff for readers.

This is an ongoing issue with online news providers: simply producing good journalistic content doesn’t get the same number of clicks as celebrity and gossip-laden stories. And as the article suggests, some search engines, such as Google, may fight back by reducing the rank or placement of pages or sites that rely heavily on popular keywords.

But aren’t these sorts of practices inevitable when making money on the Internet is based around page views and clicks on advertisements? The goal has to be simply getting the most viewers, rather than providing the best, most complete, or most useful content.

Found hypocrisy; still searching for clarity

In case you haven’t heard, a few days ago Google started publicly accusing Microsoft’s Bing of stealing its search results.  Juan Carlos Perez over at PCWorld has published an interesting roundup of reactions to Google’s new “strategy” of public accusations:

While the merits of Google’s accusation are up for debate — Microsoft denies the charge — the fact that Google chose to complain in such a loud and agitated manner has become fertile ground for analysis and comment by industry observers.

Opinions range from those who view Google’s actions as hypocritical to others who say the company did the right thing by airing its grievance.

PCWorld’s link to Daniel Eran Dilger’s reaction over at Roughly Drafted is especially worth checking out.  Personally, I come down on the “Google is being hypocritical” side of things.  It’s hard to hold the expansive view of copyright law and fair use that Google embraces for its own activities and then complain with any legitimacy about Microsoft’s alleged behavior.

Unfortunately, copyright law in general (and fair use in particular) is notoriously unclear, malleable, and subject to judicial whims.  It’s doubtful that Google will actually sue Microsoft over this, so we may never know what the “answer” is.

However, even if a U.S. court upheld Microsoft’s right to copy Google’s search results (assuming that’s what happened here), that would only give us an answer (1) on these specific facts (2) as between parties willing to litigate (and maybe even (3) before that particular judge).  Given the high costs of litigation, most non-Fortune-500 copyright users claiming fair use rights find it is in their best interest to settle for a few thousand dollars when saddled with a copyright infringement lawsuit.  Indeed, there are companies whose entire business model is suing people on this basis; the number of copyright infringement suits is rising.

This latest spat between Google and Microsoft is, to some extent, a sideshow, but it does highlight some of the problems that uncertainty breeds within copyright law.  I’m not worried about Microsoft’s ability to defend itself:  it’s a multi-billion-dollar company with lawyers and PR specialists both in-house and on speed dial.  I am worried about the start-ups seeking to be the next Google or Microsoft:  they generally can’t afford to get anywhere close to the line, because they know an infringement lawsuit may mean millions in legal fees and damages, so they back off and play it safe.

That’s the real cost of un-clarity in copyright law.

Find (if ye know how to seek)

It’s a few days old now, but I just ran across a post over on TorrentFreak describing how Google has started removing “torrent”-related terms from its auto-complete suggestions:

Without a public notice Google has compiled a seemingly arbitrary list of keywords for which auto-complete is no longer available. Although the impact of this decision does not currently affect full search results, it does send out a strong signal that Google is willing to censor its services proactively, and to an extent that is far greater than many expected.

Among the list of forbidden keywords are “uTorrent”, a hugely popular piece of entirely legal software, and “BitTorrent”, a file transfer protocol and the name of San Francisco-based company BitTorrent Inc. As of today [1/26/2011], these keywords will no longer be suggested by Google when you type in the first letter, nor will they show up in Google Instant.

All combinations of the word “torrent” are also completely banned. This means that “Ubuntu torrent” will not be suggested as a user types in Ubuntu, and the same happens to every other combination ending in the word torrent. This of course includes the titles of popular films and music albums, which is the purpose of Google’s banlist.

This is quite an interesting development.  Personally, I have found Google’s auto-complete functionality very helpful in finding the names of half-remembered items.  This move is a disturbing reminder of just how much control Google exerts, not only over what we find, but over what we search for.

Google offers tool to analyze texts going back to the 1500s

Among its other ongoing projects, Google recently released a new online tool that allows users to search for particular words in texts going back to the 1500s:

With little fanfare, Google has made a mammoth database culled from nearly 5.2 million digitized books available to the public for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities.

The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, represents the first time a data set of this magnitude and searching tools are at the disposal of Ph.D.’s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words contained in books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian…

“The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books,” said Erez Lieberman Aiden, a junior fellow at the Society of Fellows at Harvard…

“We wanted to show what becomes possible when you apply very high-turbo data analysis to questions in the humanities,” said Mr. Lieberman Aiden, whose expertise is in applied mathematics and genomics. He called the method “culturomics.”

The article mentions some interesting-sounding projects that use this database. And it sounds like the dataset can be downloaded and analyzed by users on their own computers.
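For anyone curious about what that do-it-yourself analysis might look like, here is a minimal Python sketch of computing one word’s relative frequency per year from the downloadable 1-gram files. The file names are placeholders, and the tab-separated column layout (word, year, match count, volume count) is my reading of the dataset’s documentation, so treat the details as assumptions rather than as a reference implementation.

```python
# Minimal sketch (not Google's code): relative frequency of one word per year,
# using a downloaded Google Books 1-gram file plus a per-year totals file.
# File names are placeholders; the assumed layout is tab-separated
# "word<TAB>year<TAB>match_count<TAB>volume_count" lines.
import csv
from collections import defaultdict

def yearly_counts(ngram_path, word):
    """Sum match_count per year for a single word in a 1-gram file."""
    counts = defaultdict(int)
    with open(ngram_path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            token, year, match_count = row[0], int(row[1]), int(row[2])
            if token.lower() == word.lower():
                counts[year] += match_count
    return counts

def yearly_totals(totals_path):
    """Read total words per year (assumed "year<TAB>total" lines)."""
    totals = {}
    with open(totals_path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            totals[int(row[0])] = int(row[1])
    return totals

if __name__ == "__main__":
    counts = yearly_counts("eng-1gram-sample.tsv", "telegraph")  # placeholder path
    totals = yearly_totals("total-counts.tsv")                   # placeholder path
    for year in sorted(counts):
        if totals.get(year):
            print(year, counts[year] / totals[year])  # share of all words that year
```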

But thinking about the methodology behind all of this, I have some questions.

1. Do we know how well these digitized texts represent the full population of texts? This is a sampling issue: could there be some sort of bias in what kinds of texts ended up in this database?

2. Studying word frequency by itself is tricky. Counting words and noting when they appear is one measurement; assessing the importance placed on each word is another task entirely. Do the three little “culturomics” graphs on the left side of the online story really tell us much?

3. It sounds like this would be best for looking at how language (grammar, word choices, structure, etc.) has changed over time.

$1 for your trouble

How much is a technical trespass worth?  Apparently $1. That’s the amount just granted to a couple who had their home photographed by Google as part of its Street View service:

over two and a half years after the case got started, a judge has handed down her consent judgement, ruling that Google was indeed guilty of Count II Trespass. [The plaintiffs] are getting a grand total of $1 for their trouble. Ouch.

Ouch indeed.  It’s not quite Bleak House, but 2.5 years of litigation is an awful lot of trouble for $1, any way you measure it.

Google measures inflation by looking at web data

Once again drawing upon its access to vast amounts of data, Google suggests it is developing an alternative measure of inflation:

Google is using its vast database of web shopping data to construct the ‘Google Price Index’ – a daily measure of inflation that could one day provide an alternative to official statistics.

The work by Google’s chief economist, Hal Varian, highlights how economic data can be gathered far more rapidly using online sources. The official Consumer Price Index data are collected by hand from shops, and only published monthly with a time lag of several weeks…

The GPI shows a “pretty good correlation” with the CPI for goods such as cameras and watches that are often sold on the web, but less so for others, such as car parts, that are infrequently traded online.

This bears watching, as Google can access data and then analyze and summarize it much more quickly than the government can. But it will be interesting to see how Google gets around the issue of what actually gets sold online – the story also notes that Google’s index downplays the role of housing.
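The story doesn’t say how the GPI is actually computed, but as a rough illustration of the general idea, here is a minimal sketch of a chained Jevons-style daily index (a geometric mean of day-over-day price relatives) built from web price observations. The data shape, item names, and the choice of a Jevons formula are my assumptions for the example, not a description of Varian’s method.

```python
# Minimal sketch (not Google's method): a daily price index from online price
# observations, using a chained Jevons index. All data below is invented.
import math
from collections import defaultdict

def jevons_chain(observations):
    """observations: iterable of (day, item_id, price) tuples.
    Returns {day: index_level} with the earliest day set to 100."""
    by_day = defaultdict(dict)
    for day, item, price in observations:
        by_day[day][item] = price

    days = sorted(by_day)
    index = {days[0]: 100.0}
    for prev, curr in zip(days, days[1:]):
        # Use only items priced on both days, so products entering or leaving
        # the sample don't masquerade as inflation or deflation.
        common = by_day[prev].keys() & by_day[curr].keys()
        if not common:
            index[curr] = index[prev]  # no overlap: carry the level forward
            continue
        log_relatives = [math.log(by_day[curr][i] / by_day[prev][i]) for i in common]
        day_over_day = math.exp(sum(log_relatives) / len(log_relatives))
        index[curr] = index[prev] * day_over_day
    return index

if __name__ == "__main__":
    obs = [
        ("2011-02-01", "camera-a", 199.0), ("2011-02-01", "watch-b", 89.0),
        ("2011-02-02", "camera-a", 201.0), ("2011-02-02", "watch-b", 88.0),
    ]
    print(jevons_chain(obs))  # e.g. {'2011-02-01': 100.0, '2011-02-02': ~99.9}
```

A sketch like this also makes the housing caveat concrete: anything rarely transacted online simply never shows up in the observation feed, so it can’t move the index.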

This could play out in a number of ways. Could this online index be improved so that markets were responding to Google’s data rather than the government’s data? Let’s say the government decides it likes Google’s approach. Does it develop the same or a similar algorithm within the government? Does it contract the task to Google?

The appeal of Google and its driverless cars

It was recently revealed that Google has been testing automated cars for some time now:

With someone behind the wheel to take control if something goes awry and a technician in the passenger seat to monitor the navigation system, seven test cars have driven 1,000 miles without human intervention and more than 140,000 miles with only occasional human control. One even drove itself down Lombard Street in San Francisco, one of the steepest and curviest streets in the nation. The only accident, engineers said, was when one Google car was rear-ended while stopped at a traffic light.

Autonomous cars are years from mass production, but technologists who have long dreamed of them believe that they can transform society as profoundly as the Internet has.

Why does this story have as much appeal as it seems to have on the Internet? A quick argument:

This is a dream dating back decades. The futuristic exhibits of the mid 20th century had visions of this: people blissfully enjoying their trips while the cars took care of the driving. To see the dream come to fruition is satisfying and fulfilling. On a broader scale, this is part of the bigger narrative of technological progress. Although it has been delayed longer than some imagined, it demonstrates American ingenuity and progress. Since Americans have a well-established love affair with the automobile, driverless cars offer the best of all worlds: personal freedom in transportation without the need to actually do any work. And if we soon get cars that run on electricity or hydrogen, it can be completely guilt-free transportation!

A roundup of views on “supercharged Wi-Fi”

Federal regulators are about to open up more of the wireless spectrum for Wi-Fi use – but commentators disagree about who will benefit most. Google and other big companies? Consumers? Rural areas? Cities? Read a useful round-up here.

Users spend more time on Facebook than Google’s sites

According to figures from August, web users in the United States now spend more time on Facebook than on Google’s sites (which include YouTube). This can’t be good news for Google – but it shows the power of Facebook:

In August, people spent a total of 41.1 billion minutes on Facebook, comScore said Thursday, about 9.9% of their Web-surfing minutes for the month. That just barely surpassed the 39.8 billion minutes, or 9.6%, people spent on all of Google Inc.’s sites combined, including YouTube, the free Gmail e-mail program, Google news and other content sites.

U.S. Web users spent 37.7 billion minutes on Yahoo Inc. sites, or 9.1% of their time, putting Yahoo third in terms of time spent browsing. In July, Facebook crept past Yahoo for the first time, according to comScore.

Facebook appears to be growing more and more popular, and Google can’t seem to figure out a way to introduce social connectivity throughout its sites – whatever happened to Google Wave?