Activist charged for downloading millions of JSTOR articles

Many academics use databases like JSTOR to find articles from academic journals. However, one user allegedly violated the terms of service by downloading millions of articles and is now facing federal charges:

Swartz, the 25-year-old executive director of Demand Progress, has a history of downloading massive data sets, both to use in research and to release public domain documents from behind paywalls. He surrendered in July 2011, remains free on bond and faces dozens of years in prison and a $1 million fine if convicted.

Like last year’s original grand jury indictment on four felony counts, the superseding indictment unveiled Thursday accuses Swartz of evading MIT’s attempts to kick his laptop off the network while downloading millions of documents from JSTOR, a not-for-profit company that provides searchable, digitized copies of academic journals that are normally inaccessible to the public…

“JSTOR authorizes users to download a limited number of journal articles at a time,” according to the latest indictment. “Before being given access to JSTOR’s digital archive, each user must agree and acknowledge that they cannot download or export content from JSTOR’s computer servers with automated programs such as web robots, spiders, and scrapers. JSTOR also uses computerized measures to prevent users from downloading an unauthorized number of articles using automated techniques.”

MIT authorizes guests to use the service, which was the case with Swartz, who at the time was a fellow at Harvard’s Safra Center for Ethics.

There seems to be a disconnect here: services like JSTOR want to maintain some control over the academic content they provide even as they exist to help researchers find printed scholarly articles. Such services can earn substantial revenue by collating journal articles and requiring libraries to pay for access. Someone like Swartz could therefore download a large share of the articles and then avoid paying for, or using, JSTOR down the road (though academic users mostly pay through their institutions, which pass the costs along to them).

But what counts as “a limited number of journal articles at a time”? Using an automated program is clearly ruled out by the terms of service, but what if a team of undergraduates banded together, downloaded a similar number of articles, and pooled their downloads?

If we are indeed headed toward a world of “big data,” which presumably would include the thousands of scholarly articles published each year, we are likely in for some interesting battles in a number of areas over who gets to control, download, and access this data.

Another thought: would a move to open-access academic journals eliminate this issue?

Accessing the public domain through JSTOR

Academic journal archiver JSTOR has just made public domain articles a lot more accessible:

[W]e are making journal content on JSTOR published prior to 1923 in the United States and prior to 1870 elsewhere, freely available to the public for reading and downloading. This includes nearly 500,000 articles from more than 200 journals, representing approximately 6% of the total content on JSTOR.

We are taking this step as part of our continuous effort to provide the widest possible access to the content on JSTOR while ensuring the long-term preservation of this important material.

Mike Masnick over at Techdirt recounts some history that provides context for JSTOR’s decision:

You may recall that following the indictment of Aaron Swartz for downloading some JSTOR papers, a guy named Greg Maxwell decided to upload 33GBs of public domain papers from JSTOR and make them available via The Pirate Bay. He had the papers for a while, but was afraid that he’d get legally harassed for distributing them.

JSTOR explicitly acknowledges this history in its announcement (emphasis added):

I realize that some people may speculate that making the Early Journal Content free to the public today is a direct response to widely-publicized events over the summer involving an individual [Aaron Swartz] who was indicted for downloading a substantial portion of content from JSTOR, allegedly for the purpose of posting it to file sharing sites. While we had been working on releasing the pre-1923/pre-1870 content before the incident took place, it would be inaccurate to say that these events have had no impact on our planning. We considered whether to delay or accelerate this action, largely out of concern that people might draw incorrect conclusions about our motivations. In the end, we decided to press ahead with our plans to make the Early Journal Content available, which we believe is in the best interest of our library and publisher partners, and students, scholars, and researchers everywhere.

Regardless of how this happened, I applaud JSTOR for greatly furthering access to public domain academic journal articles.

H/T Techdirt/Copycense.