OCLC Research 2010 – Cloud Library

The Cloud Library project (see the exposition that follows for a quick reductive overview of the idea) got a lot of attention and had a big impact in the research library community this past year.

Dark foreground and clouds, mountains highlighted, "Heaven's Peak," Glacier National Park, Montana.

My colleague, Constance Malpas, is the principal intellectual engine driving this effort. She’s shaped the opportunities, divined the evidence to support first steps and generally been a tireless participant in the discussions and action planning that have sprung up around this opportunity. She’s now busy finalizing a report, which will be available in January 2011. We’ll blog about its release.

What’s the idea? In the same way that cloud computing offers resources and applications on demand without the user having to own and operate the underlying assets, the cloud library project posited that it is now possible for academic libraries to rely on access to needed book and journal assets rather than hold and manage them locally as physical items.

The entry point for exploring this possibility was to reconsider the relation between a library’s physical book collection, off-site storage repositories, and emerging digital text aggregations. Could a library change its local print inventory by relying on a digital aggregator to supply a digital version of the text, and by offering a print volume when necessary through an arrangement with one or more existing storage facilities? We found willing exemplars of each player: New York University as a customer, ReCAP as a storage facility willing to supply and the Hathi Trust as a digital text aggregator willing to offer access to an electronic version of the book.

Constance examined the overlap between NYU’s collection, the holdings of ReCAP and the rapidly growing database of digital texts being built by the Hathi Trust from the digital copies received by participants in the Google Books Library Project and other digital copies of books. It was tough to find out what was in ReCAP, as those holdings haven’t been transmitted to OCLC, and it was difficult to keep up with the Hathi Trust, as the corpus was growing so rapidly. Nevertheless, the analyses were done and have been repeated on a regular basis.
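For readers who think in code, the kind of overlap measure involved can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method actually used in the project: the function name `overlap_share` and the title identifiers are invented, and a real analysis would first have to match bibliographic records across the three sources.

```python
def overlap_share(base: set[str], other: set[str]) -> float:
    """Return the fraction of titles in `base` that also appear in `other`."""
    if not base:
        return 0.0
    return len(base & other) / len(base)


# Hypothetical title identifiers, invented purely for illustration.
local_collection = {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"}
digital_aggregation = {"t1", "t2", "t3", "t11", "t12"}
storage_holdings = {"t1", "t2", "t11", "t20"}

# Share of the local collection that is already in the digital aggregation.
share_digitized = overlap_share(local_collection, digital_aggregation)

# Of those locally held, digitized titles, the share also held in storage.
digitized_local = local_collection & digital_aggregation
share_backed_up = overlap_share(digitized_local, storage_holdings)

print(f"digitized: {share_digitized:.0%}, backed up in print: {share_backed_up:.0%}")
```

With these made-up sets, three of the ten local titles are in the digital aggregation and two of those three are also in storage. The figures carry no meaning beyond showing how the two percentages are defined.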

The results of this initial effort have been discussed in many forums. A nice summary by Constance is in this presentation from the RLG Partnership Annual Meeting, 10 June 2010. The key findings were that 30% or more of the local collection was already present in the digital aggregation, that 75% of the mass-digitized texts were already ‘backed up’ in one or more of the storage repositories and that, unfortunately, only a very small percentage of the intersection was in the public domain. Since the Google Book settlement was delayed, this last point meant that there was no way to use the Hathi Trust as more than a digital preservation structure. Nevertheless, even without an e-book delivery opportunity, a large amount of library space could be freed up by a willingness to rely responsibly on delivery from remote sources (i.e. with contractual understanding, with proper reassurances about preservation, with Hathi acting as another preservation format and with appropriate coordination across multiple repositories willing to supply). In the NYU case it meant more than 700,000 volumes could be relegated and managed differently.

This finding prompted further work to look at other ARL library collections. The fascinating and, we hope, motivating finding was that the same level of duplication held across nearly the entire ARL group. That is, more than 30% of most libraries’ collections were already duplicated in the mass-digitized aggregation, and this was expected to grow to 50% over the next three years. This means that most ARLs would benefit in similarly substantial ways by moving from reliance on their local print collections to reliance on storage repository supply. At the October 2010 ARL meeting, Constance gave a very nice presentation (ppt) on these findings, along with recommendations about where change could start and things to stop doing.

The overwhelming evidence that this project has assembled, coupled with extraordinary budget pressures, has resulted in genuine action plans at individual research libraries as well as plans for group responses. Requirements for the necessary shared infrastructure are being articulated and pilot efforts are being launched. The ‘cloud library’ seeded by OCLC Research efforts is precipitating change. (Block that metaphor.)


About Jim Michalko

Jim coordinates the OCLC Research office in San Mateo, CA, and focuses on relationships with research libraries and on work that renovates the library value proposition in the current information environment.