Archive for the 'Managing the Collective Collection' Category

Marking Progress: print archives disclosure

Friday, May 25th, 2012 by Constance

For the past year and a half, Dennis and I have been working closely with a group of Research Library Partners and others to develop and test a method for registering print archives in WorldCat. I’m pleased to say that the OCLC Print Archives Disclosure Pilot is now complete and a final report of our findings has been published. The report was jointly authored by Lizanne Payne (project director of the Western Regional Storage Trust), Emily Stambaugh (manager of the California Digital Library’s Shared Print program), Dennis and me. Partners in this project included the Center for Research Libraries (CRL), the California Digital Library (CDL), and the libraries of Indiana University; Stanford University; the University of California, Los Angeles; the University of California, San Diego; the University of Minnesota; and the University of Oregon.

The report has actually been out for a few weeks now; it was published without fuss or fanfare at the end of April. Gary Price was kind enough to feature it in an InfoDocket post last month, and it’s been making the rounds on some of the specialized discussion lists devoted to print archiving and preservation activities. The specifics of the report (guidance on how and where to register print preservation commitments) apply to a relatively small number of institutions, but the publication itself marks a milestone for the library community as a whole. It represents the culmination of several related efforts directed at redesigning the critical (and costly) business of preserving print books and journals.

It’s been a long road.  Back in 2009, an OCLC Research working group undertook a review of shared print policy documents that revealed some significant gaps in existing guidance, particularly with respect to how and where print archiving commitments should be expressed or registered:

About half of the policies [examined in the report] stipulate that the special retention and/or shared access status of documents covered by the agreement should be systematically registered; less than 20% specify a location in the MARC21 bibliographic or local holdings record where this information is to be recorded. Only a quarter of the policies reviewed mandate disclosure of the retention or shared access status in regional, national or international union lists.

This last finding has important implications for collection-sharing efforts that seek to achieve significant scale or impact on system-wide economies. More effective and systematic disclosure of retention commitments, in particular, might produce significant network effects by enabling anonymous participation in collection-sharing initiatives, generating secondary benefits for the entire library community.

Predictably, the report closed with a set of recommendations (or admonitions) intended to address the policy gaps that we felt were most important:

Cooperative agreements that are intended to achieve or to enable truly transformative change in the way library print collections are managed should include:

  • A business model that acknowledges the changing value of library print resources in the current information environment;
  • An explicit acknowledgment that effective disclosure of library holdings and retention commitments is necessary to support distributed management of print archives; and
  • A commitment to capture, retain and share item-level condition information so that the preservation quality of print archives may be better judged.

The working group that contributed to the policy review was disbanded in 2009, but several participants continued to work, more or less informally, on drafting a set of guidelines for print archives disclosure in WorldCat. That effort was explicitly modeled on practices developed in the 1990s for recording preservation microfilming information. At the time, NEH was funding a large-scale brittle books preservation program and, to reduce duplicative effort, participating libraries needed a mechanism for identifying the titles and volumes that were already queued for filming. Nancy Elkington was a prime mover in developing standard practices for recording this information in bibliographic union catalogs, using the MARC 583 Preservation Action Note.

Along with Deb McKern, a preservation officer at the Library of Congress, Nancy encouraged us to extend use of the 583 Action Note to print archiving activities.  Since 2005, use of the 583 had already been extended to registration of digital archives in the Registry of Digital Masters, a joint effort of the Digital Library Federation and OCLC.  It seemed sensible to build upon this past work in developing guidelines for registering print archiving commitments.  However, our initial effort to define guidelines for print archives disclosure foundered when it became clear that the bibliographic record was not an appropriate vehicle for recording item-level condition or retention statements.  For journal archiving efforts in particular, it was difficult to convey in a title-level record how much of a given journal run was actually preserved.  And, in a master-record union catalog like WorldCat, it was even harder to see how archiving commitments from multiple institutions could be adequately represented.

For a year or more, our efforts to define descriptive metadata guidelines for print archiving lay fallow. Other projects were taken up. But by 2010, with the emergence of several large-scale print journal archiving efforts and increasing public awareness of the importance of distributed preservation, it was clear that a common approach to identifying shared print collections was urgently needed. As anticipated in our 2009 report, the largest archiving efforts were finding it impossible to “scale up” without some shared infrastructure. Happily, in the intervening years, support for item-level holdings information in WorldCat had increased substantially and it was possible to design and test a disclosure strategy that was better adapted to journals. With the support of OCLC product management, the Print Archives Disclosure Pilot project was launched. And as a result we are now, collectively, in a better place to design and implement scalable strategies for print preservation.

Libraries rebound

Monday, April 9th, 2012 by Merrilee

I’d like to put in a plug for the next event for those who are in the OCLC Research Libraries Partnership, which is Libraries Rebound: Embracing Mission, Maximizing Impact (June 5-6, Philadelphia). We are still confirming speakers, but so far we’ve got a great lineup; we’re also adding reactor panels, so check out the program now and in a week or two to see how it’s shaping up.

The meeting will focus on how libraries can more closely tie services and collections to the university’s (or parent institution’s) mission. In the midst of static or decreasing budgets, being able to demonstrate impact in the pursuit of the institution’s research and teaching goals is paramount.

The day and a half meeting will focus on three themes:

  • How library staff are working side-by-side with researchers in specific disciplines
  • How institutions are adapting special collection-building to align with high priority teaching and research focus areas
  • How libraries are using library space to forge partnerships with other units on campus

We’re fortunate to have some smart people from forward-looking institutions who will share their knowledge and experiences with us. And the conversation and discussion will definitely spill into areas beyond the three themes I’ve outlined above. Which is where you come in: we need you to come and talk about what you have planned (as well as to learn from your peers). Register now! Always free for those in the partnership.

    Questions? Let us know. We always love to hear from you.

    More about the “mean news”

    Tuesday, February 7th, 2012 by Merrilee

    A while ago I blogged about Sarah M. Pritchard’s talk at the RBMS preconference. I’m still quite taken with her talk about aligning collections and services with mission, and now it’s available online. Go listen to it (she’s the first speaker in the session). You won’t be sorry.

    Other presentations from the preconference are also available for consumption.

    The Empires Fight Back – Globalization and Area Studies

    Tuesday, June 14th, 2011 by Jennifer

    Last week at our FutureCast meeting, Deborah Jakubs, University Librarian at Duke, gave us a thoughtful analysis of internationalizing education and research collections. She was commenting on Ben Wildavsky’s talk about an increasingly mobile academy, the emergence of global universities, and the role of global rankings. Deborah  put Wildavsky’s thesis about globalization and higher education in a research library context. I asked Deborah for her notes, and she has allowed me to post them. I have made my own personal selections here.


    • Higher education has gone global
    • Language learning/fluency very important
    • Increased collaboration with research partners and co-authors beyond the US
    • Access for non-US researchers to scholarship produced in the US and internationally


    • Title VI funding for area studies is threatened precisely when language/cultural expertise is needed
    • Research libraries see continued decline in “foreign acquisitions”
    • Trend in libraries to justify expenditures on use, ROI
    • Limited and/or uneven production of and access to digital scholarly resources worldwide
    • Contradiction between globalized universities and diminishing focus on global acquisitions
    • How will needs of scholars for access to non-English, often obscure, materials be met?
    • Erosion of the mission of research libraries to focus on the most-used or most-requested, turning away from more specialized
    • Implications of just in time vs. just in case for foreign materials?


    • Focus more on less available materials;  “core” is easily found (see Hathi Trust, etc.)
    • Treat foreign materials as special collections
    • What’s the information landscape beyond the US, in developing countries?
    • Can we develop centers of strength?
    • Given the partnerships between US and non-US researchers/institutions, we should develop parallel partnership with libraries in other countries

    It will come as no surprise to many that Deborah is on the task force on International Engagement of ARL Libraries.

    The video recordings of the FutureCast plenary sessions and response panels will be posted shortly.

    The Scottish Presence in the Global Library Resource

    Wednesday, March 23rd, 2011 by Brian

    “Collective collections” are the combined library collections of multiple institutions. They may exist as a physical aggregation of materials at a single location; they may exist through a service layer that integrates distinct collections into a single resource. Or they may be only notional: a hypothetical combination that can be mined for intelligence to inform institutional and collaborative decision-making. Collective collections can be assembled at any scale, from two institutions to the global library system as a whole. In the latter case, of course, we can only approximate – no single data source represents the holdings of all libraries everywhere. Fortunately, the more than 200 million records and 1.7 billion holdings contained in the WorldCat database provide us with an approximation of the global library resource that is sufficient to explore many interesting questions.

    A project currently underway in OCLC Research is exploring the concept of a national presence in the global library resource. A national presence can be characterized from a number of perspectives, including the distinctive features of the country’s library collections; the output of the country’s publishing houses; works authored by the country’s citizens; and the corpus of materials that, regardless of origin, are “about” some aspect of that country. All of these facets, taken together, form a picture of how a country’s profile is manifested in library collections around the world. The project uses Scotland as a case study to illustrate the concept of a national presence in the global library resource, but the goal is to develop patterns of analysis that can be applied without significant modification to any country.

    The project is investigating three major themes:

    • National Research Collection: the project examines the notion of a “national research collection” – the combined library collections of a nation’s higher education and research-oriented institutions – and how it aligns with the global library resource. The analysis focuses on the collective collection of the four ancient Scottish universities (Aberdeen, Edinburgh, Glasgow, and St. Andrews) as well as the National Library of Scotland. The purpose is to uncover the distinctive features of this collective research collection vis-à-vis groups of peer institutions in the library system (e.g., the collective holdings of ARL institutions), as well as the library system as a whole (as represented by WorldCat).
    • National Presence in the Global Library Resource: this aspect of the project shifts focus from the contents of Scottish library collections to the presence of Scotland-related materials in library collections around the world. The analysis focuses on materials published in Scotland, created by Scottish authors, or primarily about some aspect of Scotland. Key questions include the size of the Scottish national presence in the global library resource, as well as the characteristics of the materials comprising this presence.
    • Diffusion of a National Presence within the Global Library Resource: given the materials identified as comprising a national presence, WorldCat holdings data can be used to track their pattern of diffusion throughout the global library resource. From this, many interesting questions can be explored: the locations of extensive collections of Scotland-related materials outside Scotland; comparisons of the diffusion of Scotland-related materials with the diffusion of the “Scottish diaspora”; and global collecting activity as a means of identifying “core” (i.e., particularly influential) Scottish works.
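
As a rough illustration of the diffusion question, the sketch below tallies where copies of Scotland-related titles are held, starting from a toy list of (title, holding country) pairs. Everything in it is an assumption made for illustration: the sample data, the country codes and the function names are invented, and the project itself works from WorldCat holdings data, whose actual format is not shown here.

```python
from collections import Counter, defaultdict


def holdings_by_country(holdings):
    """Count holdings per country from (title_id, country) pairs."""
    return Counter(country for _title, country in holdings)


def share_outside(holdings, home_country):
    """Fraction of all holdings located outside the home country."""
    counts = holdings_by_country(holdings)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[home_country]) / total


def widely_held(holdings, min_countries):
    """Titles held in at least `min_countries` distinct countries.

    A crude stand-in for spotting "core" works through collecting activity.
    """
    countries = defaultdict(set)
    for title, country in holdings:
        countries[title].add(country)
    return sorted(t for t, c in countries.items() if len(c) >= min_countries)


# Hypothetical sample data, not drawn from WorldCat.
sample_holdings = [
    ("t1", "GB-SCT"), ("t1", "US"), ("t1", "CA"),
    ("t2", "GB-SCT"), ("t2", "US"),
    ("t3", "AU"),
]

counts = holdings_by_country(sample_holdings)
outside = share_outside(sample_holdings, "GB-SCT")
core = widely_held(sample_holdings, 3)
```

A real analysis would also need to decide what counts as a "Scotland-related" title in the first place, which is the harder part and is not attempted here.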

    Analysis of a national presence in the global library resource is relevant to a range of library decision-making needs, including collection development strategies, prioritization of digitization activities, “gap analysis” for national library collections, as well as other applications. A key source of value in all of these potential uses is the ability to consider the features of a national presence against the broader context of the global library resource. The capacity to frame collections and services within a system-wide perspective is a tool of growing importance for library-related analysis and decision-making.

    Note: Thanks to colleagues at the Universities of Aberdeen, Edinburgh, Glasgow, and St. Andrews, and the National Library of Scotland, for their ongoing participation in this work. Special thanks to our much-missed colleague John MacColl, who, in his former role as an OCLC Research program officer, was instrumental in designing and guiding this project. We look forward to his continued participation in the project in his new role as University Librarian and Director of Library Services at the University of St. Andrews!

    Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment

    Thursday, January 6th, 2011 by Jim

    The report of this project is now available. We blogged about it recently in the series of posts summarizing our major activities at year end.

    The importance of shared print initiatives is growing and the Hathi Trust is poised to become an important element in the library infrastructure of the future. Their participant list now shows 52 institutional participants and three major consortia. It’s clear that mass digitization, the flip to electronic resources and space demands have resulted in a new view of the print collection in academic libraries. There is now motivated discussion among research libraries about how to construct a new system of services based on the digitized aggregation, local collections and shared storage repositories.

    To quote my colleague, Lorcan Dempsey, “we are pleased that much of the empirical context for this discussion and quite a bit of intellectual leadership has come from OCLC Research work being done for the RLG Partnership. Constance Malpas has been leading this activity, and has been creating and supporting links between various community initiatives and relevant product areas in OCLC.

    Her much-awaited report detailing the initial work that formed the basis for this activity has now appeared. It is likely to be quite influential in future planning activities.”

    Even if this is not a core interest you should be familiar with the major findings. You’ll be pleased to see that the report has an excellent executive summary ;)

    OCLC Research 2010 – Cloud Library

    Thursday, December 23rd, 2010 by Jim

    The Cloud Library project (see the exposition that follows for a quick reductive overview of the idea) got a lot of attention and had a big impact in the research library community this past year.

    My colleague, Constance Malpas, is the principal intellectual engine driving this effort. She’s shaped the opportunities, divined the evidence to support first steps and generally been a tireless participant in the discussions and action planning that have sprung up around this opportunity. She’s busy now finalizing a report which will be available in January, 2011. We’ll blog about its release.

    What’s the idea? In the same way that cloud computing offers resources and applications on demand without the user having to operate and own the underlying assets, the cloud library project posited that it is now possible for academic libraries to rely on access to needed book and journal assets rather than manage them as locally-resident and managed physical items.

    The entry point for exploring this possibility was to reconsider the relation between a library’s physical book collection, off-site storage repositories, and emerging digital text aggregations. Could a library change its local print inventory by relying on supply of a digital version of the text from a digital aggregator and offer a print volume when necessary through an arrangement with an existing storage facility(ies)? We found willing exemplars of each player – New York University as a customer, ReCAP as a storage facility willing to supply and the Hathi Trust as a digital text aggregator willing to offer access to an electronic version of the book.

    Constance examined the overlap between NYU’s collection, the holdings of ReCAP and the rapidly growing database of digital texts being built by the Hathi Trust from the digital copies received by participants in the Google Books Library Project and other digital copies of books. It was tough to find out what was in ReCAP as those holdings haven’t been transmitted to OCLC and it was difficult to keep up with the Hathi Trust as the corpus was growing so rapidly. Nevertheless the analyses were done and have been repeated on a regular basis.

    The results of this initial effort have been discussed in many forums. A nice summary by Constance is in this presentation from the RLG Partnership Annual Meeting 10 June 2010. The key findings were that 30% or more of the local collection was already present in the digital aggregation, that 75% of the mass digitized texts were already ‘backed up’ in one or more of the storage repositories and that, unfortunately, only a very small percentage of the intersection was in the public domain. Since the Google Book settlement was delayed this last point meant that there was no way to use the Hathi Trust as more than a digital preservation structure. Nevertheless even without an e-book delivery opportunity the amount of the library space that could be freed up by a willingness to responsibly (i.e. with contractual understanding, with proper reassurances about preservation, with Hathi acting as another preservation format and with appropriate coordination across multiple repositories willing to supply) rely on delivery from remote sources is quite large. In the NYU case it meant more than 700,000 volumes could be relegated and managed differently.
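
The kind of overlap comparison described above can be sketched with plain set operations on title identifiers. This is only a minimal sketch under stated assumptions: the identifiers, the function name and the sample sets are hypothetical, the toy data is contrived so that the first two percentages echo the 30% and 75% figures, and the actual analysis had to match bibliographic records across sources, which is a much harder problem than comparing clean identifiers.

```python
def overlap_report(local, digitized, stored, public_domain):
    """Return overlap percentages between sets of title identifiers."""
    in_digital = local & digitized       # local titles also in the digital aggregation
    backed_up = digitized & stored       # digitized titles also held in storage
    pd_overlap = in_digital & public_domain  # overlapping titles in the public domain

    def pct(part, whole):
        # Guard against empty sets to avoid division by zero.
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "local_in_digital_pct": pct(in_digital, local),
        "digital_backed_up_pct": pct(backed_up, digitized),
        "overlap_public_domain_pct": pct(pd_overlap, in_digital),
    }


# Hypothetical identifiers standing in for matched bibliographic records.
local = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
digitized = {"a", "b", "c", "v", "w", "x", "y", "z"}
stored = {"a", "b", "w", "x", "y", "z"}
public_domain = {"a", "q"}

report = overlap_report(local, digitized, stored, public_domain)
```

Because the corpus was growing quickly, a comparison like this would have to be rerun regularly against fresh snapshots, as the post notes the analyses were.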

    This finding prompted further work to look at other ARL library collections. The fascinating and hopefully motivating finding was that the same level of duplication held across nearly the entire ARL group. That is, more than 30% of most libraries’ collections were already duplicated in the mass-digitized aggregation, and this was expected to grow to 50% over the next 3 years. This means that most ARLs would benefit in similarly substantial ways by moving from reliance on their local print collections to reliance on storage repository supply. At the October 2010 ARL meeting, Constance gave a very nice presentation (ppt) on these findings, along with recommendations about where change could start and things to stop doing.

    The overwhelming evidence that this project has assembled, coupled with extraordinary budget pressures, has resulted in genuine action plans at individual research libraries as well as plans for group responses. Requirements for the necessary shared infrastructure are being articulated and pilot efforts are being launched. The ‘cloud library’ seeded by OCLC Research efforts is precipitating change. (Block that metaphor.)

    Sorting out Demand: some thoughts on library inter-lending

    Friday, July 30th, 2010 by Constance

    Over the past few years, OCLC Research has done quite a bit of analytic work based on what my colleague Brian Lavoie refers to as “supply-side” data. Examples include the well-known Google 5 study, as well as a variety of projects examining the library long tail, several of them summarized in an article Lorcan published some time ago. Much of this work has been based on data aggregated in the WorldCat bibliographic database. These data have been contributed over many years by OCLC members to support a variety of shared library services, including cooperative cataloging and inter-lending operations; as a secondary effect, the aggregation has provided a rich source of information about the system-wide library collection that is regularly mined in both internal and extra-mural research projects.

    More recently, we have begun to think about how we might make better use of the demand-side data that is generated by a variety of routine library operations, especially circulation and inter-lending.  Lorcan in particular has given thought to how “intentional data” might usefully shape library service provision.

    Inter-library loan transactions are a particularly interesting example of intentional data, I think.

    Pat the Elephant

    Friday, July 23rd, 2010 by Constance

    There is a well-known fable about blind men with contrasting views on the anatomy of an elephant, each having examined a separate piece of the beast and independently concluded that it is either very like a spear, or a fan, or a snake, etc. Even in combination their observations fail to provide a very good picture of what an elephant looks like as a whole. The story was popularized in a poem by John Godfrey Saxe which is cited in a surprisingly wide variety of publications, from early childhood education manuals, to scientific and medical reports, to vocational guides and, more predictably, collections of 19th-century verse. I know this because a search of the HathiTrust digital library for a distinctive phrase from the poem’s conclusion, “prate about an elephant not one of them has seen,” finds more than 140 matches in these places.

    Blind searching in large digital text repositories like the HathiTrust or Google Books provides an intriguing but incomplete view of the mass-digitized book corpus.  Frequently cited statistics like “12 million books” in GBS, “5 million books” or “one million public domain books” in Hathi don’t really tell us much about the anatomy of the mammoth.  Pat the elephant…what do you find?  A lot of curious sensory experiences that don’t add up.

    When it comes to anatomizing elephants, all parts are not created equal.  Georges Cuvier, who famously reconstructed skeletons on the basis of a tooth or a toe, knew this.  Cuvier confidently and correctly distinguished Indian and African elephant species based on characteristic differences in jawbones; he ‘discovered’ the woolly mammoth based on a close examination of incomplete fossil remains.

    I’m inclined to think that counting books (or volumes) is about as useful in characterizing the mass-digitized corpus as counting vertebrae in the catacombs.  It tells us something about how much is there, but not much about who, or what, is there.

    Happily, there is an abundance of bibliographic metadata describing the content from which the mass-digitized corpus was sourced that can be used (like a fossilized tooth or a toe) to assign some generic, or I suppose specific, characteristics to the elephant in the room.  Over the past year, OCLC Research has been working on a project with Hathi and some other interested libraries to begin characterizing the enormous, vaguely familiar (snake? spear? tree?) yet altogether revolutionary (woolly!) mammoth created through the digitization of legacy print collections.

    We’ve posted some empirical data on the subject and library distribution of titles in the Hathi digital repository here.  

    I think it provides a useful complement to the enchanting and progressively revealing fan-dance of class numbers here.

    More to come.

    Pick of the week: ATF 2 March 2010

    Saturday, March 6th, 2010 by Jim

    Some of you may already be subscribers to Above The Fold (ATF), our weekly current awareness compilation and commentary. We just sent out the seventieth issue. Our objective in assembling the newsletter was to offer an information professional’s view of issues from outside our domain that were worth your consideration and related to library, archive and museum challenges. We selected items of interest likely to be beyond your normal reading sphere to help you look farther more often with less work. The selection and the commentary on the chosen articles would, we hoped, encourage some lateral thinking in our domain.

    The date above marked our seventieth weekly issue and ATF now has nearly 3100 subscribers. We decided that we’ll feature a chosen article each week here in hangingtogether. I’ve chosen this article to feature not because it’s outside our domain but because it shines such a light on the obstacles to change in the research library arena.

    E-Library Economics (full article here)

    Inside Higher Ed • February 10, 2010

    The hard truth about hard copy. Recent studies suggest it might take up to 50 years, or two generations, before faculty in some disciplines will accept the predominance of digital resources over hard copy. But the economics may help to persuade them: estimates peg the cost of keeping a book on a shelf at a little over $4 a year, versus about 15 cents for a digital version.

    This is the most disheartening saga. I feel badly for my colleague, Suzanne Thorin, the university librarian at Syracuse who is being vilified for acknowledging that the research library in the contemporary academy cannot contribute to the central academic mission without dramatic changes to its traditional processes and services. Managing the local book collection as part of a broad national pattern of provision, particularly alongside the emerging digital aggregations of text, could give readers and researchers more and better than any local print inventory. I’m looking forward to seeing the report mentioned in the article authored by another colleague, Paul Courant, from the University of Michigan but will have to wait until sometime in April. The faster it’s available the better. Cost evidence in these discussions is largely absent. Read the comments to fully appreciate the bile that this topic can attract. (Michalko)

    See the rest of this ATF issue here.
    Subscribe to ATF here.
    Subscribe to the RSS feed of ATF here.

    Back issues are here.