Archive for the 'Managing the Collective Collection' Category

The Empires Fight Back – Globalization and Area Studies

Tuesday, June 14th, 2011 by Jennifer

Last week at our FutureCast meeting, Deborah Jakubs, University Librarian at Duke, gave us a thoughtful analysis of internationalizing education and research collections. She was commenting on Ben Wildavsky’s talk about an increasingly mobile academy, the emergence of global universities, and the role of global rankings. Deborah put Wildavsky’s thesis about globalization and higher education in a research library context. I asked Deborah for her notes, and she has allowed me to post them. I have made my own selections here.

Issues:

  • Higher education has gone global
  • Language learning/fluency very important
  • Increased collaboration with research partners and co-authors beyond the US
  • Access for non-US researchers to scholarship produced in the US and internationally

Ironies:

  • Title VI funding for area studies is threatened precisely when language/cultural expertise is needed
  • Research libraries see continued decline in “foreign acquisitions”
  • Trend in libraries to justify expenditures on use, ROI
  • Limited and/or uneven production of and access to digital scholarly resources worldwide
  • Contradiction between globalized universities and diminishing focus on global acquisitions
  • How will needs of scholars for access to non-English, often obscure, materials be met?
  • Erosion of the mission of research libraries as they focus on the most-used or most-requested materials, turning away from the more specialized
  • Implications of just in time vs. just in case for foreign materials?

Challenges:

  • Focus more on less available materials; “core” is easily found (see Hathi Trust, etc.)
  • Treat foreign materials as special collections
  • What’s the information landscape beyond the US, in developing countries?
  • Can we develop centers of strength?
  • Given the partnerships between US and non-US researchers/institutions, we should develop parallel partnerships with libraries in other countries

It will come as no surprise to many that Deborah is on the task force on International Engagement of ARL Libraries.

The video recordings of the FutureCast plenary sessions and response panels will be posted shortly.

The Scottish Presence in the Global Library Resource

Wednesday, March 23rd, 2011 by Brian

“Collective collections” are the combined library collections of multiple institutions. They may exist as a physical aggregation of materials at a single location; they may exist through a service layer that integrates distinct collections into a single resource. Or they may be only notional: a hypothetical combination that can be mined for intelligence to inform institutional and collaborative decision-making. Collective collections can be assembled at any scale, from two institutions to the global library system as a whole. In the latter case, of course, we can only approximate – no single data source represents the holdings of all libraries everywhere. Fortunately, the more than 200 million records and 1.7 billion holdings contained in the WorldCat database provide us with an approximation of the global library resource that is sufficient to explore many interesting questions.

A project currently underway in OCLC Research is exploring the concept of a national presence in the global library resource. A national presence can be characterized from a number of perspectives, including the distinctive features of the country’s library collections; the output of the country’s publishing houses; works authored by the country’s citizens; and the corpus of materials that, regardless of origin, are “about” some aspect of that country. All of these facets, taken together, form a picture of how a country’s profile is manifested in library collections around the world. The project uses Scotland as a case study to illustrate the concept of a national presence in the global library resource, but the goal is to develop patterns of analysis that can be applied without significant modification to any country.

The project is investigating three major themes:

  • National Research Collection: the project examines the notion of a “national research collection” – the combined library collections of a nation’s higher education and research-oriented institutions – and how it aligns with the global library resource. The analysis focuses on the collective collection of the four ancient Scottish universities (Aberdeen, Edinburgh, Glasgow, and St. Andrews) as well as the National Library of Scotland. The purpose is to uncover the distinctive features of this collective research collection vis-à-vis groups of peer institutions in the library system (e.g., the collective holdings of ARL institutions), as well as the library system as a whole (as represented by WorldCat).
  • National Presence in the Global Library Resource: this aspect of the project shifts focus from the contents of Scottish library collections to the presence of Scotland-related materials in library collections around the world. The analysis focuses on materials published in Scotland, created by Scottish authors, or primarily about some aspect of Scotland. Key questions include the size of the Scottish national presence in the global library resource, as well as the characteristics of the materials comprising this presence.
  • Diffusion of a National Presence within the Global Library Resource: given the materials identified as comprising a national presence, WorldCat holdings data can be used to track their pattern of diffusion throughout the global library resource. From this, many interesting questions can be explored: the locations of extensive collections of Scotland-related materials outside Scotland; comparisons of the diffusion of Scotland-related materials with the diffusion of the “Scottish diaspora”; and global collecting activity as a means of identifying “core” (i.e., particularly influential) Scottish works.
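As a rough illustration of the third theme, the sketch below tallies holdings of a few works by country and ranks the works by how widely they are held. The records, the field layout, the library names, and the use of "GB-SCT" as a home-country code are all invented for the example; this is not the project's method and does not reflect how WorldCat holdings data is actually structured.

```python
from collections import Counter, defaultdict

# Hypothetical holdings records: (work_id, holding_library, country_code).
# Toy data only; real WorldCat holdings are structured differently.
HOLDINGS = [
    ("work-1", "Library A", "GB-SCT"),
    ("work-1", "Library B", "US"),
    ("work-1", "Library C", "CA"),
    ("work-1", "Library D", "US"),
    ("work-2", "Library A", "GB-SCT"),
    ("work-2", "Library B", "US"),
    ("work-3", "Library E", "AU"),
]

HOME = "GB-SCT"  # assumed code for the home country in this toy data


def holdings_by_country(holdings, home=HOME):
    """Count holdings per country, leaving out the home country."""
    return Counter(country for _, _, country in holdings if country != home)


def rank_by_spread(holdings):
    """Order works by the number of distinct libraries holding them."""
    libraries = defaultdict(set)
    for work, library, _ in holdings:
        libraries[work].add(library)
    # Most widely held first; ties broken by work id for a stable order.
    return sorted(libraries, key=lambda w: (-len(libraries[w]), w))


abroad = holdings_by_country(HOLDINGS)
ranking = rank_by_spread(HOLDINGS)
print(abroad.most_common())
print(ranking)
```

Counting distinct holding libraries is only one possible proxy for a "core" work; a real analysis would presumably need to account for editions, formats, and library type as well.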

Analysis of a national presence in the global library resource is relevant to a range of library decision-making needs, including collection development strategies, prioritization of digitization activities, “gap analysis” for national library collections, as well as other applications. A key source of value in all of these potential uses is the ability to consider the features of a national presence against the broader context of the global library resource. The capacity to frame collections and services within a system-wide perspective is a tool of growing importance for library-related analysis and decision-making.

Note: Thanks to colleagues at the Universities of Aberdeen, Edinburgh, Glasgow, and St. Andrews, and the National Library of Scotland, for their ongoing participation in this work. Special thanks to our much-missed colleague John MacColl, who, in his former role as an OCLC Research program officer, was instrumental in designing and guiding this project. We look forward to his continued participation in the project in his new role as University Librarian and Director of Library Services at the University of St. Andrews!

Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment

Thursday, January 6th, 2011 by Jim

The report of this project is now available. We blogged about it recently in the series of posts summarizing our major activities at year end.

The importance of shared print initiatives is growing and the Hathi Trust is poised to become an important element in the library infrastructure of the future. Their participant list now shows 52 institutional participants and three major consortia. It’s clear that mass digitization, the flip to electronic resources and space demands have resulted in a new view of the print collection in academic libraries. There is now motivated discussion among research libraries about how to construct a new system of services based on the digitized aggregation, local collections and shared storage repositories.

To quote my colleague, Lorcan Dempsey, “we are pleased that much of the empirical context for this discussion and quite a bit of intellectual leadership has come from OCLC Research work being done for the RLG Partnership. Constance Malpas has been leading this activity, and has been creating and supporting links between various community initiatives and relevant product areas in OCLC.

Her much-awaited report detailing the initial work that formed the basis for this activity has now appeared. It is likely to be quite influential in future planning activities.”

Even if this is not a core interest, you should be familiar with the major findings. You’ll be pleased to see that the report has an excellent executive summary ;)

OCLC Research 2010 – Cloud Library

Thursday, December 23rd, 2010 by Jim

The Cloud Library project (see the exposition that follows for a quick reductive overview of the idea) got a lot of attention and had a big impact in the research library community this past year.


My colleague, Constance Malpas, is the principal intellectual engine driving this effort. She’s shaped the opportunities, divined the evidence to support first steps and generally been a tireless participant in the discussions and action planning that have sprung up around this opportunity. She’s busy now finalizing a report which will be available in January, 2011. We’ll blog about its release.

What’s the idea? In the same way that cloud computing offers resources and applications on demand without the user having to operate and own the underlying assets, the cloud library project posited that it is now possible for academic libraries to rely on access to needed book and journal assets rather than manage them as locally-resident and managed physical items.

The entry point for exploring this possibility was to reconsider the relation between a library’s physical book collection, off-site storage repositories, and emerging digital text aggregations. Could a library change its local print inventory by relying on supply of a digital version of the text from a digital aggregator and offering a print volume when necessary through an arrangement with one or more existing storage facilities? We found willing exemplars of each player – New York University as a customer, ReCAP as a storage facility willing to supply and the Hathi Trust as a digital text aggregator willing to offer access to an electronic version of the book.

Constance examined the overlap between NYU’s collection, the holdings of ReCAP and the rapidly growing database of digital texts being built by the Hathi Trust from the digital copies received by participants in the Google Books Library Project and other digital copies of books. It was tough to find out what was in ReCAP as those holdings haven’t been transmitted to OCLC and it was difficult to keep up with the Hathi Trust as the corpus was growing so rapidly. Nevertheless the analyses were done and have been repeated on a regular basis.

The results of this initial effort have been discussed in many forums. A nice summary by Constance is in this presentation from the RLG Partnership Annual Meeting on 10 June 2010. The key findings were that 30% or more of the local collection was already present in the digital aggregation, that 75% of the mass-digitized texts were already ‘backed up’ in one or more of the storage repositories and that, unfortunately, only a very small percentage of the intersection was in the public domain. Since the Google Book settlement was delayed, this last point meant that there was no way to use the Hathi Trust as more than a digital preservation structure. Nevertheless, even without an e-book delivery opportunity, a great deal of library space could be freed up by a willingness to rely responsibly on delivery from remote sources (i.e., with contractual understanding, with proper reassurances about preservation, with Hathi acting as another preservation format and with appropriate coordination across multiple repositories willing to supply). In the NYU case it meant more than 700,000 volumes could be relegated and managed differently.
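The kind of overlap comparison described above comes down to set intersections over title identifiers. The following is a minimal sketch under that assumption; the function name, the identifiers, and the toy data are hypothetical and chosen only so the arithmetic is easy to follow. They are not the project's data or its actual matching method, which would have to deal with editions, duplicate records, and incomplete holdings.

```python
def overlap_report(local, digitized, stored, public_domain):
    """Summarize how a local collection overlaps with shared resources.

    Every argument is a set of title identifiers (hypothetical here).
    """
    in_digital = local & digitized            # local titles also digitized
    backed_up = digitized & stored            # digitized titles also in storage
    open_access = in_digital & public_domain  # overlap that is public domain
    return {
        "local_in_digital_pct": 100 * len(in_digital) / len(local),
        "digital_in_storage_pct": 100 * len(backed_up) / len(digitized),
        "overlap_public_domain_pct": (
            100 * len(open_access) / len(in_digital) if in_digital else 0.0
        ),
    }


# Toy identifiers, invented for illustration.
local = set(range(1, 11))   # ten locally held titles
digitized = {8, 9, 10, 11}  # titles in the digital aggregation
stored = {9, 10, 11, 20}    # titles held in a storage repository
public_domain = {10}        # titles free of copyright restrictions

report = overlap_report(local, digitized, stored, public_domain)
for name, value in report.items():
    print(f"{name}: {value:.1f}")
```

The three percentages correspond to the three questions in the paragraph above: how much of the local collection is digitized, how much of the digitized corpus is backed up in print storage, and how much of the overlap is in the public domain.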

This finding prompted further work to look at other ARL library collections. The fascinating and hopefully motivating finding was that the same rate of duplication held across nearly the entire ARL group. That is, more than 30% of most libraries’ collections were already duplicated in the mass-digitized aggregation, and this was expected to grow to 50% over the next 3 years. This means that most ARLs would benefit in similarly substantial ways by moving from reliance on their local print collections to reliance on storage repository supply. Constance gave a very nice presentation (ppt) on these findings, along with recommendations about where change could start and things to stop doing, at the October 2010 ARL meeting.

The overwhelming evidence that this project has assembled, coupled with extraordinary budget pressures, has resulted in genuine action plans at individual research libraries as well as plans for group responses. Requirements for the necessary shared infrastructure are being articulated and pilot efforts are being launched. The ‘cloud library’ seeded by OCLC Research efforts is precipitating change. (Block that metaphor.)

Sorting out Demand: some thoughts on library inter-lending

Friday, July 30th, 2010 by Constance

Over the past few years, OCLC Research has done quite a bit of analytic work based on what my colleague Brian Lavoie refers to as “supply-side” data. Examples include the well-known Google 5 study, as well as a variety of projects examining the library long tail, several of them summarized in an article Lorcan published some time ago. Much of this work has been based on data aggregated in the WorldCat bibliographic database. These data have been contributed over many years by OCLC members to support a variety of shared library services, including cooperative cataloging and inter-lending operations; as a secondary effect, the aggregation has provided a rich source of information about the system-wide library collection that is regularly mined in both internal and extra-mural research projects.

More recently, we have begun to think about how we might make better use of the demand-side data that is generated by a variety of routine library operations, especially circulation and inter-lending.  Lorcan in particular has given thought to how “intentional data” might usefully shape library service provision.

Inter-library loan transactions are a particularly interesting example of intentional data, I think. Read the rest of this entry »

Pat the Elephant

Friday, July 23rd, 2010 by Constance

There is a well-known fable about blind men with contrasting views on the anatomy of an elephant, each having examined a separate piece of the beast and independently concluded that it is either very like a spear, or a fan, or a snake, etc. Even in combination their observations fail to provide a very good picture of what an elephant looks like as a whole. The story was popularized in a poem by John Godfrey Saxe which is cited in a surprisingly wide variety of publications, from early childhood education manuals, to scientific and medical reports, to vocational guides and, more predictably, collections of 19C verse. I know this because a search in the HathiTrust digital library on a distinctive phrase from the poem’s conclusion, “prate about an elephant not one of them has seen,” finds more than 140 matches in these places.

Blind searching in large digital text repositories like the HathiTrust or Google Books provides an intriguing but incomplete view of the mass-digitized book corpus.  Frequently cited statistics like “12 million books” in GBS, “5 million books” or “one million public domain books” in Hathi don’t really tell us much about the anatomy of the mammoth.  Pat the elephant…what do you find?  A lot of curious sensory experiences that don’t add up.

When it comes to anatomizing elephants, all parts are not created equal.  Georges Cuvier, who famously reconstructed skeletons on the basis of a tooth or a toe, knew this.  Cuvier confidently and correctly distinguished Indian and African elephant species based on characteristic differences in jawbones; he ‘discovered’ the woolly mammoth based on a close examination of incomplete fossil remains.

I’m inclined to think that counting books (or volumes) is about as useful in characterizing the mass-digitized corpus as counting vertebrae in the catacombs.  It tells us something about how much is there, but not much about who, or what, is there.

Happily, there is an abundance of bibliographic metadata describing the content from which the mass-digitized corpus was sourced that can be used (like a fossilized tooth or a toe) to assign some generic, or I suppose specific, characteristics to the elephant in the room.  Over the past year, OCLC Research has been working on a project with Hathi and some other interested libraries to begin characterizing the enormous, vaguely familiar (snake? spear? tree?) yet altogether revolutionary (woolly!) mammoth created through the digitization of legacy print collections.

We’ve posted some empirical data on the subject and library distribution of titles in the Hathi digital repository here.  

I think it provides a useful complement to the enchanting and progressively revealing fan-dance of class numbers here.

More to come.

Pick of the week: ATF 2 March 2010

Saturday, March 6th, 2010 by Jim


Some of you may already be subscribers to Above The Fold (ATF), our weekly current awareness compilation and commentary. We just sent out the seventieth issue. Our objective in assembling the newsletter was to offer an information professional’s view of issues from outside our domain that were worth your consideration and related to library, archive and museum challenges. We selected items of interest likely to be beyond your normal reading sphere to help you look farther more often with less work. The selection and the commentary on the chosen articles would, we hoped, encourage some lateral thinking in our domain.

The date above marked our seventieth weekly issue and ATF now has nearly 3100 subscribers. We decided that we’ll feature a chosen article each week here in hangingtogether. I’ve chosen this article to feature not because it’s outside our domain but because it shines such a light on the obstacles to change in the research library arena.

E-Library Economics (full article here)

Inside Higher Ed   •  February 10, 2010

The hard truth about hard copy. Recent studies suggest it might take up to 50 years, or two generations, before faculty in some disciplines will accept the predominance of digital resources over hard copy. But the economics may help to persuade them: estimates peg the cost of keeping a book on a shelf at a little over $4 a year, versus about 15 cents for a digital version.

This is the most disheartening saga. I feel bad for my colleague, Suzanne Thorin, the university librarian at Syracuse, who is being vilified for acknowledging that the research library in the contemporary academy cannot contribute to the central academic mission without dramatic changes to its traditional processes and services. Managing the local book collection as part of a broad national pattern of provision, particularly alongside the emerging digital aggregations of text, could give readers and researchers more and better than any local print inventory. I’m looking forward to seeing the report mentioned in the article, authored by another colleague, Paul Courant of the University of Michigan, but will have to wait until sometime in April. The faster it’s available the better. Cost evidence in these discussions is largely absent. Read the comments to fully appreciate the bile that this topic can attract. (Michalko)

See the rest of this ATF issue here.
Subscribe to ATF here.
Subscribe to the RSS feed of ATF here.

Back issues are here.

Museum Data Exchange – Report Executive Summary

Friday, January 15th, 2010 by Günter

The final report of the Museum Data Exchange grant will be released on the OCLC Research website later this month. As a first impression of key outcomes, I’ve posted the executive summary below. Stay tuned!

*********

The Museum Data Exchange, funded by the Andrew W. Mellon Foundation, brought together a group of nine museums and OCLC Research to create tools for data sharing, build a research aggregation and analyze the aggregation. The project established infrastructure for standards-based metadata exchange for the museum community and modeled data sharing behavior among participating institutions.

Tools
The tools created by the project allow museums to share standards-based data using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

  • COBOAT allows museums to extract Categories for the Description of Works of Art (CDWA) Lite XML out of collections management systems
  • OAICatMuseum 1.0 makes the data harvestable via OAI-PMH
  • COBOAT’s default configuration targets Gallery Systems’ TMS, but can be adjusted to work with other vendor-based or homegrown database systems.
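For readers unfamiliar with OAI-PMH, the sketch below shows the general shape of a harvest: a ListRecords request URL, and a response that carries record identifiers plus an optional resumption token for the next batch. The base URL, the `oai_cdwalite` metadata prefix, and the sample response are assumptions made for illustration; the values a given OAICatMuseum installation actually exposes should be checked against that installation. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumption token is sent on its own (besides the verb),
    so the metadata prefix is dropped when a token is supplied.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Invented, heavily abbreviated sample response (not from a real repository).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE = "https://museum.example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE, metadata_prefix="oai_cdwalite")
ids, token = parse_list_records(SAMPLE)
next_url = list_records_url(BASE, resumption_token=token)
print(first_url)
print(ids, token)
print(next_url)
```

A harvester would repeat the request with each new token until a response arrives without one.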

Both tools are a free download from here.
Configuration files adapting COBOAT to different systems can be shared here.
Read the rest of this entry »

The Cult of Brewster Finds Its Church

Tuesday, October 20th, 2009 by Roy

Last night Brewster Kahle of the Internet Archive unveiled his latest project in a venue suitable for any high priest or cult leader: a former Christian Science Church in San Francisco. As it turns out, the Internet Archive recently purchased the building, and as Brewster remarked during the grand unveiling of the Bookserver project, it even matches their long-time logo, which was selected on purpose to imply a physical library.

Although the mood in the great room of the church, which Brewster eventually hopes to turn into a modern-day library reading room, was more hallelujah-inspiring than anything, the preceding day had been more down-and-dirty technical. The two-day meeting (still going on as I write this) is more about AtomPub and identifiers than holy water and consecrated wafers, but all of it does take a certain amount of faith. Read the rest of this entry »

Going Beyond: The Silos of the LAMs in the UK

Tuesday, August 25th, 2009 by Günter

After successfully wrapping up a series of panel presentations at ALA, SAA and AAM, we’re now taking our LAMs to the UK. CILIP asked us to create a day-long event around library, archive and museum collaboration. Internally, we’ve code-named this event “Beyond ‘Beyond the Silos of the LAMs,’” since we’re using our report [pdf] as a launch-pad for presenters and presentations going beyond our initial investigation. To the world, the event is known (without the stutter) as “Beyond the Silos of the LAMs”, and it’ll be held on September 15th in London. It’s not too late to register!
Read the rest of this entry »