Archive for the 'Managing the Collective Collection' Category

Sorting out Demand: some thoughts on library inter-lending

Friday, July 30th, 2010 by Constance

Over the past few years, OCLC Research has done quite a bit of analytic work based on what my colleague Brian Lavoie refers to as “supply-side” data. Examples include the well-known Google 5 study, as well as a variety of projects examining the library long tail, several of them summarized in an article Lorcan published some time ago. Much of this work has been based on data aggregated in the WorldCat bibliographic database. These data have been contributed over many years by OCLC members to support a variety of shared library services, including cooperative cataloging and inter-lending operations; as a secondary effect, the aggregation has provided a rich source of information about the system-wide library collection that is regularly mined in both internal and extra-mural research projects.

More recently, we have begun to think about how we might make better use of the demand-side data that are generated by a variety of routine library operations, especially circulation and inter-lending. Lorcan in particular has given thought to how “intentional data” might usefully shape library service provision.

Inter-library loan transactions are a particularly interesting example of intentional data, I think. Read the rest of this entry »

Pat the Elephant

Friday, July 23rd, 2010 by Constance

There is a well-known fable about blind men with contrasting views on the anatomy of an elephant, each having examined a separate piece of the beast and independently concluded that it is either very like a spear, or a fan, or a snake, etc. Even in combination their observations fail to provide a very good picture of what an elephant looks like as a whole. The story was popularized in a poem by John Godfrey Saxe which is cited in a surprisingly wide variety of publications, from early childhood education manuals, to scientific and medical reports, to vocational guides and, more predictably, collections of nineteenth-century verse. I know this because a search in the HathiTrust digital library on a distinctive phrase from the poem’s conclusion, “prate about an elephant not one of them has seen,” finds more than 140 matches in these places.

Blind searching in large digital text repositories like the HathiTrust or Google Books provides an intriguing but incomplete view of the mass-digitized book corpus.  Frequently cited statistics like “12 million books” in GBS, “5 million books” or “one million public domain books” in Hathi don’t really tell us much about the anatomy of the mammoth.  Pat the elephant…what do you find?  A lot of curious sensory experiences that don’t add up.

When it comes to anatomizing elephants, not all parts are created equal. Georges Cuvier, who famously reconstructed skeletons on the basis of a tooth or a toe, knew this. Cuvier confidently and correctly distinguished Indian and African elephant species based on characteristic differences in jawbones; he ‘discovered’ the woolly mammoth based on a close examination of incomplete fossil remains.

I’m inclined to think that counting books (or volumes) is about as useful in characterizing the mass-digitized corpus as counting vertebrae in the catacombs.  It tells us something about how much is there, but not much about who, or what, is there.

Happily, there is an abundance of bibliographic metadata describing the content from which the mass-digitized corpus was sourced that can be used (like a fossilized tooth or a toe) to assign some generic, or I suppose specific, characteristics to the elephant in the room.  Over the past year, OCLC Research has been working on a project with Hathi and some other interested libraries to begin characterizing the enormous, vaguely familiar (snake? spear? tree?) yet altogether revolutionary (woolly!) mammoth created through the digitization of legacy print collections.

We’ve posted some empirical data on the subject and library distribution of titles in the Hathi digital repository here.  

I think it provides a useful complement to the enchanting and progressively revealing fan-dance of class numbers here.

More to come.

Pick of the week: ATF 2 March 2010

Saturday, March 6th, 2010 by Jim

Some of you may already be subscribers to Above The Fold (ATF), our weekly current awareness compilation and commentary. We just sent out the seventieth issue. Our objective in assembling the newsletter was to offer an information professional’s view of issues from outside our domain that were worth your consideration and related to library, archive and museum challenges. We selected items of interest likely to be beyond your normal reading sphere to help you look farther more often with less work. The selection and the commentary on the chosen articles would, we hoped, encourage some lateral thinking in our domain.

The date above marked our seventieth weekly issue and ATF now has nearly 3100 subscribers. We decided that we’ll feature a chosen article each week here in hangingtogether. I’ve chosen this article to feature not because it’s outside our domain but because it shines such a light on the obstacles to change in the research library arena.

E-Library Economics (full article here)

Inside Higher Ed   •  February 10, 2010

The hard truth about hard copy. Recent studies suggest it might take up to 50 years, or two generations, before faculty in some disciplines will accept the predominance of digital resources over hard copy. But the economics may help to persuade them: estimates peg the cost of keeping a book on a shelf at a little over $4 a year, versus about 15 cents for a digital version.

This is a most disheartening saga. I feel bad for my colleague, Suzanne Thorin, the university librarian at Syracuse, who is being vilified for acknowledging that the research library in the contemporary academy cannot contribute to the central academic mission without dramatic changes to its traditional processes and services. Managing the local book collection as part of a broad national pattern of provision, particularly alongside the emerging digital aggregations of text, could give readers and researchers more and better than any local print inventory. I’m looking forward to seeing the report mentioned in the article, authored by another colleague, Paul Courant of the University of Michigan, but will have to wait until sometime in April. The sooner it’s available the better. Cost evidence in these discussions is largely absent. Read the comments to fully appreciate the bile that this topic can attract. (Michalko)

See the rest of this ATF issue here.
Subscribe to ATF here.
Subscribe to the RSS feed of ATF here.

Back issues are here.

Museum Data Exchange - Report Executive Summary

Friday, January 15th, 2010 by Günter

The final report of the Museum Data Exchange grant will be released on the OCLC Research website later this month. As a first impression of key outcomes, I’ve posted the executive summary below. Stay tuned!

*********

The Museum Data Exchange, funded by the Andrew W. Mellon Foundation, brought together a group of nine museums and OCLC Research to create tools for data sharing, build a research aggregation and analyze the aggregation. The project established infrastructure for standards-based metadata exchange for the museum community and modeled data sharing behavior among participating institutions.

Tools
The tools created by the project allow museums to share standards-based data using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

  • COBOAT allows museums to extract Categories for the Description of Works of Art (CDWA) Lite XML out of collections management systems
  • OAICatMuseum 1.0 makes the data harvestable via OAI-PMH
  • COBOAT’s default configuration targets Gallery Systems’ TMS, but can be adjusted to work with other vendor-based or homegrown database systems.

    Both tools are a free download from here.
    Configuration files adapting COBOAT to different systems can be shared here.
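To make the harvesting step concrete, here is a minimal sketch of how a harvester might build OAI-PMH `ListRecords` requests against a repository exposed by a tool such as OAICatMuseum. The base URL is hypothetical (the post names no endpoint), and the `oai_cdwalite` metadata prefix is an assumption about how a CDWA Lite repository might be configured; only the `verb`, `metadataPrefix`, `set` and `resumptionToken` arguments come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the post does not give a real repository address.
BASE_URL = "https://museum.example.org/oaicatmuseum/OAIHandler"


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build the first OAI-PMH ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build a follow-up request for the next page of a large result list.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent with
    the verb only, without metadataPrefix or set.
    """
    params = {"verb": "ListRecords", "resumptionToken": token}
    return base_url + "?" + urlencode(params)


# "oai_cdwalite" is an assumed prefix; a real repository advertises its
# supported prefixes in response to the ListMetadataFormats verb.
print(list_records_url(BASE_URL, "oai_cdwalite"))
```

A harvester would fetch the first URL, read any `resumptionToken` element in the response, and keep calling `resume_url` until the repository returns no further token.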
    Read the rest of this entry »

    The Cult of Brewster Finds Its Church

    Tuesday, October 20th, 2009 by Roy

    Last night Brewster Kahle of the Internet Archive unveiled his latest project in a venue suitable for any high priest or cult leader: a former Christian Science Church in San Francisco. As it turns out, the Internet Archive recently purchased the building, and as Brewster remarked during the grand unveiling of the Bookserver project, it even matches their long-time logo, which was selected on purpose to imply a physical library.

    Although the mood in the great room of the church, which Brewster eventually hopes to turn into a modern-day library reading room, was more hallelujah-inspiring than anything, the preceding day had been more down-and-dirty technical. The two-day meeting (still going on as I write this) is more about AtomPub and identifiers than holy water and consecrated wafers, but all of it does take a certain amount of faith. Read the rest of this entry »

    Going Beyond: The Silos of the LAMs in the UK

    Tuesday, August 25th, 2009 by Günter

    After successfully wrapping up a series of panel presentations at ALA, SAA and AAM, we’re now taking our LAMs to the UK. CILIP asked us to create a day-long event around library, archive and museum collaboration. Internally, we’ve code-named this event “Beyond ‘Beyond the Silos of the LAMs,’” since we’re using our report [pdf] as a launch-pad for presenters and presentations going beyond our initial investigation. To the world, the event is known (without the stutter) as “Beyond the Silos of the LAMs”, and it’ll be held on September 15th in London. It’s not too late to register!
    Read the rest of this entry »

    The Smithsonian Challenge - Dr Wayne Clough @ SALT

    Wednesday, August 19th, 2009 by Günter

    Earlier this week, I heard Dr Wayne Clough, Secretary of the Smithsonian Institution, speak as part of the Long Now’s Seminars About Long Term Thinking (SALT) series. In his talk, he focused primarily on a part of the Smithsonian I confess I know a lot less about than its plethora of libraries, archives and museums: the Smithsonian’s science centers and the scientific work throughout the institution. Did you know that apart from all of those buildings on the Mall, the Smithsonian maintains numerous research centers with activities in 88 countries, or that every sixth Smithsonian employee is working in astronomy? Or that the Smithsonian tends the longest scientifically observed plot of earth (a slice of rain forest in Panama, which it has researched for the last 100 years)? I didn’t, and I walked away newly impressed with the breadth and scope of Smithsonian engagement in science, and in particular its contributions to our knowledge about global warming.

    In the Q&A, some of the questions focused on what you might call more traditional “museum” concerns. A question about deaccessioning of materials triggered an interesting exchange between Clough and Stewart Brand, the host of the lecture series. When Clough stated that the Smithsonian won’t duplicate collections at other museums, Brand followed up: “You have some network knowledge of what’s in all the museums of the world?” When Clough affirmed, Brand wanted to know: “Can we have access to that?”

    Of course, when Clough affirmed, the network he was talking about was the professional network among curators, as well as the published literature, which allowed the Smithsonian to know what other institutions collect. What intrigued Brand, however, was the idea that there might be a database system representing museum collections across the globe to which the public might gain access. Such a database does not yet exist. It’s difficult to refrain from speculating how much inefficiency is built into museum practice because we lack such a resource.
    Read the rest of this entry »

    Smithsonian Web Strategy, CultureLabel: The Impact of Network Effects

    Friday, July 31st, 2009 by Günter

    The Smithsonian just announced the release of its Web and New Media Strategy v 1.0 [pdf], which has come together swiftly in a process of marvelous openness and inclusion. As a campus-like institution with 19 museums and galleries, 9 research centers, 18 archives, 1 library with 20 branches, and a zoo, the Smithsonian web-presence to date is as fragmented as its administrative parts (also see this presentation), and the chief goal of the web strategy is to offer the Smithsonian Commons as a unifying platform to SI units.

    The initial Smithsonian Commons will be a Web site […] featuring collections of digital assets contributed voluntarily by the units and presented through a platform that provides best-of-class search and navigation; social tools such as commenting, recommending, tagging, collecting, and sharing; and intellectual property permissions that clearly give users the right to use, re-use, share, and innovate with our content without unnecessary restrictions.

    Starting to skim through the report, this line in particular caught my attention:

    We are like a retail chain that has desirable and unique merchandise but requires its customers to adapt to dramatically different or outdated idioms of signage, product availability, pricing, and check-out in every aisle of each store.

    I think this is an apt metaphor for how the Smithsonian currently undermines its own potential, and should serve as a memorable rallying cry for the changes the web strategy advocates.

    As coincidence would have it, this metaphor also handsomely dovetails with another intriguing piece of news, gleaned from the UK Museum Computer Group list (posted by Simon Cronshaw, Director of CultureLabel):

    If you haven’t come across CultureLabel yet, our aim is to facilitate a united alliance of museum e-stores to forge a new mainstream consumer shopping category of ‘cultural shopping’ - in a similar way to how ethical shopping or alternative gifts have crystallised as buying categories in the public consciousness. We see this as a great new opportunity for both income generation and innovative audience development for all our culture partners.

    While the Smithsonian aims to integrate its digital collection into a more cohesive webpresence, CultureLabel aims to integrate museum e-stores (for starters, those in the UK - more here) into one massive one-stop shop. What’s true for digital collections is equally true for products from the museum store: bringing together assets from a wide variety of players creates a webpresence with more gravity, which in turn will attract a wider audience. The Smithsonian Commons and CultureLabel both take advantage of a fundamental network effect: the more assets, the more users (customers / site visitors); the more users, the more participation (purchasing / tagging, commenting, etc.). The brand, a term featuring prominently both in the SI Web Strategy and on the CultureLabel website, ultimately is the biggest winner.

    The Smithsonian web strategy acknowledges that the fragmented offering severely limits the impact pan-institutional assets currently have. Taking a step back, of course this logic also applies to the larger community: fragmenting our offerings into thousands of institutional websites severely limits the impact and potential of the collective museum collection.

    With 60 participating museums and galleries, CultureLabel breaks down those institutional barriers, and stands as one of the most extensive data sharing exercises museums have engaged in to date. It’s a little sobering, if not surprising, that the gift shop is ahead of the collection in this instance. Can we do for museum collections what CultureLabel has done for museum commerce? Can we scale the model and the values of the Smithsonian Commons to a Commons for all museums? If it works for products, let’s make it work for digital collections.

    Impact Measures and Library Selection

    Thursday, May 14th, 2009 by Constance

    I have just been reading a recent article by Kathy Enger* published in Library & Information Science Research that examines the potential value of citation analysis as a selection tool in academic library acquisitions. Enger proposes that citation analysis of the journal literature might be used to identify potentially high-impact books for inclusion in a college or university library collection. The reasoning here is quite interesting: based on the observation that humanities and social science scholars rely more heavily on monographs than journals as a vehicle of scholarly communication, a sampling method is used to identify high impact journals in the social sciences and then cull from these the top cited authors. If these authors have also published books not already represented in the local collection, the titles are acquired on the premise that the content is likely to represent ‘high value’ scholarship. Library circulation figures are later examined to determine if these titles are used (borrowed) more frequently than titles selected through traditional means.
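The selection procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Enger's actual method: the function name, the `top_n` cutoff and all of the sample data are hypothetical, and the sketch simply counts citations per author, takes the most-cited authors, and lists their books that the local collection does not yet hold.

```python
from collections import Counter


def select_candidate_titles(citations, books_by_author, local_holdings, top_n=2):
    """Return (author, title) pairs worth considering for acquisition.

    citations       -- one author name per citation found in the sampled journals
    books_by_author -- mapping of author name to the book titles they have published
    local_holdings  -- set of titles already in the local collection
    top_n           -- how many of the most-cited authors to consider (assumed cutoff)
    """
    counts = Counter(citations)
    top_authors = [author for author, _ in counts.most_common(top_n)]

    candidates = []
    for author in top_authors:
        for title in books_by_author.get(author, []):
            # Only titles not already held are candidates for purchase.
            if title not in local_holdings:
                candidates.append((author, title))
    return candidates


# Hypothetical sample data, invented purely for illustration.
citations = ["Author A"] * 3 + ["Author B"] * 2 + ["Author C"]
books_by_author = {
    "Author A": ["Book A1", "Book A2"],
    "Author B": ["Book B1"],
    "Author C": ["Book C1"],
}
local_holdings = {"Book A1"}

print(select_candidate_titles(citations, books_by_author, local_holdings))
```

The later evaluation step would then compare circulation counts for titles chosen this way against those for titles selected through traditional means.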

    This seems like a proposition worth testing. Read the rest of this entry »

    An open Smithsonian, all around

    Monday, May 11th, 2009 by Günter

    As part of the process for arriving at the Smithsonian’s Web and New Media strategic plan, Michael Edson created a Wiki on which Smithsonian staff discuss their points of view in plain sight of anybody who is interested in listening in. This experiment in radical transparency is in and of itself noteworthy, and so is the content which surfaces on the Wiki. Encouraged by @mpedson’s tweet, I particularly took note of two short talks arguing in favor of open access to museum content. The first paper (titled “Publish Everything!”) is by Betsy Broun (Director, Smithsonian American Art Museum); the second paper (titled “Make Content Freely Available”) is by Lauryn Guttenplan (Associate General Counsel at the Smithsonian). Both papers were presented as part of the Smithsonian 2.0 Forum on April 21, 2009. One reason I found these notes remarkable is that the speakers represent the class of professional that is often perceived to be scuttling plans for making data more openly available. Not in this instance!

    Here are the outtakes I would have marked yellow if I had actually printed the pieces instead of saving a tree and reading online.
    Read the rest of this entry »