Archive for the 'digitization' Category

On building community

Thursday, February 3rd, 2011 by Merrilee

Almost a year ago, we hosted Undue Diligence and released the “Well-intentioned practice for putting digitized collections of unpublished materials online” document (we just call it WIP). The point behind the event and the document (and all of the preparation that went into both!) was to contribute to the professional community of practice that is emerging in archives and special collections around digitization of unpublished materials. In short, there is risk in digitizing materials that may be in copyright, but that risk should be balanced with the harm to scholarship and society inherent in not making collections fully accessible: act accordingly. Since being published, the WIP has been endorsed by professional organizations, academic and research library professionals, and scholarly communications officers. (You, too, may want to join esteemed company and endorse the WIP, and we welcome that!)

This week I found out that the Triangle Research Libraries Network (TRLN) (which includes Duke University, North Carolina Central University, North Carolina State University, and the University of North Carolina at Chapel Hill) has released an Intellectual Property Rights Strategy for Digitization of Modern Manuscript Collections and Archival Record Groups, which draws significantly on the WIP.

I believe the TRLN rights strategy document is a step forward from the WIP because it is a strategic statement of intention with institutions, collections, and intelligent library and archives professionals behind it. We’ve been looking to foster a community of practice, and with TRLN and other institutions formalizing their approaches, we are beginning to have just that.

Special thanks to Laura Clark Brown and company at UNC’s Wilson Library for all the work they’ve done to push forward in the real world, and to my up-for-anything partner in crime on all things special and digital, Ricky Erway. There are some times when I feel like the work we do makes a difference and this is one of those times!

Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment

Thursday, January 6th, 2011 by Jim

The report of this project is now available. We blogged about it recently in the series of posts summarizing our major activities at year end.

The importance of shared print initiatives is growing and the Hathi Trust is poised to become an important element in the library infrastructure of the future. Their participant list now shows 52 institutional participants and three major consortia. It’s clear that mass digitization, the flip to electronic resources and space demands have resulted in a new view of the print collection in academic libraries. There is now motivated discussion among research libraries about how to construct a new system of services based on the digitized aggregation, local collections and shared storage repositories.

To quote my colleague, Lorcan Dempsey, “we are pleased that much of the empirical context for this discussion and quite a bit of intellectual leadership has come from OCLC Research work being done for the RLG Partnership. Constance Malpas has been leading this activity, and has been creating and supporting links between various community initiatives and relevant product areas in OCLC.

“Her much-awaited report detailing the initial work that formed the basis for this activity has now appeared. It is likely to be quite influential in future planning activities.”

Even if this is not a core interest, you should be familiar with the major findings. You’ll be pleased to see that the report has an excellent executive summary ;)

OCLC Research 2010: Well-Intentioned Practices

Wednesday, December 22nd, 2010 by Merrilee

As 2010 winds down, we are reflecting on what we’ve worked on or created in a mini blog series. You can see a rundown of highlights here.

Is copyright making you blue
And you don’t know what to do
Take advantage of others’ tactics
And put in place Well-Intentioned Practice!

I want to give a shout out to the National Library of Australia for what has become an annual display of talent and imagination. Each year the staff performs for their holiday party, and they share with the rest of us on YouTube. The results are funny and toe-tapping. This year’s theme was “Puttin’ on the Writs,” an homage to the trials and tribulations of adhering to copyright law.

National Library of Australia. We feel your pain. And we’ve been moved to do something about it. In the US. For unpublished materials.

Following on the heels of Shifting Gears, we began to realize what a barrier copyright law presents to those working with unpublished materials. We convened an advisory group. We held an event. Out of this came a document called Well-intentioned practice for putting digitized collections of unpublished materials online (we call it WIP). WIP encourages institutions to take a risk management approach (rather than apply item-by-item assessment).

WIP has been a success, and has been endorsed by numerous organizations and individuals. And we’ve just learned that we’ll have a session focusing on well-intentioned practices at the Society of American Archivists meeting in 2011. While WIP is based on US copyright law, as a risk management approach it may work in other situations.

We’ve written about WIP in the past. Here are two previous posts on this topic.

And if you haven’t seen it, here’s Puttin’ on the Writs in its full glory.

If you want to see even more, look at this summary of our accomplishments over the last five years. Only three pages!

Pat the Elephant

Friday, July 23rd, 2010 by Constance

There is a well-known fable about blind men with contrasting views on the anatomy of an elephant, each having examined a separate piece of the beast and independently concluded that it is either very like a spear, or a fan, or a snake, etc.  Even in combination their observations fail to provide a very good picture of what an elephant looks like as a whole.  The story was popularized in a poem by John Godfrey Saxe which is cited in a surprisingly wide variety of publications, from early childhood education manuals, to scientific and medical reports, to vocational guides and, more predictably, collections of 19C verse.  I know this because a search on a distinctive phrase from the poem’s conclusion: “prate about an elephant not one of them has seen” in the HathiTrust digital library finds more than 140 matches in these places.

Blind searching in large digital text repositories like the HathiTrust or Google Books provides an intriguing but incomplete view of the mass-digitized book corpus.  Frequently cited statistics like “12 million books” in GBS, “5 million books” or “one million public domain books” in Hathi don’t really tell us much about the anatomy of the mammoth.  Pat the elephant…what do you find?  A lot of curious sensory experiences that don’t add up.

When it comes to anatomizing elephants, all parts are not created equal.  Georges Cuvier, who famously reconstructed skeletons on the basis of a tooth or a toe, knew this.  Cuvier confidently and correctly distinguished Indian and African elephant species based on characteristic differences in jawbones; he ‘discovered’ the woolly mammoth based on a close examination of incomplete fossil remains.

I’m inclined to think that counting books (or volumes) is about as useful in characterizing the mass-digitized corpus as counting vertebrae in the catacombs.  It tells us something about how much is there, but not much about who, or what, is there.

Happily, there is an abundance of bibliographic metadata describing the content from which the mass-digitized corpus was sourced that can be used (like a fossilized tooth or a toe) to assign some generic, or I suppose specific, characteristics to the elephant in the room.  Over the past year, OCLC Research has been working on a project with Hathi and some other interested libraries to begin characterizing the enormous, vaguely familiar (snake? spear? tree?) yet altogether revolutionary (woolly!) mammoth created through the digitization of legacy print collections.

We’ve posted some empirical data on the subject and library distribution of titles in the Hathi digital repository here.  

I think it provides a useful complement to the enchanting and progressively revealing fan-dance of class numbers here.

More to come.

Focus and reframe: rights and unpublished materials

Wednesday, March 10th, 2010 by Merrilee

I’m using this blog posting to wrap together a bunch of ideas I’ll be presenting at a meeting tomorrow, Undue Diligence: Seeking Low-risk Strategies for Making Collections of Unpublished Materials More Accessible.

Mark Greene and Dennis Meissner helped to reframe processing modern archival collections in More Product, Less Process. Similarly, Shifting Gears helped to recast digitization from special collections. The purpose of Undue Diligence is to help professionals to look anew at rights issues around unpublished materials, specifically with regard to digitization of those materials, particularly 20th and 21st century collections.

The RLG Partnership exists to identify shared problem spaces, and to reduce pain and effort in those areas. With increasing expectations that our holdings will be made digitally accessible, assessing rights (copyright, along with privacy rights, and potentially sensitive materials) within archival collections is one of those points of pain. The prospect of analyzing items within archival collections is so painful, in fact, that many institutions avoid digitizing collections that were created in the last 70 to 100 years. While this is a very safe practice, it does little to advance broad and democratic access to collections in our care.

The RLG Partnership likewise dodged the copyright bullet in 2007 when we held our forum, Digitization Matters (from which Shifting Gears was born). We ruled copyright out of scope. While reframing the conversation around digitization — from preservation to access, from quality to quantity — did help move the conversation on digitization forward, it did little for those institutions who have major collections relating to … the Great Depression, World Wars I and II, the Korean, Vietnam, and Gulf wars, the civil rights movement, the free speech movement… the list goes on and on. This is a small slice of topics that are studied by researchers, taught in classrooms, and of interest to citizens everywhere.

In 2008, we published a short paper called Copyright Investigation Summary Report, which looked at then-current practices around copyright with both published and unpublished materials. Here, we learned that most copyright investigations related to permissions and almost never to digitization. Work was high effort and low return. “We say no a lot,” said one interviewee. Having conducted the interviews, I was pretty depressed by what I heard, which was a tale of professionals paralyzed by potential risks, and of collections shackled.

One of the proposed outcomes of the paper was to “…further explore community practice and issues around unpublished materials held in special collections and archives.” We did so by sponsoring the meeting that led to the SAA Orphan Works Statement of Best Practices, which was published in 2009. This document provides good guidance for institutions to conduct a “reasonable search,” but does not frame rights assessment in a risk management strategy.

The risk of perceived harm in digitizing a collection is quite variable, based on factors like content, purpose of creation, and date of creation. We believe that, in addition to standards for conducting a reasonable search, the community needs to reframe the issues of rights and risks collectively, and also to embrace rights assessment as archivists: at a collection or series level and not at an item level.

We are holding this event, with a star-studded cast of presenters, to help set the stage for an important conversation, which is the development of what we are calling a set of “well-intentioned practices.” We hope that this will have two effects. The first is that archivists will not need to reinvent the wheel, and can draw from community practices to identify lower risk collections of high research interest. The second is that institutions will digitize collections more freely. Even if institutions consider digitizing two out of ten collections, as opposed to one out of ten collections, access to collections will double!

We will follow up with subsequent blog postings both to report on the content of Undue Diligence and also to report on outcomes.

Many thanks to the advisory group who helped to shape both this event and our program of work in this area.

If you wish to follow the event on Twitter, follow #UndueD. I’ve also set up a Twapper Keeper for the event.

Europeana at the Halfway Mark

Monday, December 7th, 2009 by Ricky

For the recent LIBER/EBLIDA workshop on digitization at the Koninklijke Bibliotheek in The Hague, I was asked to provide a view on Europeana from the US perspective. Of course, I neither speak for the US nor do I have inside information about Europeana, but I’d been following it from afar and had read just about everything I could get my hands on, so I gamely took the challenge. [Only someone as bloodied by digital paper cuts as I would dare to take on Europeana.] I wasn’t bombarded with rotten tomatoes, courgettes, and aubergines, so I guess it went OK. My remarks are now available in Volume 19 (2009), No. 2 of the LIBER QUARTERLY.