Archive for the 'digitization' Category

Past Forward: New ways forward

Tuesday, July 23rd, 2013 by Merrilee

This is the second in a series on the OCLC Research Library Partnership meeting, Past Forward. Stay tuned for more!

In my last blog post, I underscored the “one library” theme — how special collections are integrating into the broader library and beyond. In this post, I’m going to take a look at new ways forward, which mostly (but not entirely) have to do with collections.

The first panel of the conference, “Managing Twenty-First Century Special Collections: Born Analog, Born Digital, and Born Difficult,” focused on digital collections in a variety of contexts. Dave Thompson (Wellcome Library) talked about digital collections, both born digital and digitized, and invited the audience to view things from the point of view of the materials, as well as from the perspective of users. Do the format, metadata, and rights statements support use and reuse? His presentation reminded us that librarians are custodians, not owners or even the main users of our works. Erin O’Meara (Gates Archive) works in a context where print and digital are integrated. Working with digital materials has caused her to rethink traditional archival practice, and step back in an attempt to understand collections that can’t be taken in visually (at least right now, in a time when we lack adequate tools for processing digital collections). For example, she employs ethnographic strategies, interviewing donors in order to understand the activities that led to creation of records.

The presentations also highlighted how difficult going forward can be, in a “one step forward, two steps back” fashion. Michelle Light (University of Nevada, Las Vegas) talked about early born-digital efforts at UC Irvine, highlighting particular challenges and solutions in a “let’s make it up as we go along” fashion. For example, UCI opted to grant online access to the Richard Rorty papers by mimicking patterns in the real world (users who requested access first needed to acknowledge the use policy before they could be ushered into a virtual reading room). Rights issues and a tangled mess of creators complicated matters, and in order to give online access, staff undertook item-level processing, which is distinctly non-ideal and definitely not scalable, particularly in light of the daunting quantity of digital materials. In Greene-Meissner terms, we need tools and techniques that will take us from the tweezers to the shovel.

Details for the UCI Richard Rorty Papers, from Michelle Light's presentation


Other presentations that illustrated how special collections are moving forward on the collections front were in the panel on “stakeholders.” Katherine Reagan’s presentation on the Cornell Hip Hop Collection is the prime example of striking out in new ways. What surprised me about this collection is that it was developed not to support academic programs at Cornell, but to document an important social movement. This noble goal was not without challenges — the library faced significant skepticism from the community, and was also challenged in coming up with appropriate staffing for the collection. The project took the novel approach of developing a community advisory board to help with necessary outreach and to establish important lines of communication. And the Cornell academic community stepped up, making use of the new collection.

The Cornell Hip Hop Collection is also a good example of how the nature of collections is changing: the collection website says that the collection “features: hundreds of party and event flyers ca. 1977-1985; thousands of early vinyl recordings, cassettes and CDs; film and video; record label press packets and publicity; black books, photographs, magazines, books, clothing, and more.” It’s a good reminder that as collections are less and less about books and journals, and more and more about material culture, our tried and true tools and techniques will need to change and adapt. Our systems are great for books, but what about (as Michael Stoller from NYU quipped during a follow-up reactor panel) pizza boxes from the Occupy Wall Street Movement demonstrations?

You can view the videos for the Thompson, O’Meara, and Light presentations, as well as the video of the reactor panel and audience Q & A from the “born difficult” panel from the event webpage.

What do you think? Are there other ways in which special collections, or other library units, are moving forward in new ways? Ways they should be? Hit us with a comment below! I’ll be following up later this week with another posting, on another theme that emerged from the meeting: collaboration.

Special Collections in the Collective Collection

Tuesday, February 12th, 2013 by Jennifer

Last month I facilitated a forum at the New-York Historical Society about Putting ‘Special’ in the ‘Collective Collection.’ We think it might be the first ever meeting about the centrality of distinctive and unique materials in discourse about the contemporary research ecosystem of shared print agreements, digital materials (both free and licensed), print collections, regional consortia, and resource sharing.

The meeting was standing room only, with a substantial waiting list. This group of thoughtful representatives from OCLC Research Library Partnership institutions set out to reconsider entrenched ideas about the irrelevance, or even the danger, of the collective collection to special collections.

What is the collective collection? In the recent mega-regions report, Constance and Brian defined the “collective collection” to be the combined holdings of a group of institutions, excluding duplicate holdings.

In our thought experiment, we mentally set aside the widespread overlapping collections, like those runs of STEM journals, subscriptions to Evans Online, or Google Books and the Hathi Trust. What’s left is a virtual collection of scarce publications – all in situ – that are held across the institutions in the group.
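For readers who like to see the thought experiment spelled out, here is a minimal sketch in Python. It treats each institution's holdings as a set of titles, takes the union to get the collective collection, and then keeps only the titles held by a single member of the group. The library names, the titles, and the `max_holders` cutoff are all made up for illustration; they are not drawn from the mega-regions report or from any real holdings data.

```python
from collections import Counter


def collective_collection(holdings):
    """Combined holdings of the group, with duplicate holdings counted once."""
    return set().union(*holdings.values())


def scarce_titles(holdings, max_holders=1):
    """Titles held by at most `max_holders` institutions (hypothetical cutoff)."""
    counts = Counter(
        title for titles in holdings.values() for title in set(titles)
    )
    return {title for title, n in counts.items() if n <= max_holders}


# Invented example data: three libraries with overlapping and unique holdings.
holdings = {
    "Library A": {"STEM Journal Run", "Evans Online", "Local Broadside"},
    "Library B": {"STEM Journal Run", "Evans Online", "Family Letterbook"},
    "Library C": {"STEM Journal Run", "Ship's Log"},
}

everything = collective_collection(holdings)
rare_stuff = scarce_titles(holdings)

print(sorted(everything))
print(sorted(rare_stuff))
```

In this toy data, the widely held journal run and database subscription drop out, and what is left is one distinctive item per library.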

What remains is the rare stuff, “thy true heritage.” It is the widely-held material that allows us to focus on collecting (collectively) in the margins. The collective collection is not complete without special collections.

What does this strategy mean for researchers? It means that I can look every one of them in the eye and tell them that I can get them everything they need, regardless of where those materials “live”. And I can provide my rare books and special collections to all of my researchers, no matter where they do their work.

What are the implications for library administrators? The distinctiveness of your library’s materials – in concert with your colleagues’ special collections – is the hallmark of the collective collection.

Putting “Special” in the “Collective Collection” from OCLC Research

Share your ideas, in comments below, or in email to me.

Scan and Deliver… on Wikipedia!

Tuesday, July 24th, 2012 by Jennifer

I just learned from Max – our Wikipedian in Residence – that NARA (the US National Archives) is posting scans of archives on request and putting them up on Wikipedia. This pilot project is my new favorite creative experiment to maximize access to archives. The project page includes links to digitized images, with crowd-sourced transcriptions. Check out the example of a George Washington letter posted and transcribed. There’s a list of scans NARA has posted and the queue of requests.

What a creative experiment in delivering digital images! I wish I had known about it when Dennis and I were chatting on YouTube about scanning and photography in special collections.

Public libraries in the digital age

Thursday, July 19th, 2012 by Ricky

There is not often much in these posts about public libraries, but there are frequently posts about digital libraries. I admit to thinking there’s not all that much overlap between the two. Public libraries are ready to change that.

Last November a group of public library leaders met to begin to address the future of public libraries as information is increasingly digital. There was much discussion about the Digital Public Library of America (DPLA) and the role of public libraries in that endeavor — as well as the possible impact of DPLA on public library usage and funding. It was agreed that this was not a time to sit back and see what happens. If public libraries don’t serve the content the users want in the forms they want to consume it, their future is grim.

A new report, America’s Digital Future: Advancing a shared strategy for digital public libraries, summarizes the themes from the meeting and lays out an action plan for moving forward.

There can be no true Digital Public Library of America without the participation of public libraries. Public libraries are eager to digitize their unique materials and make them locally available as well as contribute them to DPLA. Perhaps a more burning issue is to ensure that public libraries can provide current commercial publications, including e-books, to their users. They cannot rely on the marketplace to represent public interests; this will require a national, concerted voice to negotiate with publishers and to minimize the digital divide.

This part of the public library action plan is being further pursued in an IMLS-funded project to develop an e-book strategy that will ensure that Americans continue to have access to commercially produced content through their local public libraries, even as formats change.

While OCLC’s constituency includes all libraries, the OCLC Research Library Partnership focuses on research libraries. These issues, though, are fundamental to all libraries and library users and I am pleased to have been involved in the public library meeting and report and in the forthcoming work on e-book lending.

Thick Description: Fingerprints, Sonnets, and Aboutness in Special Collections

Thursday, May 17th, 2012 by Jennifer

Discoverability of special collections has long been a top concern of the OCLC Research Library Partnership.  What works? Break out of the OPAC? Beyond MARC? End run around EAD?

Constance recently started a conversation here in the office about “catablogs.”  She’d seen that NYU’s Chela Weber taught a workshop in New York about how to use a blog as a low-overhead collection management system.  A “catablog” can create searchable, browseable online presentations of collections.

Today the Atlantic posted a short article about the impact of blogging rare books. At St Andrews, Daryl Green’s blog played an unusual role in what are otherwise standard special collections procedures – identifying new acquisitions and raising scholarly and financial support. (Book-nerd disclosure: I’ve been following Daryl’s blog for his 52 weeks of fantastic bindings, but Constance sent me the Atlantic article this morning.)

Ellen’s blogging about collections in ArchiveGrid is driving a healthy amount of traffic to ArchiveGrid itself. This is exactly the kind of research question we wanted to pursue with ArchiveGrid. Bruce has wondered if commentary and interpretation wouldn’t improve discovery and make it easier for a researcher to decide what to pursue.

This has prompted me to revisit The Metadata IS the Interface and user studies of relationships between description and discovery or use. Archivists and librarians contribute to discovery when they discard illusions of neutrality and express their excitement for the materials and their opinions about their significance. MARC and EAD have enhanced our management of collections, but don’t necessarily serve all the needs of our users these days.

Over on the RBMS-ish (rare books and manuscripts) side of our profession, considerable thought has been given recently to richer description – “records more like sonnets,” as the Beinecke’s Ellen Elickson put it. I might borrow a term from the anthropologist Clifford Geertz and call it “thick description.” Michelle Light and Tom Hyry have advocated post-modern colophons and annotations. One of the RBMS hipsters has been arguing it is time to bust out of “the coldness of our description.” Mark Dimunation (Library of Congress) and others have imagined meaty and flexible descriptions of special collections like a wheel: hub and spoke. Merrilee blogged about Mark’s talk:

“Dimunation has been intrigued by James Asher’s call for progressive bibliography in which catalog records are viewed as hubs where information can be linked in, or hung on the core record as necessary. In this way, additional information can accrue over time, and doesn’t necessarily need to be contained in the catalog. Links to information that lives outside the catalog form a virtual vertical file that can document unique characteristics, and help form the fingerprint of an item.”

When I first joined OCLC Research, in the days of Shifting Gears, I thought that I’d wasted the past 10 years of my career building curated web exhibits of boutique collections of rare books, manuscripts and archives. In 2007 we needed to scale up digitization. Now my thinking is coming full circle. Curated blogs and exhibits, combined with the voice of the librarian/archivist, accomplish exactly what we’ve always wanted – to make collections visible and increase their impact.


Digitizing special collections and leveraging fair use

Monday, March 5th, 2012 by Merrilee

After several years of work, the ARL Code of Best Practices for Fair Use in Academic and Research Libraries is out. I was particularly eager to read the Code for two reasons. First, I have long admired the work of Peter Jaszi and his colleagues at American University’s Center for Social Media who have been instrumental in producing several “code of best practices for fair use” documents for documentary filmmakers and other creative communities. Second, I strongly suspected that digitizing unpublished materials would turn up as one of the top challenges for academic and research libraries. And indeed, one of the eight scenarios addressed by the Code is creating digital collections of archival and special collections materials.

I was pleased to see that the Code and our own “Well intentioned practices for putting collections of unpublished materials online” (or WIP) are quite complementary. Despite the fact that the Code was developed using what sounds like the Chatham House Rule and our discussions were conducted in the open, the two documents do not differ much in spirit. WIP downplays fair use in favor of managing risk (and outlines simple practical steps for doing so), whereas the Code makes a strong case for institutions to consider that collections are more than the sum of their parts, and these aggregations themselves may be transformative. It’s a powerful argument and also underscores the value of the work that librarians, archivists and curators everywhere do to build collections.

As a general observation, this is the first of the “codes of best practice” to include not only a set of “limitations” for each scenario but also “enhancements.” According to Peter Jaszi and Brandon Butler from ARL, the enhancements were added because librarians approach fair use with a good deal of caution. My fear is that librarians will read the “enhancements” as “requirements,” which would set us back in terms of making progress on what is perceived by some as a large digitization backlog. However, I do believe that this document should give additional courage to the community to digitize unpublished materials.

ARL and other organizations have been taking the Code on the road, and there have already been a number of webinars and in-person events, so there are plenty of opportunities to learn more. As was said in one of the webinars, fair use is a muscle — if you don’t use it, it will wither!

Copyright and risk: upping the ante?

Friday, November 4th, 2011 by Merrilee

The US Copyright Office has recently issued a paper, Priorities and Special Projects of the United States Copyright Office. The paper outlines some areas that the office will be working in over the next two years. One of these projects has already resulted in a “white paper” on the mass digitization of copyrighted books. Although this piece of work is receiving a lot of attention, I’m more curious about another strand of planned work, a study on “small claims solutions for copyright owners.” From the paper:

Copyright law affords a bundle of exclusive rights to authors, including the rights to reproduce, distribute, publicly display, and publicly perform their creative works, or license others to do so. However, these rights are meaningless if they cannot be enforced. As the ease of infringement has risen, so too has the cost of federal litigation. At the request of Congress, the Copyright Office is conducting a study regarding alternative means of resolving copyright infringement claims when such claims are likely to involve limited amounts of monetary relief.

[More information here]

I worry that creating a means to more easily sue for infringement will have a negative impact on institutions that are considering digitizing materials held in special collections, using a risk management approach (as described in the Well-Intentioned Practice document). Since the study won’t be completed until 2013, it’s too early to worry. However, it’s not too early to formulate a response! Responses are due January 26, 2012, which is right around the corner.

Special delivery

Wednesday, May 18th, 2011 by Jennifer

Deliver a lot. Deliver a little. It’s all about delivery. We’ve been doing a lot of work around here on strategies to make it easy for users to get their ‘hands’ on special collections.

Most recently, Ricky published a snazzy piece on mechanics for large-scale digitization of non-book materials, Rapid capture. These real-life examples dovetail nicely with her work (with Merrilee) about balancing rights and risks, rallying the community around reasonable practices when digitizing whole collections for access.

On the other hand, the Working Group on scanning and cameras has just published Scan and deliver in order to clear the air about user-initiated digitization. We give ourselves permission to just get the job done, by quickly scanning what someone needs and handing it to them promptly. If you have resources, you can choose when to scale up, maybe even going as far as digitizing the whole volume or collection, as long as it is in hand.

Whether we’re scanning an item requested by a user or digitizing an entire collection, it’s all about delivering up the collections we are privileged to manage.


See us at CNI next week!

Wednesday, March 30th, 2011 by Merrilee

A few of us will be attending the Coalition for Networked Information Membership Meeting in San Diego next Monday and Tuesday.

Jennifer will be giving a briefing on Managing Research Information for Researchers and Universities, which I think will nicely tie together some recent work and reports (ours and others). Ricky and I will be co-presenting on Out of the Eddies and into the Mainstream: Making Special Collections Less Special and More Accessible — like Jen, we’ll be summarizing a raft of our own work, and asking, “where do we go from here?” Unfortunately, both presentations are scheduled at the same time, from 10:30-11:30 Pacific on Tuesday, April 5th. Jim Michalko will also be attending the meeting, so OCLC Research will be well represented.

If you’ll be in San Diego, I hope you’ll take some time to come to our presentations or otherwise track us down. If you aren’t able to attend in person, follow along on Twitter at #cni11s.

A crowdsourcing success story

Monday, March 21st, 2011 by Karen

I’m a great fan of the National Library of Australia’s Trove, a single search interface to 122 million resources—books, journals, photos, digitized newspapers, archives, maps, music, videos, Web sites—focused on Australia and Australians. You can search the OCR’d text of over 45 million newspaper articles that have been digitized.

OCR is not perfect. The original document is juxtaposed with the OCR transcription so errors are immediately apparent. Since the Australian Historic newspapers public launch in July 2008*, people have been correcting errors in the OCR’d text. Both the corrected text and the original text are indexed and searchable.

The enthusiasm of these public text correctors is amazing! The 15 March 2011 Trove newsletter notes:

Text correctors are still doing an outstanding job of improving the electronically translated text, and the number of corrections each month continues to increase. In January we had over 2 million lines of text corrected in a month for the first time, which continued through February. The running total of corrected lines has now reached 31 million!

One of the issues the RLG Partners Social Metadata Working Group addressed was to what degree moderation was needed when opening up the descriptions of cultural heritage resources to user contributions. The responses to the social metadata working group’s survey of site managers indicated that spam or “inappropriate behavior” was not a problem. Rose Holley (a member of the working group) provides additional corroboration that spam and derogatory comments were not a big problem after a careful review of comments.

…recently we made a decision to manually review the 18,000 comments that have been added to newspaper articles and other items in Trove. We found only 114 spam comments with URL that were removed and 71 comments placed by the same user in the same week that breached our terms and conditions (derogatory). These were also removed.

We thought that was very good news and supported our theory that moderation is still not required. We have however added a feature that enables a user to easily report spam via the trove forum.

This supports one of the working group’s recommendations: Go ahead! Invite user contributions without worrying about spam or abuse.
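To put Rose Holley's numbers in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the three figures quoted above (18,000 comments reviewed, 114 spam comments, 71 derogatory comments); the function name `flagged_share` is just an illustrative helper, not anything from Trove.

```python
def flagged_share(total_comments, *flagged_counts):
    """Return (number flagged, fraction of all comments) for the given counts."""
    flagged = sum(flagged_counts)
    return flagged, flagged / total_comments


# Figures quoted from the Trove review: 18,000 comments, 114 spam, 71 derogatory.
flagged, share = flagged_share(18_000, 114, 71)

print(f"{flagged} of 18,000 comments removed ({share:.2%})")
# prints: 185 of 18,000 comments removed (1.03%)
```

Roughly one comment in a hundred needed removing, and 71 of those came from a single user in a single week.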

* For more details, see Holley, R. (2009) Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers, National Library of Australia, ISBN 9780642276940