Digitization challenges - a discussion in progress

Internet Archive book scanner | Wikimedia Commons

It has been some time since we hosted our Digitization Matters symposium, which led to our report, Shifting Gears. That event, along with findings from the surveys of archives and special collections in the US and Canada and in the UK and Ireland, has long helped to shape our work in the OCLC Research Library Partnership. However, we felt that enough time had gone by, and enough had changed, that it was time to begin some new discussions to frame future work.

We often hear from library colleagues that they continue to experience challenges associated with digitization of collections, so earlier this month we hosted some discussions (via WebEx) to try to get a handle on what some of those challenges are. Prior to the conversations, we asked participants to characterize their digitization challenges, and then did some rough analysis on the responses. Challenges fell into a number of areas.

Rights issues (copyright, privacy)
Born Digital, web harvesting
Issues with digital asset management systems (DAMS) or institutional repositories (IR)
Storage and preservation
Metadata: Item-level description vs collection descriptions
Process management / workflow / shift from projects to programs
Selection – prioritizing users over curators and funders
Audio/Visual materials
Access: are we putting things where scholars can find them?

We opted not to include the first four issues in our initial discussion — copyright, and rights issues in general, are quite complicated (and with a group that includes people from Canada, Europe, Asia, Australia, and New Zealand I’m not sure we could address it well). We have done quite a bit of work on born digital (and are currently investigating some areas related to web harvesting). At least for our first foray, discussions on DAMS and IRs seemed like they could have gone down a very tool-specific path. Likewise with storage and preservation. Even taking these juicy topics off the table, we still found we had plenty to chew on.

Metadata: Item-level description vs collection descriptions

Many of our discussion participants are digitizing archival collections, and there is an inherent challenge in digitizing collections at the item or page level when the bulk of the description is at the collection level. People described “resistance” to costly item-level description, and a desire to find an “adequate” aggregate description. On the other hand, there was an acknowledgement of the tension between keeping costs down and satisfying users who may have different expectations. A key here may be a more nuanced view of context: for correspondence, an archival approach may be fine; in other circumstances, it may not. Some institutions are digitizing collections (such as papyri) where the ability to describe the items is not resident in the library. How can we engage scholars to help us with this part of our work?

Process management / workflow / shift from projects to programs

Many institutions are still very much in project mode, looking to transition to programs. For those who have or are working towards digitization programs, there is a struggle to get stakeholders all on the same page: at some institutions, the content owners, metadata production unit, and technical teams seldom if ever come together; here, getting all parties together to establish shared expectations is essential. Some institutions are looking to establish workflows that will more effectively allow them to leverage patron-driven requests, while others are thinking about the implications of contributing content to aggregators like DPLA. One institution has started scanning with student employees — when students have a few minutes here or there, they can sit down at a scanning station and scan for 10-15 minutes — this leads to a steady stream of content.

Selection – prioritizing users over curators and funders

Many institutions are still operating under a model whereby curators or subject librarians feed the selection pool, through either a formal or an informal process. Even in these models, it can be difficult to get input from all: there tends to be a small pool of people who engage in the process. At one institution, people who come with a digitization request are also asked to serve as “champions” and are expected to bring something to the project, for example by contributing student hours to enhance metadata. One institution views selection as coming through three streams: donor initiated, vendor or commercial partner initiated, and initiated by the curatorial group (emphasizing that the three are not mutually exclusive). Another institution is looking at analytics and finding that curator-initiated requests generate less online traffic than patron-initiated requests. In a similar vein, a third institution is looking at what is being used in the reading room and considering making digitization requests based on that information. Even though people’s survey responses indicated that they would like to move selection more towards directly serving researchers’ needs, from the discussion I’d observe that few institutions have established models to do so.

Audio/Visual materials

As with born digital, everyone has A/V materials in their collections, and making them more accessible is a concern. A participant from one institution observed that they see key differences in interest for these formats: for example, filmmakers, not scholars, are the people who will seek out video. If there is a transcript for materials, that may affect demand. A/V projects tend to focus on at-risk materials, since costs are so high. Some institutions are beefing up their reformatting capacities, in anticipation of needing to act on these materials. If you are interested in this area, you will want to track the activities of the (US-based) Federal Agencies Audio-Visual Working Group.

Access: are we putting things where scholars can find them?

For many institutions, aggregation is the name of the game, and thinking as a community about aggregating content is key: “Standalone silos don’t help users find our things.” Whether materials are in discovery repositories that are hosted by the institution or elsewhere, discoverability and user experience are concerns. One institution assigns students to search for materials via Google and in repositories. Are collections findable?

Thanks to all who took part in our discussions! I hope we’ll have more to report in the future.

Merrilee Proffitt

Merrilee Proffitt is Senior Manager for the OCLC RLP. She provides community development skills and expert support to institutions within the OCLC Research Library Partnership.

One Comment on “Digitization challenges – a discussion in progress”

Larry Creider says:

March 24, 2015 at 1:46 pm

Very helpful! Two additional points:
Collection vs. item metadata reminds me of the resistance to cataloging the titles in microform sets. Usage increased tremendously when such analytics were provided. Otherwise, large sets gathered dust.
Project to program: we have digitized several campus publications and archival resources using projects but have not been able to make the gathering of subsequent issues and publications routine. Archive-It is very hard to make granular enough while still catching what you need.
