Collections as Data: Nascent progress and common need

Image of an abacus, typewriter, and laptop computer representing the evolution of machines and computation.
Machine evolution by H Alberto Gongora on Noun Project

Next week I’m headed to Vancouver to attend Collections as Data: State of the Field and Future Directions, a collaborative working summit designed to provide an opportunity to assess the state of collections as data at an international level. I’m looking forward to joining in conversation with an exciting, international roster of colleagues. As a senior program officer for the Research Library Partnership (RLP), I am in the unique and fortunate position to be able to learn about collections as data efforts across the international range of research libraries in the RLP. To inform my participation in the summit, we convened two discussion sessions on the topic of collections as data with RLP partners.

An open invitation was extended to all affiliates at RLP institutions; 70 individuals participated from 39 separate RLP institutions in 6 countries (Australia, Canada, Hong Kong, Netherlands, UK, & US), from academic, museum, federal, and independent research libraries and archives. The conversation included people with technical services, public services, research support, digital projects and scholarship, and special collections responsibilities, and ranged from administrative and managerial roles to individual contributors. This broad interest and participation reflect the diversity of roles and skills necessary to facilitate collections as data work. 

The discussions were structured around the following questions:   

  • How has your institution supported collections as data efforts?   
  • Where is your institution on the project to program spectrum with regard to collections as data?  Is it a priority and what kind of resources do you now have for this effort? Do you expect that prioritization to change?  
  • What are you hearing from researchers about collections as data needs and approaches? Are their needs or expectations changing or evolving? 

The resulting conversations were lively and varied, with insightful contributions from across the RLP cohort. Even with many different institutions represented, and various levels of experience with and investment in collections as data, clear areas of commonality surfaced. The rest of this post summarizes key points synthesized across the two discussions.  

Collections as data is still nascent everywhere 

Experience with and support for collections as data varied across the group, but none of the participants described their institutions as having a full-fledged program. Most participants talked about being in an early stage with this work, had limited staffing resources to devote to it, and were dealing with requests for data sets on an ad hoc basis or as a part of discrete, one-off projects. While participants described successful responses to requests, they also discussed the challenge of a reactive stance to serving this research need. One attendee explained, “[We are] trying to make interventions at point of need when there’s a researcher who needs access to a data set.  And there’s a long time-lag between that initial request and the ability of the library to obtain the data. Oftentimes it’s too late, it just didn’t meet the need.”  

Needs assessment is challenging

People also described working to advocate for collections as data, approaching it on a policy and planning level as well as an operational one. A key challenge here was making confident assessments of just where collections as data should fall among their priorities. Many participants did not see consistent or predictable demand for collections as data and were unsure if this was because it isn’t a widespread need or because researchers aren’t aware the library offers services in support of computational access. This uncertainty impacts advocacy, making it difficult to know what resourcing is warranted. As one participant put it, “The real challenge is figuring out, okay, if we’re going to roll out infrastructure for this, is it worth it?”

A disconnect with researchers

Working with researchers was central to concerns and challenges raised in the discussions. Many participants felt a disconnect from researchers in this space, and described a need for greater understanding of researcher needs. They were unsure how to provide data to researchers in ways that will be useful to them and what kind of tooling (if any) might be most beneficial to provide. They also described a disconnect between the skills researchers have and those they need to work with library data sets. In many cases, researchers are interested in large data sets, but do not have the subject knowledge to understand what questions to ask of the data, or conversely, might have the subject knowledge but lack the technical skills to work computationally. There was a sense that in addition to providing data sets and tooling, outreach to make data sets visible and understood was also needed.

Navigating rights and restrictions

Issues around navigating rights and restrictions also surfaced as a challenge, one that impacts access to collections and the ability to scale up work around collections as data. With regard to vendor-supplied collections, there was frustration with digital rights management (DRM) or other mechanisms put into place that limit or slow programmatic access to materials, as well as a need for access via APIs. Participants discussed wanting to make mechanisms for computational access a commonly understood expectation for vendor-supplied systems and collections. One participant reflected this desire by asking, “How do we make this an expectation within the systems that we have? So that this isn’t a unique ask that [an individual institution] needs, it’s something that’s just expected of what our systems do for us.” For collections that originate in archives and special collections, participants voiced concern about being able to provide computational access to digital collections in a way that minimizes copyright, privacy, and confidentiality concerns, and allows the archive to keep any contractual obligations to the collection donor or creator.

The challenge of scaling

In general, scaling up work around collections as data was described as a challenge throughout the discussions. Some institutions are leveraging projects to build a program, learning from experimentation, and folding it back into creating workflows and frameworks for the future. Pilot projects that involved both library staff and researchers were seen as particularly valuable. More mature programs are building tasks that will support collections as data into routine workflows in cataloging, digitization, and data management. One challenge to this is the many different people and departments that may need to be involved in collections as data work. In many cases, this work was like the parable of the elephant in the dark room, with each person only seeing and understanding the portion they are close to, making it challenging to collaborate, communicate, and understand holistic need. As in other areas of work in research support that bring together many constituencies, there is a need to pay attention to social interoperability across the many parts of the library that make collections as data support possible.

Participants in our conversations aspire to do more with collections as data, but still have much to figure out to move this work forward. They described a need for shared knowledge across the libraries, archives, and museums (LAM) sector that could help support collections as data work, to make the lift lighter at individual institutions. Examples included best practices for an MVP (minimum viable product) for data packages, pedagogical frameworks to help people understand data sets and what they can do with them, or functional requirement examples to help advocate for needs with systems and collection vendors.

These conversations made it clear that there are common challenges in making library collections available for computational research, which points to an opportunity for collective work to address shared needs. I look forward to bringing the insights outlined here to conversations at the Collections as Data summit, and thinking with colleagues about how to collectively move this work forward.