The OCLC Research Library Partnership’s community exploration of research data management (RDM) concluded in December with the third and final installment of a three-part webinar series based on our Realities of Research Data Management project. The webinar covered the major findings of our recent report, Sourcing and Scaling University RDM Services, in which we examined two key aspects of acquiring institutional RDM services: deciding whether to source capacity internally or externally, and determining the scale of the user community for which the services will be deployed. Alongside the final webinar, we held a concluding round of discussions with the Partnership’s RDM Interest Group, anchored on the topics raised in the webinar. This post documents some of the highlights from the conversation.
First, a little more about the Interest Group. The group includes more than 80 individuals, representing nearly 50 Research Library Partnership member institutions in nine countries. Participants are distributed across a range of RDM roles, encompassing both strategic and practitioner perspectives. The Interest Group is an opportunity for participants to interact with OCLC Research staff and each other, sharing experiences about RDM services and practices, and pooling knowledge about the current state – and future evolution – of RDM. Interest Group discussions are catalyzed by the topics covered in the accompanying webinar series but are also free-ranging and flexible to accommodate participants’ interests.
Our concluding round of Interest Group discussions drew participants from North America, Europe, and the Asia-Pacific region. To catalyze the discussion, we briefly reviewed a finding from our Sourcing and Scaling report: our four case study partners adopted four distinct sourcing strategies, illustrating that universities in different contexts will take different approaches to the sourcing decision. We also talked about scaling – which we define as choosing the size of the user community that an RDM service is intended to serve – and presented a simple framework that illustrates some of the key benchmark scales with which to view the potential user community of an RDM service.
The webinar also explored the important topic of interoperability, defining it on two levels: technical interoperability between data management systems, and social interoperability between different institutional and external stakeholders associated with those systems. Participants offered some interesting perspectives on this issue, including the fundamental and near-universal challenge of securing adequate metadata descriptions from researchers. This challenge is exacerbated by the fact that different repositories and research objects require different levels of description. Good metadata is an essential element for effective interoperability across the various components of data management systems.
One participant described a situation where efforts to coordinate interoperability between the local Research Information Management (RIM) system and the data repository were hampered by broader university pressures to consolidate content and metadata in the RIM system. While utilizing a single system to house a wide range of research outputs certainly helps mitigate the issue of technical interoperability, not least by reducing the number of systems with which researchers are required to interact, it can also lead to curation solutions that are less than ideal for research data management. For example, many universities (including our Realities of RDM case study partners) have found it beneficial, for a variety of reasons, to have dedicated data curation services and infrastructure that operate alongside general institutional repository services.
For social interoperability, many participants agreed that outreach to researchers, encouraging them to use RDM services, was the biggest pain point. One participant noted that a key challenge in fostering good relationships between RDM staff on the one hand, and researchers on the other, was the frequent turnover in both cohorts: as new staff and new faculty and students arrive, the process of cultivating good working relationships must begin anew. “Awareness of need” is an ongoing barrier to bringing RDM services and researchers together – for example, helping researchers understand that producing data management plans and making data sets open, understandable, and persistently accessible are important components of today’s research lifecycle.
Another strand of discussion centered on the benefits of externalizing data management services to outside providers. One participant noted that externalization allowed the university to experiment with different solutions while minimizing the amount of scarce FTE resources required to support those services. Additionally, such external solutions allowed the university to “outsource” the technical skills needed to support key RDM services, while still providing a localized service for researchers. Another participant also emphasized the benefits of externalization and urged other universities not to do in-house development of RDM curation services – the thinking being that it is far better to purchase technical development expertise from people whose business focus is providing such expertise.
But another participant pointed out that the decision to shift from internally sourced RDM services to externally sourced services cannot be taken lightly. At the participant’s university, significant legacy investments were made to develop in-house RDM services tailored to meet the specific needs of local researchers. In this case, any externally provided RDM service that the university would consider moving to would have to demonstrate a clear advantage in fulfilling the university’s data management requirements over and above the current performance of the internally developed services. This observation is a good reminder that for universities with existing internally developed RDM services, a shift to externalization is not frictionless; in many cases, a certain threshold of “lock-in” will need to be overcome in order to make the transition worthwhile.
A topic that arose out of our discussion of sourcing, scaling, and interoperability – especially interoperability – was the need to clarify terminology in order to have more productive discussions about RDM with campus stakeholders. For example, one participant noted that conversations with IT services were sometimes hindered by the lack of a common understanding of the term “archive”. Other participants noted similar experiences, and offered additional examples of problematic terminology, such as “metadata”, “curation”, and “open access”. Participants agreed that social interoperability might be improved by developing a glossary of RDM-related terms that librarians tend to use differently than other campus stakeholders, such as IT staff, administrators, and researchers.
In addition to these strands of discussion, participants offered many other insights. For example, one participant noted that local storage of research data sets offers a number of advantages, including the ability to utilize multiple discovery and access services by simply sharing metadata and links, thus eliminating the cost and effort of moving the data from provider to provider. Another interesting observation had to do with the future of metadata collection: as one participant foresees it, metadata collection around data sets will become an almost constant activity, as transformations, uses, analysis techniques, and other information are recorded automatically as data is used, re-used, and refined over time. This may lead to challenges in coping with large quantities of metadata, and also ensuring linkages between different data set versions are current, although artificial intelligence and machine learning techniques may help in addressing the problem. Finally, a participant noted that successful data management training services often require that the information be presented in different ways for different audiences: for example, workshops might be appropriate for graduate students, while faculty might prefer information formats that are quicker to access and digest.
Recordings of the entire Realities of RDM webinar series are now available online, along with the accompanying learning guides. Summaries of previous RDM Interest Group discussions are available here and here.
With our third and final discussion, the RDM Interest Group has now ended. We thank all of our colleagues from the RLP membership who participated and shared questions, insights, and practical experiences. The Research Library Partnership is working with members to explore and prioritize additional webinars, discussions, and research efforts related to a broad range of institutional research support efforts. Details on new opportunities will be shared with RLP members soon.
Brian Lavoie is a Research Scientist in OCLC Research. He has worked on projects in many areas, such as digital preservation, cooperative print management, and data-mining of bibliographic resources. He was a co-founder of the working group that developed the PREMIS Data Dictionary for preservation metadata, and served as co-chair of a US National Science Foundation blue-ribbon task force on economically sustainable digital preservation. Brian’s academic background is in economics; he has a Ph.D. in agricultural economics. Brian’s current research interests include stewardship of the evolving scholarly record, analysis of collective collections, and the system-wide organization of library resources.