Mini-symposium on RDM in Leiden

Last month, OCLC Research organized a mini-symposium on Research Data Management (RDM) at OCLC’s European Headquarters in Leiden. Around twenty RDM experts from SURFsara, DANS, TU Delft, Leiden University, and Wageningen University & Research attended, together with OCLC colleagues, guests from NUKAT (Union Catalogue of Polish Research Library Collections) on an ERASMUS training programme, and invited speakers Hilde van Zeeland and David Minor. It was an engaged group of professionals, eager to exchange knowledge and reflect on current RDM challenges.

Rachel Frick kicked off by giving an OCLC Research update on work done in RDM and relevant, adjacent areas, such as RIM (Research Information Management), PIDs (Persistent Identifiers), Linked Data and Web Archiving (metadata practices to support scholars building archives for research purposes). She invited attendees to get involved and help guide and fuel this ongoing work by joining the OCLC Research Library Partnership.

Rebecca Bryant took a deep dive into the “Realities of Research Data Management” – an investigation based on four case studies and interviews at research universities in very different national contexts (Australia, Netherlands, UK, USA). The study gives insight into the major decision points universities face in acquiring RDM capacity: deciding to act (incentives to build or acquire RDM capacity); deciding what to do (scoping out a bundle of RDM services); and deciding how to do it (for example, deciding which services will be built locally and which will be externalized to an outside provider). It provides a framework for thinking of RDM services as a bundle, with three main categories: Education, Expertise, and Curation. The findings suggest that there is no one-size-fits-all RDM service bundle for all universities to duplicate; instead, RDM services tend to grow as a customized solution, shaped by a range of internal and external factors operating on local decision-making. They will be sustainable and valued to the degree they can respond to evolving incentives, such as compliance with external mandates, adaptation to scholarly norms and practices, and execution of institutional strategic planning objectives.

In her presentation, co-authored by Jacquelijn Ringersma, Hilde van Zeeland gave a concrete example of how Wageningen University & Research is responding to these incentives by devising a new data policy. The policy for good data management takes account of the Netherlands’ National Archives Act, the Dutch code of conduct for scientific practice, and the FAIR principles. It also takes into consideration the diversity of data research practices across disciplines, in a continuous effort to reassess use cases through interviews with faculty and compare guidelines with actual practices. As cross-campus initiatives on data-related policies – RDM, IT security, GDPR – were taking place simultaneously, the library, IT, and Legal Services aligned their efforts to ensure consistent messaging around the common theme of data. Hilde’s presentation triggered the interest of the other practitioners around the table, and a lively Q&A session followed. It was interesting to hear that many policies at Dutch universities are devised at the faculty level and derived from the more generic policy at the university level. The National Coordination Point Research Data Management points to all the Dutch RDM policies to facilitate sharing and learning from each other.

David Minor gave the last presentation, sharing his reflections on a decade of RDM work at the University of California San Diego. Much effort was spent on building services based on the local repository and a single metadata model, to be used for all digital objects. David offered some interesting food for thought, with examples of the challenges they are trying to address: the metadata-intensive and slow ingest process and the uncertainty about discoverability and reuse requirements (what is “good enough” metadata?); the ability to interoperate with other applications and processes on campus and with other emerging networks in Africa and South America (e.g., COAR); how to make datasets available via platforms like GitHub, “because researchers are not looking at repositories”; and the big demand for data carpentry training classes and the new phenomenon of “Data Science Institutes” that train the next generation of data scientists. David invited us to take a moment for reflection and opened the floor for discussion.

A lively conversation followed. One participant thought the presentations gave a very institution-centric perspective; she noted that researchers often work in international groups and use Google Docs, Dropbox, and similar cloud-based freemium services to support their collaborative work. Sharing data across institutions and countries is a problem, in particular with sensitive data. What can universities and Research Data Centres offer? In the Netherlands, SURFconext facilitates access to cloud services for the international education and research sector. There was a concern about researchers continuously adopting new and different tools, leading to a workspace that is in a state of constant flux and difficult to manage. This led to the observation that active data management is actually a bigger challenge than data archiving: the demands are more complex and diverse, and the need for shared identifiers, file-naming conventions, standardized vocabularies, and the like is more pressing in collaborative environments.

The discussion turned to messy metadata and David’s doubts that “machine-learning will fix it”. Data science experts in the room agreed that with scientific datasets, one needs to know what they represent; it is therefore not good enough to be sloppy now and hope to figure it out later. There are semi-automatic ways to clean messy metadata, or to create relevant metadata from existing text-based metadata, and, as one participant put it: “publications are the best metadata for datasets”. Linking publications to datasets is mostly considered a workflow problem, and there are still many hands-on improvements that can be made. A 2016 study on research data reuse at Dutch universities, by Tessa Pronk (Utrecht University), was mentioned. It demonstrates that reuse cannot easily be quantified, due to the inconsistency and diversity of practices, and recommends improvements at all levels of the data cycle (data producers, data repositories, journal publishers, funding agencies). The upcoming event at the TU Delft Library on May 24th was announced: Towards cultural change in data management – data stewardship in practice.

With this symposium, we at OCLC are happy to have contributed an interesting thread to the continuing RDM conversation in the Netherlands and beyond.

Please visit the event page for the presentation slides and a link to the recording: