Towards nationally integrated research information ecosystems
A lot is currently happening in research information management (RIM) at the national level. In her recent reports and blog post series, Rebecca Bryant observed that RIM practices in US research universities have taken a characteristically decentralized course, compared with practices in Europe and the Pacific Rim, where national research assessment requirements drive a more centralized approach.
In fact, several countries – including Australia, Canada, Portugal, Finland, Norway, the UK and the Netherlands – pursue a national policy to integrate their research information.
RIM integration refers to rationalizing how metadata about research activities and outcomes is created, and to giving RIM stakeholders seamless and secure access to that metadata. In other words, it is about making RIM data more interoperable so that it can be easily aggregated.
It should help promote national scientific excellence and lead to greater national and international visibility. It also has many potential practical benefits, such as:
- a reduced administrative burden for researchers;
- better analytics and business insight for national policymakers and research funders;
- efficiency gains from streamlined metadata flows among all the stakeholders involved.
Persistent identifiers as the underpinning infrastructure
If RIM integration is to be achieved by adopting a framework of authoritative, interconnected, consistent, complete, reliable, timely and openly available metadata about publicly funded research, then persistent identifiers (PIDs) must be key components of that framework. To understand why PIDs are so central to current RIM integration efforts, it is worth expanding on the notion of the “PID Graph”.
The PID Graph – a concept implemented by the EU-funded FREYA project with PID data from DataCite, Crossref, ORCID, and the Registry of Research Data Repositories (re3data) – is an underpinning infrastructure that establishes connections between the entities that matter within the research landscape: researchers, universities, outputs, funders, grants, and more. As the authors of the article “Connected Research: The Potential of the PID Graph” explain, this approach takes PIDs (for example ORCID iDs and DOIs) as the basic nodes that are linked together, instead of the entities they represent (for example researchers and articles). It constructs a graph from the relations available in the PID metadata.
Because it relies on PIDs alone, the PID Graph requires every relevant entity to have its own distinct PID; multiple PIDs referring to the same entity are undesirable. Furthermore, these PIDs should carry authoritative and sufficiently rich metadata to represent the relationships of interest.
It also “requires standard ways for exposing and discovering these connections as well as infrastructure that makes it possible to contribute and/or consume connections”.
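To make the idea of PIDs as nodes a little more concrete, here is a minimal Python sketch of a PID-centric graph. It is an illustration of the general structure only, not the FREYA implementation: the records, relation names and identifiers below are hypothetical placeholders, and real DataCite, Crossref, ORCID and ROR metadata are far richer than these simple (relation, target) pairs.

```python
# Minimal PID Graph sketch: nodes are PIDs, and edges come from relations
# recorded in (hypothetical, heavily simplified) PID metadata records.

# Placeholder identifiers for illustration only; these are not real records.
RECORDS = {
    "doi:10.5555/example-article": [
        ("creator", "orcid:0000-0002-1825-0097"),
        ("funder", "ror:example-funder"),
        ("isSupplementedBy", "doi:10.5555/example-dataset"),
    ],
    "doi:10.5555/example-dataset": [
        ("creator", "orcid:0000-0002-1825-0097"),
        ("publisher", "ror:example-university"),
    ],
    "orcid:0000-0002-1825-0097": [
        ("affiliation", "ror:example-university"),
    ],
}


def build_pid_graph(records):
    """Return {pid: {(relation, target_pid), ...}} keyed by PIDs, not entities."""
    graph = {}
    for source_pid, relations in records.items():
        edges = graph.setdefault(source_pid, set())
        for relation, target_pid in relations:
            edges.add((relation, target_pid))
            # A target becomes a node even if we hold no metadata record for it.
            graph.setdefault(target_pid, set())
    return graph


def connected_pids(graph, pid):
    """PIDs linked to `pid` by an edge in either direction."""
    outgoing = {target for _, target in graph.get(pid, set())}
    incoming = {
        source
        for source, edges in graph.items()
        if any(target == pid for _, target in edges)
    }
    return outgoing | incoming


if __name__ == "__main__":
    graph = build_pid_graph(RECORDS)
    print(sorted(connected_pids(graph, "orcid:0000-0002-1825-0097")))
```

Querying the researcher's ORCID iD returns the article, the dataset and the affiliated organization, purely by following PID-to-PID relations. If the same researcher had been registered under two different ORCID iDs, the article and dataset could attach to different nodes and such a query would miss part of the picture, which is why duplicate PIDs for one entity are undesirable in this approach.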
In contrast, knowledge graphs – such as Google’s Knowledge Graph or Wikidata – typically start with domain entities and construct a graph of relations that are assembled from numerous sources (including, but not limited to, PID metadata) or built by domain communities. This approach requires extensive knowledge extraction, codification and validation. Growing and improving a knowledge graph over time also requires effective methods for disambiguation, enrichment, quality assessment, and refinement.
Current national RIM integration efforts aim to optimize the use and value of PIDs within their research ecosystems and lean towards the more pre-coordinated PID Graph approach. This approach requires substantial initial investment in PID metadata and PID-centric workflows. However, the potential for significant cost savings, set out in the UK PID Consortium’s Cost-Benefit Analysis report of June 2021, makes a compelling case for that investment.
Engaging in national concerted action
The adoption of persistent identifiers is thus actively promoted and coordinated in several countries as part of their national research information policies. In most cases, stakeholder groups have been established – such as the PID Forum Finland, the UK national PID Consortium, and the Dutch PID working group – to develop a national PID strategy and roadmap.
In some cases, a PID board or committee, with representatives from across the national higher education and research community (policymakers, funders, infrastructure providers, research libraries, research data repositories, etc.), steers this work. The recently established Research Identifier National Coordinating Committee (RINCC) is an example: it ensures alignment between policy and practice across the national research community and liaises with partners and stakeholders internationally. These national PID coordination efforts are usually linked to broader national open science programs and open research infrastructure initiatives.
We also see global PID stakeholders, such as ORCID and DataCite, form national consortia to promote and engage in concerted action on national PID use. The Research Data Alliance (RDA) likewise has its national nodes involved in national PID strategies, promoting alignment – as they did during the Birds of a Feather session in April of last year.
The national consortia involved in RIM integration efforts work hard to promote the systematic adoption and support of “priority PIDs” (DOIs, ORCID iDs, ROR IDs, RAiDs) in the many different systems across campus (identity and access management, HR, finance, RIM, institutional repository, library systems, research portal) and across the wider scholarly communication supply chain: both upstream (national funders and publishers) and downstream (metadata aggregators and harvesting sources, indexing and discovery systems).
Towards a holistic and connected approach
At OCLC Research we follow developments in both RIM and PIDs. The recent open consultation on the national PID roadmap being developed by the Dutch PID working group prompted some thoughts and questions, which I share here.
Beyond STEM
The challenges of, and gaps in, PID use for monographs and book chapters in the humanities often receive too little attention in national PID roadmaps. As several stakeholders have made clear – among others OAPEN and the Crossref Books Group – PID adoption in scholarly book workflows has lagged behind practice in STEM journal publishing. Combined with the often complex data exchange flows between incompatible publishing systems, RIM systems, and metadata aggregators and indexers, this means that humanities and social science content is disproportionately affected.
The December 2021 guest post “Scholarly Book Publishing Workflows and Implications for RIM Systems” on the Scholarly Kitchen blog, co-authored by my colleague Rebecca Bryant, is a call to action offering recommendations for the key players in the scholarly book supply chain. The recommendation most relevant to a national PID roadmap is addressed to research libraries and their ability to promote the use and integration of PIDs across campus.
Beyond RIM
How will the RIM metadata layer and the discovery metadata layer interconnect, using PIDs?
OCLC’s Shared Entity Management Infrastructure Project (SEMI) begins to build the needed infrastructure with reliable and persistent identifiers and metadata for the critical entities that libraries manage. SEMI’s long-term guiding principles are to support unique identifiers of all types and to allow for a heterogeneity of standards, practices, and metadata structures.
Interoperability between national RIM infrastructures and global library infrastructures that store the same or similar PIDs is a topic for further discussion.
The role of PIDs as interoperability keys – pointing to other PIDs that identify the same entity in another context – is crucial when different domains and their infrastructures need to interconnect.
Beyond the academic sector
The PID ecosystem is still developing and will keep evolving; no single PID will ever be the only one or the perfect one. It will be important to support different PIDs with overlapping functions and coverage, to keep systems and roadmaps adaptable and extensible, and to link to and between different PIDs. ORCID iDs and ISNIs are a case in point: ORCID iDs are designed for active researchers, while ISNIs cover authors and creators, living or dead, fictional or real. Fostering links between the two PID systems goes beyond current research information interests and serves the long-term interests of the scholarly record. The same approach is advisable for other areas of PID usage.
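One way to picture PIDs acting as interoperability keys is as a set of “same entity” links between identifiers from different systems, which can be followed transitively. The Python sketch below uses a union-find structure for this; that technique is our own illustrative choice, not something prescribed by ORCID, ISNI or any roadmap, and every identifier in it (including the `local-authority:` library identifiers) is a made-up placeholder.

```python
# Sketch: record assertions that two PIDs identify the same entity, then
# resolve any one PID to all of its known equivalents. Implemented as a
# union-find (disjoint-set) structure with path compression.


class PidEquivalence:
    def __init__(self):
        self._parent = {}

    def _find(self, pid):
        """Return the representative PID of the cluster containing `pid`."""
        self._parent.setdefault(pid, pid)
        root = pid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every PID on the walked path straight at the root.
        while self._parent[pid] != root:
            next_pid = self._parent[pid]
            self._parent[pid] = root
            pid = next_pid
        return root

    def link(self, pid_a, pid_b):
        """Record an assertion that pid_a and pid_b identify the same entity."""
        root_a, root_b = self._find(pid_a), self._find(pid_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, pid):
        """All known PIDs identifying the same entity as `pid`, including itself."""
        root = self._find(pid)
        return {p for p in list(self._parent) if self._find(p) == root}


if __name__ == "__main__":
    links = PidEquivalence()
    # Hypothetical assertions: an ORCID iD linked to an ISNI, and that ISNI
    # linked to a made-up local library authority identifier.
    links.link("orcid:0000-0002-1825-0097", "isni:example-1")
    links.link("isni:example-1", "local-authority:example-42")
    # A separate, unrelated person.
    links.link("isni:example-2", "local-authority:example-99")
    print(sorted(links.equivalents("orcid:0000-0002-1825-0097")))
```

Because the links are transitive, a RIM system holding only the ORCID iD can still reach the library-side identifier through the ISNI, without either system having to adopt the other's PID. This is the kind of cross-domain connection that links between overlapping PID systems make possible.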
Academic institutions will be using PIDs in multiple domains, not only for RIM functions.
Cooperation between academic and cultural institutions in connecting cultural heritage collections plays an increasingly important role and requires a combination of PIDs, some of them domain-specific. Alignment between the academic and heritage sectors on PIDs is already under way. In the UK, the Heritage PIDs Project explored the use of PIDs to create a unified national collection of the UK’s museums, libraries, galleries, and archives, in the context of digital humanities and arts research. Similarly, in the Netherlands, academic libraries are seeking to interconnect their special collections following linked open data principles and using persistent identifiers, as promoted by the Dutch Digital Heritage Network.
The global library perspective
At OCLC, we see the burgeoning research information ecosystem through a library lens. Libraries are central hubs in the global information network. Through their concerted efforts, librarians help communities around the world discover and access academic knowledge, and they help researchers discover non-academic resources relevant to their enquiries. They make this happen by integrating their collections in larger, global aggregations – such as WorldCat – for a more comprehensive discovery experience.
And as librarians engage with WorldCat entities in the new linked data environment, discovery will be powered by the interoperability of PIDs and their ubiquity in the library domain.
Closing remarks
As research information ecosystems are built, it is imperative to seek partnerships with all stakeholders serving the same community, so that they can work together productively. The authors of national PID recommendations are seeking input into the many PID roadmap initiatives from a range of contributors. At OCLC Research, we would be happy to respond to such a call.
Thanks to the many OCLC colleagues who have contributed to this post.
Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC’s Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.