Persistent identifiers for local collections

That was the topic recently discussed by OCLC Research Library Partners metadata managers; the discussion was initiated by Jackie Shieh of George Washington University, Naun Chew of Cornell and Dawn Hale of Johns Hopkins University. By publishing library metadata as linked data, information professionals hope to repurpose, present and connect the data they have created and curated according to century-old standards and practices. Recent linked data efforts have highlighted the importance of identifiers: unique alphanumeric strings that are associated with digital objects, resolvable globally over networks via specific protocols, and unambiguous, so that the resource can be used, found and identified. Local identifiers are meaningful only within the system that assigned them, so they cannot be shared or re-used elsewhere. We need identifiers that are unchanging over time and independent of where the digital object is or will be stored; that is, identifiers that are “persistent”. Persistent identifiers help make collections accessible globally, because they can be used, shared and re-used.

Practices for assigning identifiers have been inconsistent. Focus group members cited two particular challenges: maintaining identifiers, and the loss of semantics when mapping one identifier system to another.

Identifiers for “works” have been problematic, as there is no consensus on what constitutes a distinct work. Two different workflows were mentioned: 1) find an OCLC work ID and add it to the local record, and 2) use local algorithms to cluster records in the local catalog, assign each cluster a local identifier, and then match that ID with external sources such as the OCLC work ID.
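The second workflow can be sketched in a few lines of Python. This is a minimal illustration, not any institution's or vendor's actual matching algorithm: it assumes each record is a dictionary with hypothetical `id`, `author`, `title` and optional `external_work_id` fields, and it uses a deliberately naive clustering key (normalized author plus normalized title). Real work-clustering algorithms weigh many more elements, and the sample records and the `owi-EXAMPLE` value are placeholders.

```python
import hashlib
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Fold case, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_marks.casefold())
    return " ".join(no_punct.split())


def work_key(record: dict) -> str:
    """Hypothetical clustering key: normalized author + normalized title."""
    return normalize(record["author"]) + "|" + normalize(record["title"])


def cluster_works(records: list[dict]) -> dict[str, dict]:
    """Group records by work key and mint a local work ID for each cluster.

    Any external work IDs already present on member records (for example,
    OCLC work IDs) are collected so the local ID can be matched to them.
    """
    clusters: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        clusters[work_key(rec)].append(rec)

    works = {}
    for key, members in clusters.items():
        # Hash the key so re-running the job over the same records yields the same local ID.
        local_id = "work-" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]
        external = {r["external_work_id"] for r in members if r.get("external_work_id")}
        works[local_id] = {
            "record_ids": sorted(r["id"] for r in members),
            "external_work_ids": sorted(external),
        }
    return works


# Placeholder records for illustration only.
records = [
    {"id": "b1", "author": "Brontë, Charlotte", "title": "Jane Eyre."},
    {"id": "b2", "author": "Bronte, Charlotte", "title": "Jane Eyre",
     "external_work_id": "owi-EXAMPLE"},
    {"id": "b3", "author": "Austen, Jane", "title": "Emma"},
]

for local_id, work in cluster_works(records).items():
    print(local_id, work["record_ids"], work["external_work_ids"])
```

Records b1 and b2 differ only in a diacritic and trailing punctuation, so they fall into one cluster, which picks up the external ID carried by b2; b3 forms its own cluster. The sketch also shows why the lack of consensus matters: whether two records count as “the same work” depends entirely on the key chosen.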

The discussions were wide-ranging but tended to focus on identifiers for personal names rather than other types of entities. The desire to present a comprehensive compilation of scholarly output on faculty profile pages has prompted a number of research libraries to roll out ORCIDs (Open Researcher and Contributor IDs) for their faculty. ORCID is seen as a way to fill a big gap in the LC/NACO and other national authority files, which do not customarily include authors of journal articles and other scholarly output. Moreover, authority files are used only within the library domain. Funding agencies have begun to require ORCIDs as part of the submission process. Few felt that current authority workflows would scale to cover all of an institution’s researchers; some journal articles list several hundred different “authors” from multiple countries. Some researchers are reluctant to use any identifier they are not already using. Faculty can be sensitive about keeping their data private and about the potential for “surveillance” or “Big Brotherism” by their institution, and automated ways of comparing faculty output can be seen as threatening.
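Because ORCIDs are entered, copied and exchanged across many systems, it helps that each iD carries its own error check. An ORCID iD is 16 characters long (usually displayed as four hyphen-separated groups of four), and the last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, where “X” stands for 10. The Python sketch below validates that check digit; the function names are hypothetical, and a passing check only means the string is well formed, not that the iD is registered or belongs to the right person. The sample iD is the one ORCID uses in its own documentation.

```python
import re

# 15 ASCII digits followed by a digit or "X" (after hyphens are removed).
_ORCID_SHAPE = re.compile(r"[0-9]{15}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Validate the shape of an ORCID iD and its final check character."""
    compact = orcid.strip().replace("-", "").upper()
    if not _ORCID_SHAPE.fullmatch(compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: last digit mistyped
```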

Some outstanding issues with name identifiers:

Some researchers already have a half-dozen or more ORCIDs as well as other identifiers.
Skeletal entries make it difficult to determine whether they represent the same or different people.
ORCID relies on self-registration, so deceased researchers are not covered; comprehensive coverage requires more than one identifier system.
There’s an emerging need for a name reconciliation service that can link multiple identifiers representing the same person.
For identifiers registered through VIVO, it’s unclear what happens when the person moves to a new institution, retires or dies.
Libraries’ data suppliers and system vendors need to support persistent identifiers.
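One way to think about the name reconciliation service the group called for is as a clustering problem over identifiers. The sketch below is hypothetical and is not an existing service or API: once a cataloger or a matching process has asserted that two identifiers represent the same person, a union-find (disjoint-set) structure groups every identifier connected by such assertions, including transitive ones. The prefixed identifier strings are made up for illustration (the ORCID is ORCID's fictitious documentation example). This only propagates match decisions; deciding whether two skeletal entries actually match remains the hard part.

```python
from collections import defaultdict


class IdentifierReconciler:
    """Hypothetical sketch: cluster identifiers asserted to name the same person.

    A union-find (disjoint-set) structure makes links transitive:
    if A = B and B = C, then A, B and C end up in one cluster.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        """Return the representative of ident's cluster, registering it if new."""
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point each node on the path straight at the root.
        while self.parent[ident] != root:
            next_ident = self.parent[ident]
            self.parent[ident] = root
            ident = next_ident
        return root

    def link(self, a: str, b: str) -> None:
        """Record an assertion that identifiers a and b represent the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        """True if a and b are connected by assertions (unknown IDs become singletons)."""
        return self._find(a) == self._find(b)

    def clusters(self) -> list[set[str]]:
        """All clusters, ordered by their alphabetically smallest identifier."""
        groups: dict[str, set[str]] = defaultdict(set)
        for ident in list(self.parent):
            groups[self._find(ident)].add(ident)
        return sorted(groups.values(), key=min)


reconciler = IdentifierReconciler()
reconciler.link("orcid:0000-0002-1825-0097", "local:person-42")
reconciler.link("local:person-42", "isni:EXAMPLE-1")
reconciler.link("local:person-7", "viaf:EXAMPLE-2")
for cluster in reconciler.clusters():
    print(sorted(cluster))
```

The ORCID and the ISNI placeholder were never linked directly, but both were linked to the same local identifier, so they land in one cluster.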

Identifiers for organizations are even more complex than those for persons, as organizations can merge, split, acquire other organizations, have multiple hierarchies, change locations, etc. The Representing Organizations in ISNI Task Group is documenting these issues and recommending some ways to better represent organizations with International Standard Name Identifiers (ISO 27729). These identifiers are important for accurately reflecting researchers’ affiliations, so that institutions can easily compile and report their scholarly output. Digital Science’s newly released GRID (Global Research Identifier Database) includes ISNI identifiers and maps institutions through GeoNames. GRID is seen as a way to help link and promote an organization’s work.

Identifiers for data sets, such as digital resources and collections in institutional repositories, include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs and ARKs (Archival Resource Keys). Some libraries are using DataCite to mint and publish DOIs. Resources can exist in multiple copies and versions, and can change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets. Libraries want to be able to link related pieces, such as preprints, supplementary data and images, with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries are considering EZID, a service created by the California Digital Library, to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links. Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
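The “multiple DOIs, one object” problem can at least be detected mechanically. The sketch below assumes you already have a mapping from each DOI to the landing page or object it resolves to (for example, exported from a repository); it does not resolve DOIs over the network, and all DOIs, URLs and function names shown are hypothetical. Because DOI names are case-insensitive and are often written with a `doi:` prefix or a resolver URL, the sketch normalizes them first, so that one DOI written two ways is not mistaken for two DOIs.

```python
import re
from collections import defaultdict

# Common ways a DOI is written: bare, with a "doi:" prefix, or as a resolver URL.
_PREFIX = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Strip a resolver or 'doi:' prefix and fold case (DOI names are case-insensitive)."""
    return _PREFIX.sub("", doi.strip()).upper()


def duplicate_targets(doi_to_target: dict[str, str]) -> dict[str, list[str]]:
    """Return each target reached by more than one distinct (normalized) DOI."""
    by_target: dict[str, set[str]] = defaultdict(set)
    for doi, target in doi_to_target.items():
        by_target[target].add(normalize_doi(doi))
    return {t: sorted(dois) for t, dois in by_target.items() if len(dois) > 1}


# Hypothetical DOI-to-object mapping for illustration only.
sample = {
    "10.1234/example-a": "https://repository.example.edu/item/1",
    "https://doi.org/10.1234/EXAMPLE-A": "https://repository.example.edu/item/1",
    "doi:10.1234/example-b": "https://repository.example.edu/item/1",
    "10.1234/example-c": "https://repository.example.edu/item/2",
}
print(duplicate_targets(sample))
```

The first two entries are the same DOI written two ways and collapse to one, while the third is a genuinely different DOI for the same object, so only item 1 is flagged.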

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata, with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.