Why do we need persistent identifiers for cultural heritage?
This post is part of a series reporting back from conversations on transitioning to the next generation of metadata in Spain.
I was thrilled to have the opportunity to speak at this event and to share insights from my perspective as Adoption Manager at DataCite & ROR. I was particularly keen to discuss the use of persistent identifiers (PIDs) for cultural heritage objects and collections, as this has recently become a topic of considerable interest in the DataCite community.
What is the value of persistent identifiers for cultural heritage objects?
Persistent identifiers are long-lasting references to resources, usually expressed as a special kind of URL (uniform resource locator), that allow users to reliably find a resource even if its actual location on the Web changes.
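The core idea is indirection: the identifier itself never changes, and only the registry entry behind it is updated when a resource moves. The toy registry below is a minimal in-memory sketch of that idea, not how any real PID service is implemented. The `PidRegistry` class, the identifier and both URLs are hypothetical.

```python
class PidRegistry:
    """Toy registry mapping persistent identifiers to current locations (hypothetical)."""

    def __init__(self) -> None:
        self._locations: dict = {}

    def register(self, pid: str, location: str) -> None:
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = location

    def update_location(self, pid: str, new_location: str) -> None:
        # The institution must keep this entry current; otherwise the PID
        # still "resolves", but to a location that no longer exists.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


registry = PidRegistry()
registry.register("example/object-42", "https://old-site.example.org/object/42")
# The website is reorganized: only the registry entry changes, the PID stays the same.
registry.update_location("example/object-42", "https://collections.example.org/items/42")
print(registry.resolve("example/object-42"))
```

Anyone who cited `example/object-42` before the move still reaches the resource afterwards. This is the guarantee a PID offers over a plain URL.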
The use of PIDs is important in an academic context, as they allow researchers to reference an online resource in their publications and be assured that the reference will resolve over time to the same resource. Citing resources using PIDs, in turn, allows seamless, automated tracking of citations and reuse. PIDs can also aid discovery of objects and their metadata: PID service providers offer APIs that aggregators and indexers (and, in the case of DataCite, anyone at all) can use as “one stop shops” of metadata for millions of objects.
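As an illustration of such an API, DataCite's public REST API (for example `https://api.datacite.org/dois/{doi}`) returns metadata as JSON. The sketch below parses a trimmed, hypothetical response in that general shape. The DOI, title, creator and URL are invented, and the choice of fields is an assumption. Real records carry many more fields, so check the DataCite API documentation before relying on any of them.

```python
import json

# Trimmed, hypothetical response modelled on https://api.datacite.org/dois/{doi};
# every value below is invented for illustration.
SAMPLE_RESPONSE = """
{
  "data": {
    "id": "10.5072/example-object",
    "type": "dois",
    "attributes": {
      "doi": "10.5072/example-object",
      "url": "https://collections.example.org/items/42",
      "titles": [{"title": "Portrait of an unknown sitter"}],
      "creators": [{"name": "Unknown artist"}],
      "publicationYear": 2021
    }
  }
}
"""


def summarize_doi_record(raw_json: str) -> dict:
    """Pull a few commonly used fields out of a DataCite-style JSON record."""
    attributes = json.loads(raw_json)["data"]["attributes"]
    titles = attributes.get("titles") or []
    return {
        "doi": attributes["doi"],
        "landing_page": attributes.get("url"),
        "title": titles[0]["title"] if titles else None,
        "creators": [creator["name"] for creator in attributes.get("creators", [])],
        "year": attributes.get("publicationYear"),
    }


summary = summarize_doi_record(SAMPLE_RESPONSE)
print(summary)
```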
Which PIDs are used for cultural heritage objects?
PIDs have gained much traction in cultural heritage over the past few years, with many different types of identifiers now in use. The recently completed Heritage PIDs project provides an excellent and comprehensive overview of PID functionalities: “Which persistent identifier does what?” Many of these identifiers are specific to a particular subject area or object type. Several, however, are general purpose and commonly used for many object types in cultural heritage: Handles, Archival Resource Keys (ARKs) and Digital Object Identifiers (DOIs).
• Handle: The Handle System allows institutions to run their own PID service, which means identifiers are registered and resolved locally. This is very cost effective, as there is no direct per-identifier cost. However, running a local Handle implementation entails long-term maintenance responsibilities. Some digital collection software incorporates Handle registration and resolution, but this dependency means that if you migrate to different software, you may also need to migrate your Handle implementation. Additionally, the Handle System does not support registering metadata, so it does not provide the discovery benefits of aggregated metadata.
Handle example: the Rijksmuseum
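To illustrate what a Handle record holds, the sketch below reads the resolution target from a record in the JSON shape served by the Handle.net proxy's REST interface (`https://hdl.handle.net/api/handles/{handle}`). It assumes that shape and assumes that response code 1 signals success. The handle and URL are hypothetical. The record stores typed values such as the target URL, not a descriptive metadata record, which is why Handles alone contribute little to metadata aggregation.

```python
import json
from typing import Optional

# Hypothetical record in the shape returned by the Handle.net proxy REST API
# (https://hdl.handle.net/api/handles/{handle}); handle and URL are invented.
SAMPLE_HANDLE_RECORD = """
{
  "responseCode": 1,
  "handle": "20.500.12345/object-42",
  "values": [
    {
      "index": 1,
      "type": "URL",
      "data": {"format": "string", "value": "https://collections.example.org/items/42"}
    }
  ]
}
"""


def handle_target_url(raw_json: str) -> Optional[str]:
    """Return the URL a handle currently points to, or None if there is none."""
    record = json.loads(raw_json)
    if record.get("responseCode") != 1:  # assumed: 1 signals a successful lookup
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


print(handle_target_url(SAMPLE_HANDLE_RECORD))
```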
• ARK (Archival Resource Key): ARK is an identifier scheme originally developed by the California Digital Library (CDL). Unlike DOIs, it is not built on the Handle System. Like Handles, however, ARK is decentralized, with many institutions running their own ARK service. ARKs can nonetheless be resolved globally through the Name-to-Thing (N2T) resolver. ARK supports metadata registration, which means that metadata can be exposed through APIs independently of the object itself. ARK does not, however, mandate a standard metadata schema, which can make it harder for aggregators and indexers to consume the metadata.
ARK example: Le Louvre
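ARKs follow the general pattern `ark:/NAAN/Name`, where the NAAN (Name Assigning Authority Number) identifies the assigning organization. Newer drafts of the ARK specification also accept the form without the slash after `ark:`. Below is a minimal sketch of splitting an ARK and building a global N2T resolution URL. The ARK is hypothetical, and the simplified pattern ignores qualifiers and other details of the specification.

```python
import re
from typing import NamedTuple


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number
    name: str  # identifier assigned by that authority


# Simplified pattern accepting both "ark:/NAAN/name" and "ark:NAAN/name".
ARK_PATTERN = re.compile(r"^ark:/?(?P<naan>[^/]+)/(?P<name>.+)$", re.IGNORECASE)


def parse_ark(identifier: str) -> Ark:
    match = ARK_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an ARK: {identifier!r}")
    return Ark(match["naan"], match["name"])


def n2t_url(ark: Ark) -> str:
    """Build a global resolution URL through the Name-to-Thing resolver."""
    return f"https://n2t.net/ark:/{ark.naan}/{ark.name}"


print(n2t_url(parse_ark("ark:/12345/x9abc")))  # https://n2t.net/ark:/12345/x9abc
```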
• DOI (Digital Object Identifier): The DOI system is a specific global implementation of the Handle System, governed and managed by the International DOI Foundation (IDF). DOI registration and resolution services are provided to institutions by registration agencies, which serve different geographic areas and/or resource types. In addition to centralized registration and resolution, DOIs require metadata registration, and each DOI registration agency maintains a standard metadata schema. This allows indexers and aggregators to reliably consume DOI metadata in a machine-readable way.
DOI example: University College Dublin
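Because DOI resolution is centralized at `https://doi.org/`, the same URL can return either the landing page or machine-readable metadata, depending on the HTTP `Accept` header. This is known as content negotiation and is supported by registration agencies such as DataCite and Crossref. The sketch below normalizes common DOI notations and builds such a request without sending it. Requesting CSL JSON is just one choice among the supported formats, and the helper names are hypothetical. The DOI used is the one for the slides linked at the end of this post.

```python
import urllib.parse
import urllib.request

DOI_RESOLVER = "https://doi.org/"
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Reduce 'doi:10.x/y' or 'https://doi.org/10.x/y' to the bare DOI name."""
    doi = value.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {value!r}")
    return doi.lower()  # DOI names are case-insensitive


def metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request for machine-readable metadata."""
    url = DOI_RESOLVER + urllib.parse.quote(normalize_doi(doi), safe="/")
    return urllib.request.Request(
        url, headers={"Accept": "application/vnd.citationstyles.csl+json"}
    )


req = metadata_request("https://doi.org/10.5281/zenodo.5899151")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request. Without the `Accept` header, the same URL simply redirects to the landing page.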
How to choose a PID?
The “right” PID for your collections/items depends on your specific use case. Some considerations include:
- Costs: Running your own instance of a Handle or ARK system entails long-term indirect costs to maintain local infrastructure and staffing, but no direct per-identifier cost. Registering new DOIs incurs direct, per-identifier monetary costs, but limited indirect costs.
- Long-term persistence/sustainability: Governance ensures the sustainability of the social and technological foundations of a PID system. In the case of DOIs, there are multiple layers of governance: global DOI infrastructure and management are governed at the DOI Foundation level, and each registration agency maintains its own governance model. DataCite, for example, is a membership-based non-profit governed by its members and executive board, with guidance from community steering and working groups. Regardless of PID type, persistence depends on keeping the actual location of each resource up to date in the PID registry, which remains the institution’s own responsibility.
- Community support: Handles, ARKs, and DOIs all have strong user communities globally, and all offer support services to some degree. DOI registration agencies typically offer more comprehensive support and community engagement resources, as well as value-added tools and services tailored to their specific communities. For example, DataCite actively participates in community activities and groups related to developing standards and best practices around non-traditional scholarly outputs and resources. Registration agencies also invest in tools and services that leverage DOIs and their metadata, such as DataCite Commons.
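The cost trade-off in the first point can be framed as simple break-even arithmetic. A local Handle or ARK service has a roughly fixed yearly cost, while DOI registration adds a fee for each new identifier. The sketch below uses a deliberately simplified linear model. The `CostModel` class and every number in the example are hypothetical placeholders, not the actual prices of any service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    fixed_per_year: float  # indirect costs: staff, servers, maintenance
    per_identifier: float  # direct fee for each newly registered identifier

    def annual_cost(self, new_identifiers: int) -> float:
        return self.fixed_per_year + self.per_identifier * new_identifiers


def break_even(local: CostModel, registered: CostModel) -> Optional[float]:
    """Yearly number of new identifiers at which both models cost the same."""
    fee_difference = registered.per_identifier - local.per_identifier
    if fee_difference == 0:
        return None  # equal per-identifier fees: the cost lines never cross
    return (local.fixed_per_year - registered.fixed_per_year) / fee_difference


# Hypothetical figures, for illustration only.
local_service = CostModel(fixed_per_year=20_000, per_identifier=0)
doi_service = CostModel(fixed_per_year=2_000, per_identifier=1)
print(break_even(local_service, doi_service))  # 18000.0
```

With these placeholder figures, registering fewer than 18,000 new identifiers a year would favour the DOI model, and more would favour the local service. Real decisions also weigh the persistence and support factors listed above.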
Additional questions to consider include:
- How persistent and how global do you want your identifiers for your items or collections to be?
- How many items do you want to assign persistent identifiers to? Some cultural heritage collections, for example, have huge numbers of images in various versions, sizes, and formats. Which versions need a PID and which ones don’t? Some organizations register local Handles for all versions of an object and register a DOI for global access to the main version of that same object.
- At which level do you need to register a persistent identifier: the item level, the collection level, or some higher-level unit of a collection?
- Which registration mechanism works best for your workflow and your organization?
- What cost model is most appropriate for your use case: direct payment to a DOI registration agency, the staff and technical resources needed to run and maintain a Handle or ARK system, or another type of identifier?
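The versioning approach mentioned above, with local Handles for every version and a DOI only for the main version, can be sketched as a simple identifier plan. This is only an illustration. The prefixes, object ID and version labels are hypothetical, and the function merely constructs identifier strings without registering anything with a Handle server or a DOI registration agency.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HANDLE_PREFIX = "20.500.12345"  # hypothetical local Handle prefix
DOI_PREFIX = "10.5072"          # hypothetical DOI prefix


@dataclass
class Version:
    label: str             # e.g. "master-tiff", "web-jpeg"
    is_main: bool = False  # the version intended for global citation


@dataclass
class IdentifierPlan:
    local_handles: Dict[str, str] = field(default_factory=dict)
    doi: Optional[str] = None


def plan_identifiers(object_id: str, versions: List[Version]) -> IdentifierPlan:
    """Local Handle for every version; a single DOI for the main version only."""
    if sum(v.is_main for v in versions) != 1:
        raise ValueError("expected exactly one main version")
    plan = IdentifierPlan()
    for version in versions:
        plan.local_handles[version.label] = f"{HANDLE_PREFIX}/{object_id}-{version.label}"
    plan.doi = f"{DOI_PREFIX}/{object_id}"
    return plan


plan = plan_identifiers(
    "obj42",
    [Version("master-tiff", is_main=True), Version("web-jpeg"), Version("thumbnail")],
)
print(plan.doi, plan.local_handles)
```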
The UKRI Heritage PIDs project produced an excellent PID Use Case Mapping guide as well as many other helpful resources. I highly recommend consulting the UKRI Heritage PIDs resources page for additional guidance!
Take-aways from the discussion: PIDs vs URIs vs URLs
In the discussion that followed this presentation at the Spanish round table on next generation metadata in cultural heritage, participants noted a tendency to use the terms “PID”, “URI”, and “URL” interchangeably, although they are not equivalent. While PIDs are often expressed as URLs/URIs, URIs and URLs alone are not necessarily PIDs. This prompted an important observation about the need to be more specific, accurate and intentional when talking about identifiers. As one participant put it:
“What we would have to do is make that distinction a bit. By this I don’t mean that one is better or worse than the other, but they are different. We may be talking about a persistent identifier and not talking about linked open data at all, and we may be talking about persistent identifiers that still have a lot of work to be done behind the scenes – in terms of providing structured information or access to the data. Also there are systems that provide access through URLs to data that are better in quality or quantity than those provided by some linked open data applications. I just wanted to make this distinction, so that we have it clear throughout the development of these conversations.”
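One rough way to make the distinction operational is to check whether a URL goes through a known persistent identifier system, such as a DOI, Handle, or ARK resolver. The heuristic below is a sketch under that assumption. The resolver list is illustrative rather than exhaustive, and all example identifiers except the slides DOI are hypothetical. A matching hostname also says nothing about whether the institution behind the identifier actually keeps it up to date.

```python
from urllib.parse import urlparse

# Illustrative, non-exhaustive list of well-known PID resolver hostnames.
PID_RESOLVERS = {
    "doi.org": "DOI",
    "dx.doi.org": "DOI",
    "hdl.handle.net": "Handle",
    "n2t.net": "ARK",
}


def classify_url(url: str) -> str:
    """Roughly classify an identifier string as PID-based or as a plain location."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "not an http(s) URL"  # may still be a URI, e.g. a URN
    host = (parsed.hostname or "").lower()
    if host in PID_RESOLVERS:
        return f"PID via resolver ({PID_RESOLVERS[host]})"
    if "/ark:" in parsed.path.lower():
        return "PID served by an institution (ARK)"
    return "plain URL"


for example in (
    "https://doi.org/10.5281/zenodo.5899151",
    "https://www.example.org/collection/item/42",
    "urn:nbn:de:example-123",
):
    print(example, "->", classify_url(example))
```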
You can download the slides from Liz Krznarich’s presentation here: https://doi.org/10.5281/zenodo.5899151
Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC’s Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.