The impact of identifiers on authority workflows was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Chew Chiat Naun of Cornell University. Using identifiers now to point to “things” rather than relying on text strings will facilitate transforming legacy data into linked data later. By linking to authoritative sources through identifiers, libraries can reduce the need for local maintenance of authority files. A number of institutions have already started adding identifiers to their catalog records, including the national libraries of France and Germany.
A Program for Cooperative Cataloging task group is developing a plan to incorporate identifiers (URIs) in MARC bibliographic and authority records as mainstream practice. One challenge is to differentiate real-world objects from descriptions of real-world objects. This distinction may be difficult for catalogers to make, though tools could be created to make it easier. The goal is to align library practices with those of the semantic web. The task group has focused on MARC fields that support subfield $0 for identifiers, and the British Library is preparing a proposal to use subfield $4 to specify relationships with identifiers.
These identifiers could point to non-library resources. For example, Wikidata already has identifiers for such roles as trumpeter, violinist, translator, librettist, and narrator. The task group’s focus has been on identifiers in bibliographic records because all catalogers can create bibliographic records while only a much smaller subset can create authority records. In some countries, only the national libraries create national authority records. Opportunities for batch enhancement of authorities are limited currently. Ideally, the bibliographic record would have a $0 URI pointing to a real-world object described by an authority record.
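As a rough illustration of what adding a $0 or $4 identifier to a bibliographic field involves, here is a minimal Python sketch. The `Field` class, the helper functions, and the example.org URIs are all hypothetical and invented for this illustration; they are not part of any MARC library or of the task group's recommendations.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """A simplified stand-in for one MARC variable field (hypothetical)."""
    tag: str
    indicators: str
    subfields: list  # list of (code, value) tuples, in record order


def add_identifier(fld, uri, code="0"):
    """Append a URI in the given subfield unless that exact pair already exists."""
    if (code, uri) not in fld.subfields:
        fld.subfields.append((code, uri))
    return fld


def to_text(fld):
    """Render the field in a human-readable '$code value' style."""
    body = " ".join(f"${code} {value}" for code, value in fld.subfields)
    return f"{fld.tag} {fld.indicators} {body}"


# An added entry for a translator, identified by text strings only.
entry = Field("700", "1 ", [("a", "Doe, Jane,"), ("e", "translator.")])

# Assumed, made-up URIs: one for the person, one for the role.
add_identifier(entry, "http://example.org/person/123", code="0")
add_identifier(entry, "http://example.org/roles/translator", code="4")

print(to_text(entry))
```

The text strings stay in place for display, while the URIs give downstream systems something stable to link on.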
OCLC’s recent Person Entity Lookup pilot indicated how identifiers might impact authority workflows. By looking up a person and retrieving a number of identifiers, libraries could aggregate associated information from other authorities or sources having a “same as” relationship. For example, Wikidata shows that Noam Chomsky is affiliated with MIT, information that neither the LC/NAF authority file nor VIAF (Virtual International Authority File) includes. One of the most important—and powerful—aspects of adding identifiers is to reduce the amount of copying/pasting in the library environment when the identifier is stewarded elsewhere. Identifiers could provide a bridge between MARC and non-MARC environments and to non-library resources. Librarians wouldn’t have to be the experts in all domains.
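The aggregation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RECORDS` dictionary stands in for remote sources, its example.org URIs and attribute names are made up, and no real lookup service or API is being called.

```python
from collections import deque

# Hypothetical in-memory stand-ins for three sources describing one person.
RECORDS = {
    "http://example.org/naf/chomsky": {
        "same_as": ["http://example.org/viaf/chomsky"],
        "attributes": {"preferred_name": "Chomsky, Noam"},
    },
    "http://example.org/viaf/chomsky": {
        "same_as": [
            "http://example.org/naf/chomsky",
            "http://example.org/wikidata/chomsky",
        ],
        "attributes": {},
    },
    "http://example.org/wikidata/chomsky": {
        "same_as": [],
        "attributes": {"affiliation": "Massachusetts Institute of Technology"},
    },
}


def aggregate(start, records):
    """Follow 'same as' links breadth-first and merge attributes with provenance."""
    seen = set()
    merged = {}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        if uri in seen or uri not in records:
            continue  # already visited, or a source we cannot resolve
        seen.add(uri)
        record = records[uri]
        for key, value in record["attributes"].items():
            # Keep the source URI so a cataloger can see where a value came from.
            merged.setdefault(key, []).append((value, uri))
        queue.extend(record["same_as"])
    return seen, merged


visited, attributes = aggregate("http://example.org/naf/chomsky", RECORDS)
for key, values in attributes.items():
    print(key, values)
```

Starting from the library authority identifier, the walk reaches the third source and picks up the affiliation without anyone copying it into the local file.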
Other potential areas of impact:
- Much journal literature is described by non-library agencies. Identifiers could link the forms of name in journal articles vs. scholarly profiling services vs. library catalogs, thus transcending currently siloed domains. This should also help catalogers disambiguate names more easily.
- Identifiers could provide links to digital collections and other resources that are not under authority control currently.
- Identifiers linking to other sources could allow us to present users with labels in non-Latin scripts for entities that are represented only by romanization in our current authority files.
- In a linked data environment, identifiers could bypass authority records. Content negotiation could determine the preferred labels to display to the user. Ultimately, there could be much less emphasis on establishing an “authoritative text string”.
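To make the last point concrete, here is a minimal Python sketch of choosing a display label from a user's language preferences, in the spirit of HTTP content negotiation. The label set and function names are assumptions made for illustration, and the header parsing is deliberately simplified rather than a complete implementation of the HTTP rules.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language style string, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their order in the header.
    ranked = sorted(prefs, key=lambda pref: -pref[1])
    return [tag for tag, quality in ranked if quality > 0]


def choose_label(labels, header, fallback="en"):
    """Pick the label for the most preferred language that is available."""
    for tag in parse_accept_language(header):
        if tag in labels:
            return labels[tag]
        primary = tag.split("-")[0]  # e.g. "ja-jp" falls back to "ja"
        if primary in labels:
            return labels[primary]
    return labels.get(fallback) or next(iter(labels.values()))


# Hypothetical labels attached to a single entity identifier.
labels = {"en": "Tokyo", "ja": "東京"}

print(choose_label(labels, "ja-JP, en;q=0.8"))
print(choose_label(labels, "fr;q=0.9, en;q=0.5"))
```

The identifier stays the same for every user; only the label chosen for display changes, which is why no single text string has to be the authoritative one.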
Tools mentioned during the discussions:
- Terry Reese’s MarcEdit (“Build Links Data” enhancements) and the editor produced by LC for its BIBFRAME project include lookups of remote authority services that allow incorporating a range of identifier schemes into cataloging workflows.
- The RIMMF (RDA in Many Metadata Formats) tool captures various attributes of an entity. Its focus is on concatenating elements the cataloger has selected rather than establishing an authorized access point. The application can decide what data to extract or display, such as an English or a Chinese language version.
- W3C SHACL (Shapes Constraint Language) helps define the shapes that our data will need, such as what types of descriptions we’ll want for various attributes. These could include the attributes catalogers might want to add to enhance an entity such as a missing birthplace, institutional affiliation or discipline.
- Catmandu is a data processing toolkit developed to build up digital libraries and research services.
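The idea behind the shapes mentioned in the SHACL item above can be illustrated with a small Python sketch. This is not SHACL syntax and does not use a SHACL engine; it is a simplified stand-in that only mimics a minimum-count constraint, and the property names and sample entity are hypothetical.

```python
# A hypothetical "shape": each property and the minimum number of values expected.
PERSON_SHAPE = {
    "name": 1,
    "birthplace": 1,
    "affiliation": 1,
    "discipline": 1,
}


def missing_properties(entity, shape):
    """List the properties for which the entity has fewer values than the shape expects."""
    report = []
    for prop, min_count in shape.items():
        values = entity.get(prop, [])
        if len(values) < min_count:
            report.append(prop)
    return report


# A made-up entity description with only some attributes filled in.
person = {
    "name": ["Doe, Jane"],
    "affiliation": ["Example University"],
}

for prop in missing_properties(person, PERSON_SHAPE):
    print(f"Candidate for enhancement: {prop}")
```

A report like this could feed a cataloger's work queue, pointing to the attributes worth adding to an entity.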
Many challenges lie ahead. We’re going to need a larger vocabulary of relationships between entities. Libraries will want their book vendors to also include identifiers in the records they supply. We will still have many name authorities without dates or other attributes that cannot be matched by algorithms alone and will still require human curation. It is unclear how libraries, or their support systems, will deal with multiple identifiers referring to the same object or resource. We need more editing tools that add URIs in the process of editing records. Libraries must educate their local systems vendors on the need for identifiers in both cataloging and discovery so that the vendors’ systems do not strip out the added data. Identifiers’ impact on authority workflows will depend on tools that don’t exist yet.
Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.