The Coverage of Identity Management Work

Relationships among jazz musicians visualized by linkedjazz.org

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by John Riemer of University of California, Los Angeles, Stephen Hearn of University of Minnesota, and MJ Han of University of Illinois at Urbana-Champaign. The emphasis in authority work has been shifting from construction of text strings to identity management—differentiating entities, creating identifiers, and establishing relationships between entities. Metadata managers agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.

To support linked data, there is a need to maximize the number of entities in current descriptive work that have identifiers. The latest Program for Cooperative Cataloging (PCC) strategic plan includes as a strategic direction “Accelerate the movement toward ubiquitous identifier creation and identity management at the network level.” A Technicalities opinion column, “Unpacking the Meaning of NACO Lite,” was written in response to the call in action item 4.1 of the PCC strategic plan to further define what is meant by that concept.

Part of our discussion revolved around how “identity management” work differs from what catalogers currently do for authority work. The intellectual work required to differentiate names is the same. Is it really “add-on” work or just part of what metadata specialists have always done, but in a new environment? Identity management poses a change in focus, from providing access points for the resource to describing the entities represented by the resource (the work, persons, corporate bodies, places, etc.) and establishing the relationships and links among them. The above graphic from linkedjazz.org, which visualizes the influences of jazz musicians on each other, illustrates this focus on relationships.

Authority work has focused on establishing a specific text string as the controlled access point, along with variant strings that redirect to it. Some systems index only the controlled access point. This can be particularly problematic when dealing with names in different languages. For example, the University of Hong Kong must deal with both simplified and traditional forms of Chinese characters to represent personal names, and the transliteration is not Pinyin but another scheme that represents the Cantonese, rather than the Mandarin, pronunciation. The Australian National University (ANU) similarly struggles with names of indigenous peoples, which often have multiple spellings depending on context. Indexing only the authorized access point is not enough.

Suggestions for transitioning to identity management work included:

Develop identity management as a new type of authority work and provide a venue for this type of contribution (a potential complement to traditional authority control).
Reorient traditional NACO authority control work, redirecting the energy toward identity management.
Align library practices with those of other parties, e.g., rights management agencies, so that other centralized files can be used and shared.

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. The British Library hopes to take advantage of identifiers from other sources, especially those created by the authors themselves. Metadata managers aspire to reuse data from other communities as much as possible, as no one institution can create all the identifiers needed alone. Valued sources may vary depending on discipline; for example, Getty’s Union List of Artist Names may be most useful for art materials. Since there may well be multiple identifiers pointing to the same entity, we’ll also need tools to reconcile them. Yet technology developers won’t create the needed functionality when the data isn’t there. This chicken-and-egg problem hampers efforts to make this transition.
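As a rough illustration of what reconciling multiple identifiers could involve, the sketch below groups identifiers into one cluster per entity from pairwise "same as" assertions, using a simple union-find. It is a minimal sketch, not a description of any existing tool: the function name `reconcile` and the identifier strings are hypothetical placeholders, and it assumes the "same as" assertions have already been vetted.

```python
from collections import defaultdict


def reconcile(same_as_pairs):
    """Group identifiers into clusters (one per entity) from pairwise same-as links."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    # Sort for a stable, readable result.
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical placeholder identifiers, not real ISNI, VIAF, ULAN, ORCID or Wikidata values.
pairs = [
    ("isni:EXAMPLE-A", "viaf:EXAMPLE-B"),
    ("viaf:EXAMPLE-B", "ulan:EXAMPLE-C"),
    ("orcid:EXAMPLE-D", "wikidata:EXAMPLE-E"),
]
for cluster in reconcile(pairs):
    print(cluster)
```

The hard part in practice is deciding which "same as" assertions to trust; the sketch only shows the bookkeeping once those decisions are made.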

Several RLP metadata managers have been participating in the Program for Cooperative Cataloging’s ISNI pilot, a year-long effort to understand ISNI (International Standard Name Identifier) tools and identifiers, create documentation and training, and explore the possibility of using ISNI as a cost-effective component of the PCC’s Name Authority Cooperative Program (NACO). The results have been mixed, as participants struggled to learn a new system which lacked some of the attributes usually associated with authority records. ISNI itself is looking to facilitate individual contributions and streamline its batch-loading processes. ISNI continues to explore how it might include ORCIDs (Open Researcher and Contributor Identifiers). ORCIDs have become a common identifier for researchers not represented in authority files; for instance, ANU reports that 85% of its researchers now have ORCIDs.

Where do identifiers belong in bibliographic records? There is a temporary moratorium on adding identifiers to the 024 field in LC/NACO authority records, and not all systems yet differentiate or support the $0 subfield (URIs and control numbers that refer to “records describing things”) and the $1 subfield (URIs that directly refer to “the Thing”) in bibliographic records. Nor is there a common understanding of the differentiation: do authority records “describe things” or represent “the Thing”? OCLC recently implemented the $1 subfield for bibliographic records, and UCLA is scoping a pilot project on using the $1 to identify persons not covered by the LC/NACO authority file, with the aim of calculating the effort needed to achieve 100% identifier coverage.
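To make the $0/$1 distinction concrete, here is a minimal sketch that models a heading as a list of (subfield code, value) pairs and reports which kind of identifier it carries. The function name `entity_identifier`, the example heading, and the example.org URIs are all hypothetical, and the choice to prefer $1 over $0 is an assumption of this sketch rather than an agreed practice.

```python
# A heading modeled as (subfield code, value) pairs. The URIs are
# hypothetical placeholders, not real authority or entity identifiers.
heading = [
    ("a", "Example, Author"),
    ("0", "https://example.org/authorities/n0000000"),  # a record describing the thing
    ("1", "https://example.org/entity/0000000"),        # the thing itself
]


def entity_identifier(subfields):
    """Return (kind, uri) for a heading, or None if it carries no identifier.

    Assumption of this sketch: a $1 URI ("the Thing") is preferred over a
    $0 URI or control number ("a record describing the thing").
    """
    things = [value for code, value in subfields if code == "1"]
    records = [value for code, value in subfields if code == "0"]
    if things:
        return ("thing", things[0])
    if records:
        return ("record", records[0])
    return None


print(entity_identifier(heading))
```

A system that does not distinguish the two subfields would have to treat both as the same kind of link, which is exactly the ambiguity the discussion raised.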

Some posited Wikidata as another option for getting identifiers for names not represented in authority files, one that would also broaden the potential pool of contributors. Those who attended the 12 June 2018 OCLC Research Works in Progress Webinar: Introduction to Wikidata for Librarians rated it highly. In a subsequent poll, OCLC Research Library Partnership members indicated that the “deeper dive” they were most interested in was learning how other libraries are using Wikidata. The newly formed OCLC Research Library Partnership Wikimedia Interest Group may provide some good use cases. As Wikidata was developed by drawing data from Wikipedia, it has focused on “works” and their authors, which could be viewed as an alternate version of the traditional author/title entries in authority files. But WikiCite, a recent effort to support the citations in Wikipedia articles, demonstrates that there is also a need to register and support identifiers for the sources those citations refer to, which would include information about a specific edition or document.

Redirecting the energy devoted to traditional authority work toward identity management poses the biggest hurdle. The linchpin is whether we can reconfigure our systems to deal with identifiers as the match point, collocation point, and the key to whatever associated labels we display and index.
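A minimal sketch of what "identifier as match point" could look like: bibliographic records store only an entity identifier, every label and variant is indexed to that identifier, and the displayed label is chosen at display time. Everything here is hypothetical: the example.org URI, the record titles, the function names, and the labels (a common Hong Kong placeholder name in traditional and simplified characters, with Cantonese and Pinyin romanizations).

```python
from collections import defaultdict

# Hypothetical entity store keyed by identifier; labels and variants are placeholders.
ENTITIES = {
    "https://example.org/entity/1": {
        "labels": {"zh-Hant": "陳大文", "zh-Hans": "陈大文", "en": "Chan, Tai Man"},
        "variants": ["Chen, Dawen"],
    },
}

# Records carry the identifier, not a text string, as the link to the entity.
RECORDS = [
    {"title": "Hypothetical title A", "creator": "https://example.org/entity/1"},
    {"title": "Hypothetical title B", "creator": "https://example.org/entity/1"},
]


def build_label_index(entities):
    """Map every label and variant (case-folded) to the identifiers it names."""
    index = defaultdict(set)
    for uri, entity in entities.items():
        for label in list(entity["labels"].values()) + entity["variants"]:
            index[label.casefold()].add(uri)
    return index


def search(query, entities, records):
    """Find records by any label; matching and collocation happen on the identifier."""
    uris = build_label_index(entities).get(query.casefold(), set())
    return [record["title"] for record in records if record["creator"] in uris]


def display_label(uri, entities, language, fallback="en"):
    """Pick the label to display for a reader's language, falling back if absent."""
    labels = entities[uri]["labels"]
    return labels.get(language, labels[fallback])


print(search("陈大文", ENTITIES, RECORDS))
print(display_label("https://example.org/entity/1", ENTITIES, "zh-Hant"))
```

In this arrangement no single string is privileged: a search on the simplified form, the traditional form, or either romanization retrieves the same records, because all of them resolve to one identifier.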

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.