The OCLC RLP Metadata Managers Focus Group met in March 2023 to explore new developments in the shift from “authority control” towards “identity management.” Our discussion was facilitated by Charlene Chou of New York University, Joy Panigabutra-Roberts of the University of Tennessee, and John Riemer of UCLA.
This shift is a familiar topic for Metadata Managers. It was a prominent feature of Karen Smith-Yoshimura’s Transitioning to the Next Generation of Metadata report and of multiple posts here on Hanging Together (read more about identity management on Hanging Together).
The current work of the Identity Management Advisory Committee (IMAC) of the Program for Cooperative Cataloging (PCC) returns us to this topic now. The PCC Timeline for Implementing Identity Management in Multiple Registries explores what steps the PCC may need to take to use identity management registries beyond the LC NACO authority file (LC/NAF) in PCC cataloging workflows. We asked members of the Metadata Managers Focus Group to tell us:
- What lessons they’ve learned from using other identity management platforms in their workflows
- How they are incorporating non-LC/NAF sources into their local cataloging workflows
- What the opportunities and threats are for expanding PCC cataloging beyond the LC/NAF
We received responses from 11 members of the group and had further discussions during two virtual sessions.
Summary
One of my favorite quotes from science fiction author William Gibson, “The future is here; it’s just not evenly distributed yet,” aptly describes much of our conversation about identity management outside of LC/NAF. Organizations with staff resources and platforms ready to integrate with non-LC/NAF identifiers are doing so when those entities are fit for purpose. This is especially true when looking beyond the catalog and into research information management and/or digital asset management of unique collections. Librarians who have engaged with ORCID, SNAC (Social Networks and Archival Context), Wikidata, and other identity management platforms find that the benefits can outweigh their anxieties about working outside of NACO files. These alternatives do not diminish the time and intellectual work that goes into managing identities, but their clear governance policies and technical affordances can make the work more efficient.
For the many libraries not in the vanguard, broader adoption of identity management beyond the LC/NAF requires workflow safeguards and technological solutions. Our discussions indicate that librarians succeed most when URIs or other persistent identifiers allow them to make connections across multiple platforms. Changes in the MARC 21 formats provide a mechanism for recording this data, but these will only be valuable if corresponding systems and workflows can use incorporated identifiers. Our conversations also demonstrated the value of PCC pilot projects in helping us focus on solutions that work.
Key takeaways
We’re already doing this in various environments. Respondents noted that in some environments, this is not a new development. With the growing adoption of persistent identifiers (PIDs), such as ORCID and ROR, libraries have been incorporating other identity management sources when describing resources in institutional repositories and ETD workflows. These PIDs continue to fulfill some of linked data’s promise by being the “glue” between library catalogs, repositories, and research information management (RIM)/current research information systems (CRIS).
Participants discussed the benefits of alternative identity management sources for special and distinctive collections where the managed identities fall outside the current requirements for establishing NACO headings. While some organizations are participating in Social Networks and Archival Context (SNAC), many respondents noted their increased use of Wikidata, especially in digital repositories.
Workflows are becoming clearer. Thanks in large part to pilot projects, such as the PCC Wikidata Pilot, the PCC ISNI Pilot, and the PCC URIs in MARC Pilot, libraries are finding ways to incorporate other identity management platforms into their workflows. For example, NYU noted they routinely include 024 (other standard identifiers) fields in authority records with ISNI, VIAF, and Wikidata identifiers and also others like ORCID and Union List of Artist Names (ULAN) when appropriate for the person described.
Using the lessons learned in these pilots, Columbia has also developed a new workflow “…to allow catalogers without NACO training to create Wikidata items for persons. These Wikidata items then could be easily converted into ‘lite’ NACO records by NACO-certified catalogers.” (https://blogs.cul.columbia.edu/tsl/2021/06/08/pcc-wikidata-pilot-expanding-authority-control-to-identity-management/)
The University of Washington has continued a project begun under the PCC Wikidata Pilot that resulted in Wikidata application profiles for Faculty and Staff and Graduate Students. Once a Wikidata entity is created, its URI is incorporated directly into MARC records for electronic theses and dissertations (ETDs).
035 __ |a (OCoLC)1100482956
100 1_ |a Brewer, Aaron W., |e author |1 http://www.wikidata.org/entity/Q91591340
700 1_ |a Teng, Fang-Zhen, |e degree supervisor |1 http://www.wikidata.org/entity/Q81485621
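For readers who want to see how identifiers recorded this way might be used downstream, here is a minimal Python sketch. It assumes the display-style layout shown above (tag, indicators, then subfields introduced by `|`) and that the Wikidata URI sits in subfield `|1`. The function names are invented for illustration and are not part of any library; a production workflow would more likely read binary MARC or MARCXML with a dedicated parser.

```python
import re

# Display-style MARC lines copied from the example above.
MARC_LINES = [
    "035 __ |a (OCoLC)1100482956",
    "100 1_ |a Brewer, Aaron W., |e author |1 http://www.wikidata.org/entity/Q91591340",
    "700 1_ |a Teng, Fang-Zhen, |e degree supervisor |1 http://www.wikidata.org/entity/Q81485621",
]

# Matches a Wikidata entity URI and captures the Q identifier.
WIKIDATA_ENTITY = re.compile(r"^https?://www\.wikidata\.org/entity/(Q\d+)$")


def parse_field(line):
    """Split a display line into (tag, indicators, [(code, value), ...]).

    Assumption: subfield values never contain a literal "|".
    """
    tag, indicators, rest = line.split(" ", 2)
    subfields = []
    for chunk in rest.split("|")[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators, subfields


def wikidata_links(lines):
    """Return (tag, name, QID) for each field whose |1 holds a Wikidata URI."""
    links = []
    for line in lines:
        tag, _, subfields = parse_field(line)
        name = next((value for code, value in subfields if code == "a"), "")
        for code, value in subfields:
            match = WIKIDATA_ENTITY.match(value) if code == "1" else None
            if match:
                # Drop the trailing comma that precedes the relator term.
                links.append((tag, name.rstrip(","), match.group(1)))
    return links


for tag, name, qid in wikidata_links(MARC_LINES):
    print(tag, name, qid)
```

The 035 field is skipped because it carries no `|1` subfield; only the two name fields yield a Q identifier that a discovery layer or repository could then use as a link target.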
Going beyond LC authority files can allow catalogers to remediate problematic gaps. But it also comes with risks.
Several participants noted that using vocabularies outside the LC/NAF and LCSH allowed them to use culturally relevant names for Indigenous communities. At the University of Chicago, archivists who previously only used LC Subject Headings for cultural group names are exploring identity management options that allow them to circumvent problematic terms. “By utilizing controlled vocabularies created and maintained by subject experts who work with Indigenous communities, the archivists hope to make records more discoverable for the people whose cultures and ancestors are described…. Archivists select a vocabulary to use based on which has the most subject expert input for Indigenous communities in the particular geographic region… For example, [the] finding aid to the Gerhardt Laves Papers. Here, an archives processing student assistant corrected Laves’ spelling of several Australian Aboriginal cultures and languages based on AustLang. For instance, we used AustLang to identify the correct spelling for the Karajarri community (spelling in Laves’ materials Karadjeri) and the Yawuru community (spelled in Laves’ materials Yuwari).”
Similar efforts are underway at the University of Sydney. “We apply Aboriginal and Torres Strait Islander people headings from the AustLang database in selected theses records in our Institutional Repository….We [also] use BlackWords in AustLit and the Aboriginal and Torres Strait Islander Biographical Index (ABI) to ascertain the heritage of the Aboriginal and/or Torres Strait Islander authors and then add notes in the metadata to highlight their heritage.” View example record. At the same time, these examples highlight the promises and perils of moving outside of known rules of LC/NAF.
While these examples use a trusted, authoritative source, another participant noted: “There are also ethical concerns—how much faith can we put in each service to respect the privacy and dignity of the people being described? Can we control/remediate any harm being done in the same way that we can in the NAF?” Another participant emphasizes that “…well-meaning majority-white institutions might unintentionally expose individuals from historically marginalized groups to harm or harassment, and majority identity library workers sometimes lack enough cultural awareness to accurately describe or label individuals from underrepresented groups.”
Trust, efficiency, lower costs, technical affordances, and good governance policies and procedures are what we’re looking for in new identity management environments.
We asked respondents, “What criteria do you consider important in selecting an identity management source?”
Trust. Above all, librarians feel that any identity management services they will use need to be trusted. Trust means not only that the information a service provides is current and correct, but also that it is offered through stable endpoints that can be relied on in production environments. Especially for inclusion in linked-data records, having persistent URIs is essential.
Currently, we place a great deal of trust in the Library of Congress and NACO-trained catalogers because of the strong governance model they provide. This community provides mutual support, documentation, and training to ensure the quality of its authority file. While it is unlikely other identity management services will replicate this quality at the same level, having a clear governance structure is an important consideration. For example:
- Do its values reflect those of the library community?
- Is there a clear support channel for technical help and/or to report errors/problems?
- Is it inclusive of different communities/individuals being identified?
- Does it have mechanisms in place to prevent or mitigate the harms from vandalism?
Efficiency. Librarians’ interest in alternative identity management often comes back to how they can make their work more efficient. Identity management sources that are not comprehensive in the area of description, are not up to date, or are inconsistent in their coverage make poor targets for inclusion. Instead, our participants sought out sources that were fit for purpose (e.g., ORCID for scholars, Discogs for music, IMDB for movies, or specialized resources like AustLang).
An identity management source should also allow workflows to flow smoothly. Several contributors noted that the advantages of Wikidata were the low barriers to entry, timeliness of URI generation, and ease of contributions/updates. Participants recognize that these features sit in tension with the desire for trusted and efficient targets because they can result in duplicates that need to be disambiguated or merged. Valued services provide additional structured properties that aid in disambiguation beyond just looking for closely matching string labels.
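To illustrate what disambiguating on structured properties rather than on label strings alone could look like, here is a small Python sketch. Everything in it is hypothetical: the candidate records, the property names (`birth_year`, `affiliation`), the identifiers, and the scoring rule are invented for this example and do not describe how any particular service works.

```python
# Hypothetical candidate entities that all share the same name label.
CANDIDATES = [
    {"id": "A1", "label": "Smith, Jane", "birth_year": 1952, "affiliation": "Example College"},
    {"id": "A2", "label": "Smith, Jane", "birth_year": 1970, "affiliation": "Example University"},
    {"id": "A3", "label": "Smith, Jane"},  # a sparse record with no extra properties
]


def score_candidate(query, candidate):
    """Count properties that agree; a conflict on any shared property disqualifies."""
    score = 0
    for prop, wanted in query.items():
        if prop == "label":
            continue
        found = candidate.get(prop)
        if found is None:
            continue  # missing data is neither evidence for nor against a match
        if found != wanted:
            return -1
        score += 1
    return score


def best_match(query, candidates):
    """Return the single best-supported candidate, or None if a person must decide."""
    same_label = [
        c for c in candidates if c["label"].casefold() == query["label"].casefold()
    ]
    scored = [(score_candidate(query, c), c) for c in same_label]
    scored = [(s, c) for s, c in scored if s > 0]
    if not scored:
        return None  # a label match alone is not treated as enough evidence
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # tie between candidates: leave it for human review
    return scored[0][1]


query = {"label": "Smith, Jane", "birth_year": 1970, "affiliation": "Example University"}
match = best_match(query, CANDIDATES)
print(match["id"] if match else "needs review")
```

The design choice worth noting is that the sketch refuses to pick a winner on a bare label match or on a tie. That keeps the intellectual decision with the cataloger while letting the extra properties clear away the easy cases.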
Respondents stressed that non-LC/NAF authorities do not necessarily reduce the time and effort inherent in identity management. Instead, their focus was on the capabilities of these platforms, which allowed them to focus on the intellectual aspects of identity management and less on data management. There is also concern that non-LC/NAF sources could exacerbate problems with duplicates.
Technical affordances. Many of our participants also saw value in the modern technical infrastructures provided by the alternatives to LC/NAF, especially when using linked data.
Participants indicated that criteria for adopting a new identity management platform include:
- conformance to basic Linked Data principles
- ability to be serialized in multiple ways, such as JSON-LD, CSV, etc.
- support for multiple languages and scripts, and use of language tags to localize preferred labels
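As a rough illustration of the last two criteria, the Python sketch below reads a small JSON-LD description and picks a preferred label by language tag. The `example.org` identifier is a placeholder, the record shape (SKOS `prefLabel` values carrying `@language` tags) is one common pattern rather than the format of any specific service, and the function name is invented here.

```python
import json

# A small JSON-LD description; the @id is a placeholder, not a real entity URI.
RECORD_JSONLD = """
{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@id": "http://example.org/entity/1",
  "skos:prefLabel": [
    {"@value": "Haruki Murakami", "@language": "en"},
    {"@value": "村上春樹", "@language": "ja"},
    {"@value": "Murakami Haruki", "@language": "ja-Latn"}
  ]
}
"""


def preferred_label(record, wanted_languages):
    """Return the first label whose language tag matches, in order of preference.

    Matching is exact and case-insensitive; this sketch does not implement
    full language-range matching. Falls back to the first label listed.
    """
    labels = record.get("skos:prefLabel", [])
    by_language = {
        label["@language"].lower(): label["@value"]
        for label in labels
        if "@language" in label
    }
    for language in wanted_languages:
        if language.lower() in by_language:
            return by_language[language.lower()]
    return labels[0]["@value"] if labels else None


record = json.loads(RECORD_JSONLD)
print(preferred_label(record, ["ja"]))
print(preferred_label(record, ["fr", "en"]))
```

Because the labels carry script-aware tags such as `ja-Latn`, a display layer can choose between original script and romanization without guessing from the string itself.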
Many linked-data-ready services also increase their value to libraries by offering an API, SPARQL endpoint, or OpenRefine reconciliation service. This allows metadata creators to work in batches and makes these services more efficient at larger scales. This is especially true when combined with tools and techniques that help disambiguate or cluster similar name strings and/or entity properties. Libraries still operating in older systems may find it more difficult to integrate these services. However, modern library service platforms (LSPs), repositories, and DAMs are increasingly adapting to include these sources.
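To make the batch idea concrete, here is a minimal Python sketch that builds one SPARQL query for a whole list of identifiers, in the style accepted by the Wikidata Query Service. It relies on P496 being Wikidata's ORCID iD property and on the `wdt:`, `wikibase:`, and `bd:` prefixes that the query service predefines. The function name is invented for this example and the two ORCID iDs are placeholders. The sketch only builds the query text; sending it to an endpoint is left out.

```python
import re


def build_batch_query(property_id, identifiers, language="en"):
    """Build one SPARQL query that looks up many external identifiers at once."""
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError("property_id must look like 'P496'")
    # Quote each identifier as a plain string literal, dropping stray quotes.
    values = " ".join('"{}"'.format(i.replace('"', "")) for i in identifiers)
    return f"""SELECT ?item ?itemLabel ?id WHERE {{
  VALUES ?id {{ {values} }}
  ?item wdt:{property_id} ?id .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}"""


# Placeholder ORCID iDs standing in for a batch taken from repository records.
orcids = ["0000-0002-1825-0097", "0000-0001-5109-3700"]
query_text = build_batch_query("P496", orcids)
print(query_text)
```

Swapping the property (for example, P244 for Library of Congress authority IDs or P214 for VIAF IDs) turns the same pattern into a crosswalk from identifiers a library already holds to Wikidata items.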
The use of alternatives is not limited to identity management services that are based on library-specific data models. However, those services that have a close alignment to library models are more readily integrated into existing workflows. Identity management services with a clear, consistent, and (relatively) stable data model are preferred.
The road ahead
Metadata Managers have many aspirations that are tempered by concerns about what going beyond the LC/NAF means for identity management.
Using other platforms can increase the inclusion of entities that don’t neatly fit into current bibliographic practices. This can allow us to address current harms and further align our practices with the needs of specific communities. Loosening the control we have on authority record workflows means we can also invite more people into the process in ways that augment library expertise and result in the creation of more identifiers/greater coverage of the entities in bibliographic data. At the same time, we recognize there are dangers in possibly diluting our resources with lower-quality entities that create more work for librarians.
Libraries hope that new identity management platforms can be implemented in ways that lower costs. This may come through increased efficiencies or through less-demanding training requirements for staff who perform identity management tasks. Many of the emerging alternatives are also free to use, with no direct cost to the library that uses them. Instead, the costs are borne by other agents within the metadata ecosystem, whether through memberships in collaboratives like CrossRef, ORCID, and SNAC, or through philanthropic funding that supports the Wikimedia Foundation/Wikidata. Librarians are also wary of investing time, labor, and intellectual effort in identity management environments outside of library control. Whatever value that work creates could be captured by others if the terms and conditions of use later change to more closed models.
What do these conversations mean for OCLC? In my next post, I’ll interview Jeff Mixter to learn more about how he’s taken his work on OCLC Research pilot projects into developing production services for WorldCat Entities and linked data.
Hi Patricia!
I went back through the responses for this session to confirm I hadn’t missed something. Only one response mentioned Getty vocabularies by name. This was a fairly small group and I’d expect to see them in a wider set of responses. Certainly, Getty vocabularies have exactly the kind of governance and reliability that metadata managers are looking for when choosing additional vocabularies.
Thanks for re-sharing the link to your DEI presentation!
Within the Getty Vocabularies, ULAN is problematic for identity management. For artists, it’s ok. But for art dealers, galleries and other owners, its quality as a linked data source is poor. It mixes persons with organisations in the most confusing way, so that matching and linking with other better managed sources is problematic. It even groups several persons in a single ULAN record for a “person” marked “composite record” (see: http://vocab.getty.edu/page/ulan/500440140 ). URIs are not as stable as they should be. It also omits far too many names to be a useful source in areas such as Nazi-looted art. This is an ongoing problem which deserves to be analysed in depth.
Thanks, Richard, for the interesting article! We hope that the ULAN and the other Getty Vocabularies could increasingly be used in addition to other authority resources by the Library community. It seems that coreferences between resources is our path to a more inclusive future. At a few OCLC venues, we’ve been giving updates on our own efforts to increase multicultural, unbiased, and inclusive content of the Getty Vocabularies, e.g., https://www.getty.edu/research/tools/vocabularies/Vocabs_unbiased_terminology.pdf Patricia Harpring, Managing Editor, Getty Vocabularies