This is the first post in a short blog series on what we learned from the OCLC RLP Managing AI in Metadata Workflows Working Group. This post was co-authored by Merrilee Proffitt and Annette Dortmund.

Libraries face persistent challenges in managing metadata, including backlogs of uncataloged resources, inconsistent legacy metadata, and difficulties in processing resources in languages and scripts for which there is not staff expertise. These issues limit discovery and strain staff capacity. At the same time, advances in artificial intelligence (AI) provide opportunities for streamlining workflows and amplifying human expertise—but how can AI assist cataloging staff in working more effectively?
To address these questions, the OCLC Research Library Partnership (RLP) formed the Managing AI in Metadata Workflows Working Group earlier this year. This group brought together metadata managers from around the globe to examine the opportunities and risks of integrating AI into their workflows. Their goal: to engage collective curiosity, identify key challenges, and empower libraries to make informed choices about how and when it is appropriate to adopt AI tools to enhance discovery, improve efficiency, and maintain the integrity of metadata practices.
This blog post—the first in a four-part series—focuses on one of the group’s critical workstreams: primary cataloging workflows. We share insights, recommendations, and open questions from the working group on how AI may address primary cataloging challenges, such as backlogs and metadata quality, all while keeping human expertise at the core of cataloging.
The “Primary Cataloging Workflows” group was the largest of our three workstreams, comprising seven participants from Australia, Canada, the United States, and the United Kingdom. Participants represented institutions in primarily English-speaking countries in which libraries may lack needed capacity to provide metadata for resources written in non-Latin scripts like Chinese and Arabic.
- Jenn Colt, Cornell University
- Elly Cope, University of Leeds
- Susan Dahl, University of Calgary
- Michela Goodwin, National Library of Australia
- Chingmy Lam, University of Sydney
- Yasha Razizadeh, New York University
- Cathy Weng, Princeton University
Motivations: shared (and persistent) needs
Working group members are turning to AI to help solve a set of familiar cataloging challenges that result from a combination of resource constraints and limited access to specific skills. These challenges include:
- Increasing cataloging efficiency
- Improving legacy metadata
- Obtaining assistance with resources in certain scripts where expertise is limited
Members of the working group assessed both the capabilities and limitations of AI tools in addressing these challenges by examining specific tools and workflows that could support this work.
Increasing cataloging efficiency
Backlogs of uncataloged resources prevent users from discovering valuable materials. Even experienced, dedicated staff are unable to keep up with the volume of resources awaiting description. AI offers the potential to address this problem by streamlining and accelerating the cataloging workflow for these materials. The working group identified key sources of backlogs, including legal deposits, gifts, self-published resources, and materials lacking ISBNs.
Copy cataloging is critical to addressing backlog issues, but the key challenge here is to identify the “best record.” Working group participants discussed how AI could streamline these workflows by automating record selection based on criteria such as the number of holdings or metadata completeness.
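As a rough illustration of what automated record selection might look like (a sketch of our own, not a tool the group evaluated), "best record" selection could be framed as a weighted scoring function over candidate records. The field names, weights, and criteria below are illustrative assumptions:

```python
# Hypothetical sketch: ranking candidate bibliographic records for copy
# cataloging. Field names and weights are illustrative assumptions, not
# drawn from any specific cataloging system.

def score_record(record, required_fields=("title", "creator", "subjects", "isbn")):
    """Score a candidate by metadata completeness and holdings count."""
    completeness = sum(1 for f in required_fields if record.get(f)) / len(required_fields)
    # Weight widely held records higher, capping the holdings contribution
    # so a popular but sparse record cannot dominate.
    holdings = min(record.get("holdings", 0), 100) / 100
    return 0.6 * completeness + 0.4 * holdings

def best_record(candidates):
    """Return the highest-scoring candidate: the 'best record' to copy."""
    return max(candidates, key=score_record)

candidates = [
    {"title": "Example", "holdings": 250},
    {"title": "Example", "creator": "Doe, J.", "subjects": ["Tests"], "holdings": 40},
]
# The more complete record wins despite fewer holdings.
print(best_record(candidates)["holdings"])  # → 40
```

In practice the criteria and weights would reflect local policy, and an AI tool might learn them from catalogers' past selections rather than hard-coding them.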
When original cataloging is required, AI-generated brief records for these materials can enable them to appear in discovery systems earlier, accelerating the process of making hidden collections discoverable and supporting local inventory control. This approach addresses the immediate need for discovery while allowing records to be completed, enriched, or refined over time.
Improving legacy metadata
Legacy metadata may contain errors, inconsistencies, or outdated terminology, which hinders discovery and fails to connect users with relevant resources. AI could assist with metadata cleanup and enrichment, reducing manual effort while maintaining high standards. This was an area where working group members had not experimented directly with AI tools, but could imagine a number of use cases, including:
- Identifying and replacing outdated terms in existing metadata
- Using AI tools to flag duplicates, diacritic errors, or anomalies to streamline cleanup efforts and improve data quality
- Suggesting additional metadata fields or descriptions to enhance discovery
- Matching headings from local authority files to existing authorized headings or validated entities
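To make the first two use cases concrete, here is a minimal sketch (our illustration, not a tool the group tested) of rule-based checks that flag outdated subject terms and likely duplicate records for human review. The term mapping is a placeholder a library would replace with its own vocabulary decisions; the "Aliens" to "Noncitizens" pair echoes a real LCSH revision:

```python
# Illustrative sketch of cleanup checks a tool might run at scale.
# OUTDATED_TERMS is a placeholder, not an authoritative vocabulary crosswalk.

OUTDATED_TERMS = {
    "Aliens": "Noncitizens",  # example of a superseded LCSH heading
}

def flag_outdated(records):
    """Yield (record_id, old_term, suggested_term) for cataloger review."""
    for rec in records:
        for term in rec.get("subjects", []):
            if term in OUTDATED_TERMS:
                yield rec["id"], term, OUTDATED_TERMS[term]

def flag_duplicates(records):
    """Group records sharing the same normalized title/creator pair."""
    seen = {}
    for rec in records:
        key = (rec.get("title", "").casefold().strip(),
               rec.get("creator", "").casefold().strip())
        seen.setdefault(key, []).append(rec["id"])
    return [ids for ids in seen.values() if len(ids) > 1]

records = [
    {"id": "b1", "title": "Migration Law", "creator": "Doe, J.", "subjects": ["Aliens"]},
    {"id": "b2", "title": "migration law ", "creator": "doe, j.", "subjects": []},
]
print(list(flag_outdated(records)))  # flags b1's "Aliens" heading
print(flag_duplicates(records))      # groups b1 and b2 as likely duplicates
```

Crucially, in all of these use cases the output is a list of suggestions for a cataloger to review, not changes applied automatically.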
Improving metadata quality, including reducing the number of duplicate records, has also been an area where OCLC has devoted considerable effort, including the development and use of human-informed machine learning processes, as illustrated in this recent blog post on “Scaling de-duplication in WorldCat: Balancing AI innovation with cataloging care.”
Providing support for scripts
Language and script expertise is a long-standing cataloging issue. In English-speaking countries, this manifests as difficulty describing resources written in non-Latin scripts and in languages that are not often taught in local schools. AI tools could assist with transliteration, transcription, and language identification, enabling more efficient processing of these materials. However, some tools lack basic functionality or support for the specific languages required. And even when AI tools confidently provide transliteration, human expertise is still very much required to evaluate the output. A library looking to AI to fill an expertise gap for these languages faces a double challenge: it cannot fully trust AI tools, and it lacks the internal language skills needed to effectively evaluate and correct their work.
Working group members brainstormed ways to address this situation. Research libraries collect resources in dozens or even hundreds of languages to support established academic programs. Although the library may lack direct access to language proficiency, this expertise may be abundant across campus among the very students, faculty, and researchers for whom hard-to-catalog resources are selected. These campus community members could help address a specific skill gap and safeguard the accuracy of AI-assisted workflows, while fostering community involvement and ensuring that humans remain in the loop. In implementing such a program, libraries would need to create an engagement framework that includes rewards and incentives—such as compensation, course credit, or public acknowledgment—to encourage participation.
Open questions around the use of AI
Unsurprisingly, as with any new technology, opportunities come paired with questions and concerns. Metadata managers shared that some of their staff expressed uncertainty about adopting AI workflows, feeling they need more training and confidence-building support. Others wondered whether shifting from creating metadata to reviewing AI-generated records might make their work less engaging or meaningful.
Metadata managers themselves raised a particularly important question: If AI handles foundational tasks like creating brief records—work that traditionally serves as essential training for new catalogers—how do we ensure new professionals still develop the core skills they’ll need to effectively evaluate AI outputs?
These are important considerations as we explore the implementation of AI tools as amplifiers of human expertise, rather than replacements for it. The goal is to create primary cataloging workflows where AI manages routine tasks at scale, freeing qualified staff for higher-level work while preserving the meaningful aspects of metadata creation that make this field rewarding.
Conclusion
While not a panacea, AI offers significant potential to address primary cataloging challenges, including backlogs, support for scripts, and metadata cleanup. By adopting a pragmatic approach and emphasizing the continued relevance of human expertise, libraries can apply AI with care to current capacity issues, making materials available more easily and improving discovery for users.
NB: As you might expect, AI technologies were used extensively throughout this project. We used a variety of tools—including Copilot, ChatGPT, and Claude—to summarize notes, recordings, and transcripts. These were useful for synthesizing insights from each of the three subgroups and for quickly identifying the overarching themes described in this blog post.
Merrilee Proffitt is Senior Manager for the OCLC RLP. She provides community development skills and expert support to institutions within the OCLC Research Library Partnership.