This is the second in a short blog series on what we learned from the OCLC RLP Managing AI in Metadata Workflows Working Group.

Archives and special collections contain a wide range of resource types requiring different metadata workflows. Resources may be described in library catalogs, digital repositories, or finding aids, and the metadata can vary greatly because of platform differences, collection priorities, and institutional policies. Providing online access and discovery for these unique resources is an ongoing challenge because of inconsistent or incomplete metadata and new digital accessibility standards. AI presents new possibilities for providing access to unique resources in archives and special collections, where it can generate data such as captions and transcriptions by drawing on the strengths of large language models (LLMs).
This blog post—the second in our series on the work of the OCLC Research Library Partnership (RLP) Managing AI in Metadata Workflows Working Group—focuses on the “Metadata for Special and Distinctive Collections” workstream. It shares current uses of AI by members, insights on assessing whether AI is suitable for a task, and open questions about accuracy and data provenance.
Participants
This workstream brought together metadata professionals from diverse institutions, including academic libraries, national archives, and museums. Their collective expertise and the use cases they shared provided valuable insights into how AI tools can address the unique challenges of special and distinctive collections. Members of this group included:
- Helen Baer, Colorado State University
- Jill Reilly, National Archives and Records Administration
- Amanda Harlan, Nelson-Atkins Museum of Art
- Mia Ridge, British Library
- Miloche Kottman, University of Kansas
- Tim Thompson, Yale University
Integration in existing tools
Participants primarily described using tools already available to them through existing licensing agreements with their parent institutions. While this works for proof-of-concept experimentation, these ad hoc approaches do not scale to production levels or deliver the desired gains in efficiency. Participants want AI capabilities integrated into the library workflow products they already use.
Using multiple tools is a long-standing feature of metadata work. In the days of catalog cards, a cataloger might have a bookcase full of LCSH volumes (i.e., the big red books), LCC volumes, AACR2, LCRIs, a few language dictionaries, a few binders of local policy documents, and, of course, a typewriter manual. Today, a cataloger may have four or five applications open on their computer, including a browser with several tabs. Working with digital collections compounds this complexity, requiring additional tools for content management, file editing, and project tracking. Since AI has already been integrated into several popular applications, including search engines, metadata managers hope to see similar functionality embedded within their existing workflows, potentially reducing the burden of managing so many passwords, windows, and tabs.
Entity management
Many metadata managers, including our subgroup members, dream of automated reconciliation against existing entity databases. This becomes even more important for archives, which often hold collections of family papers in which multiple family members share the same name. A participant observed that URIs are preferable for disambiguation because authorized access points for persons must be made unique using only a limited set of data elements. The natural question then becomes, “How can AI help us do this?”
Yale University’s case study explored this question. It used AI in combination with many other tools because relying on an LLM alone for this work would have been prohibitively expensive. The technology stack, shared in the project’s entity resolution pipeline, includes a purpose-built vector database for text embeddings. The pipeline achieved 99% precision in determining whether two bibliographic records with different headings (e.g., “Schubert, Franz” and “Schubert, Franz, 1797-1828”) referred to the same person, and it avoided the traditional string-matching errors that occur when identical name strings refer to different persons. This case study demonstrated how AI can be used effectively in combination with multiple tools, but the approach may require technical expertise beyond that of many librarians and archivists.
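Yale’s full pipeline is not reproduced here, but the basic idea of embedding-based matching can be illustrated with a short sketch. The example below assumes the open-source sentence-transformers library and a general-purpose embedding model; the headings, similarity thresholds, and decision rules are illustrative only, not the case study’s actual configuration.

```python
# A minimal sketch of embedding-based name matching, not Yale's actual pipeline.
# Assumes the sentence-transformers library and a general-purpose embedding model;
# a production system would use a vector database and tuned thresholds.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Two headings that a simple string comparison would treat as different
headings = ["Schubert, Franz", "Schubert, Franz, 1797-1828"]
embeddings = model.encode(headings)

# Cosine similarity close to 1.0 suggests the headings refer to the same entity
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"similarity: {similarity:.3f}")

# Hypothetical decision rule: high-similarity pairs become candidates for
# automated reconciliation, borderline pairs go to a cataloger for review
if similarity >= 0.9:
    print("likely the same person; candidate for automated reconciliation")
elif similarity >= 0.7:
    print("uncertain; route to a cataloger for review")
else:
    print("likely different entities")
```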
Readiness and need
All participants indicated some level of organizational interest in experimenting with AI to address current metadata needs. Due to distinct workflows and operations common in special collections and archives, there were fewer concerns about AI replacing human expertise than in the general cataloging subgroup.
We identified three factors influencing their willingness to experiment with AI:
- Traditional divisions of labor
- Quantity of resources to be described
- Meeting accessibility requirements
Traditional divisions of labor
In archival work, item-level description, such as image captions and transcripts, has often been done selectively by volunteers and student workers rather than metadata professionals, both because of the volume of items and because these tasks do not require specialized cataloging skills.* For example, the United States’ National Archives and Records Administration (NARA) relies on its Citizen Archivist volunteer program to provide tagging and transcription of digitized resources. Even with these dedicated volunteers, NARA uses AI-generated descriptions because of the extensive number of resources. However, NARA’s volunteers provide quality control on the AI-generated metadata, and the amount of metadata generated by AI ensures that these volunteers continue to be needed and appreciated.
Quantity of resources
Archival collections may range from a single item to several thousand items, resulting in significant variation in the type and level of description provided. Collection contents are often summarized with statements such as “45 linear feet,” “mostly typescripts,” and “several pamphlets in French.” However, when collections are digitized, more granular description is required to support discovery and access. The workflow at NARA is a good demonstration of how an archive uses AI to provide description at a scale that is not feasible for humans. Many archivists have been open to using AI for these tasks because the quantity of resources means that detailed metadata is otherwise not possible.
Meeting accessibility requirements
Accessibility is a growing priority for libraries and archives, driven by legal requirements such as the ADA Title II compliance deadline in the US. For digital collections, this may mean providing alt text for images, embedded captions and audio descriptions for video recordings, and full transcripts for audio recordings.
A participant observed that, in their experience with AI-generated transcripts, AI does well transcribing single-language, spoken-word recordings. However, recordings that include singing or multiple languages remain too complex for current AI tools. This provides a natural triage point for audio transcription workflows at their institution.
Creating transcripts of audio recordings is time-consuming, and archives have largely relied on student workers and volunteers for this work. Many institutions have a backlog of recordings with no transcriptions available. Thus, using AI for transcripts enables them to meet accessibility requirements and increase discovery of these resources.
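As a rough illustration of the kind of ASR step involved, the sketch below uses the open-source openai-whisper package; the file name and the triage rule in the comments are hypothetical, not a description of any participant’s production workflow.

```python
# A minimal sketch of AI-assisted audio transcription using the open-source
# openai-whisper package. The audio file name is hypothetical.
import whisper

model = whisper.load_model("base")
result = model.transcribe("oral_history_interview.wav")

# Whisper returns the detected language along with the transcript text.
print(f"Detected language: {result['language']}")
print(result["text"][:500])

# Hypothetical triage rule: single-language, spoken-word recordings keep the
# AI-generated transcript; recordings known to include singing or multiple
# languages are routed to a human transcriber instead.
```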
Challenges and open questions around the use of AI
While AI offers opportunities, the group also identified several challenges and open questions that must be addressed for successful implementation. Metadata quality and data provenance were the top issues emerging for special and distinctive collections.
Assessing metadata quality
What is an acceptable error rate for AI-generated metadata? Participants noted that while perfection is unattainable, even for human catalogers, institutions need clear benchmarks for evaluating AI outputs. Research providing comparative studies of error rates between AI and professional catalogers would prove valuable for informing AI adoption decisions, but few such findings currently exist. High precision remains critical for maintaining quality in library catalogs, as misidentification of an entity will provide users with incorrect information about a resource.
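For context, precision measures the share of positive matches that are correct. The arithmetic below uses hypothetical counts chosen to mirror the 99% figure from the entity resolution case study.

```python
# Precision = true positives / (true positives + false positives).
# Counts here are hypothetical, for illustration only.
true_positives = 990   # record pairs correctly judged to be the same person
false_positives = 10   # record pairs incorrectly judged to be the same person

precision = true_positives / (true_positives + false_positives)
print(f"precision: {precision:.2%}")  # 99.00%
```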
The subgroup also discussed the concept of “accuracy” in transcription. For instance, AI-generated transcripts may be more literal, while human transcribers often adjust formatting to improve context and readability. An example from NARA showing a volunteer-created transcription and the AI data (labeled as “Extracted Text”) illustrates these differences. The human transcription moves the name “Lily Doyle Dunlap” to the same line as “Mrs.”, but the AI transcribes line by line. While the human transcriber noted untranscribed text as “[illegible],” the AI transcribed it as “A.” Neither reflects what was written, so both could be described as not completely accurate. Unlike cataloging metadata, there has never been an expectation that transcriptions of documents or audiovisual records would be perfect in all cases for various reasons, including handwriting legibility and audio quality. One participant characterized their expectations for AI-generated transcripts as “needed to be good, but not perfect.”
One case study used confidence scores as a metric to determine whether the AI-generated metadata should be provided to users without review. Confidence scores provide a numerical value indicating the probability that the AI output is correct. For example, a value of over 70% might be set as a threshold for providing data without review. Because confidence scores are provided by the models themselves, they are as much a reflection of the model’s training as its output.
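As a minimal sketch of how such a threshold might work in practice, the routine below publishes AI-generated metadata only when its confidence score clears the cutoff. The record structure, field names, and 0.70 threshold are illustrative, drawn from the example above rather than any participant’s system.

```python
# Hypothetical review routing based on a model-reported confidence score.
REVIEW_THRESHOLD = 0.70  # mirrors the 70% example above; tune per institution

def route_record(record: dict) -> str:
    """Publish AI-generated metadata directly when confidence is high enough;
    otherwise queue it for human review."""
    if record.get("ai_confidence", 0.0) >= REVIEW_THRESHOLD:
        return "publish"
    return "human_review"

records = [
    {"id": "item-001", "ai_confidence": 0.92},
    {"id": "item-002", "ai_confidence": 0.55},
]
for record in records:
    print(record["id"], "->", route_record(record))
```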
Providing data provenance
Data provenance—the story of how metadata is created—is a critical concern for AI-generated outputs. Given the risk of AI “hallucinations” (generating incorrect or fabricated data), it is important to provide information to users about AI-created metadata. Working group members whose institutions are currently providing such data provenance shared their practices. NARA indicates that a document transcript is AI-generated using the standard text “Contributed by FamilySearch NARA Partner AI / Machine-Generated” (see this example for extracted text of a printed and handwritten document).
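To make the idea concrete, here is a hypothetical record structure showing one way a repository might carry provenance for an AI-generated transcript. The field names are illustrative and not drawn from BFAS or NARA’s systems, though the note text mirrors NARA’s standard label.

```python
# Hypothetical structure for recording data provenance alongside an
# AI-generated transcript. Field names are illustrative, not a standard.
transcript_record = {
    "item_id": "item-001",
    "transcript": "...",  # machine-generated transcript text
    "provenance": {
        "method": "machine-generated",
        "note": "Contributed by FamilySearch NARA Partner AI / Machine-Generated",
        "reviewed_by_human": False,
    },
}
```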
OCLC recognizes the importance of this issue to the community and is providing support in these ways:
- Updated WorldCat documentation: Section 3.5 of the Bibliographic Formats and Standards (BFAS) now includes guidance on recording AI-generated metadata.
- AskQC Office Hours webinar: The August 2025 session focused on providing data provenance in bibliographic records, including AI use cases.
- Collaboration on principles and best practices: OCLC is participating in the Program for Cooperative Cataloging’s Task Group on AI and Machine Learning for Cataloging and Metadata to develop guiding principles and best practices for using AI in metadata work.
Conclusion
Metadata professionals have a long-standing interest in using automation to provide and improve metadata, and AI joins macros, controlling headings, and batch updates as the latest technology in this effort. Our subgroup’s case studies demonstrated that AI tools can be used in special collections workflows when AI is well suited to the metadata needed. The most compelling applications involved transcribing documents and recordings, where capabilities such as automatic speech recognition (ASR) and natural language processing (NLP) make AI a good fit.
NB: As you might expect, AI technologies were used extensively throughout this project. We used a variety of tools—Copilot, ChatGPT, and Claude—to summarize notes, recordings, and transcripts. These were useful for synthesizing insights for each of the three subgroups and for quickly identifying the types of overarching themes described in this blog post.
*It is worth noting that the labor available to national and university archives includes volunteers and student workers, whereas a smaller stand-alone archive like a historical society would not have access to so many human resources.
Kate James is the Program Coordinator, Metadata Engagement, in OCLC Global Product Management. Her favorite RDA entity is Nomen, and her favorite LC class number is SF429.C3.