Since the public release of ChatGPT in 2022, the library community has been buzzing about the latest incarnation of artificial intelligence tools. The OCLC Research Library Partnership Metadata Managers Focus Group met in April 2024 to share how they are getting ready for this new wave of AI in their libraries. During our conversation, the Metadata Managers shared how they are responding to the internal and external forces emerging from the conversations around AI. Their approaches fall into three broad themes:
- Learning together collaboratively
- Engaging staff curiosity about AI-augmented metadata workflows
- Thinking responsibly about implications
Learning together collaboratively
Because large language models (LLMs) have been an important focus on campuses due to their implications for research and instruction, participants noted that their learning often starts in cross-campus learning environments. As universities have established policies about using AI or launched enterprise AI services, they have also created generalized professional development opportunities where librarians can get oriented. When libraries enter this space, they often do so from the perspective of public-facing user services like reference and research support. On some campuses, the library has embraced its traditional role of providing students with basic information literacy skills that now include how to use LLMs in their coursework. Subject librarians are also gaining familiarity with emerging AI-powered research tools from scholarly journal providers to help researchers take advantage of them.
Because these learning resources don’t address the concerns of catalogers and metadata creators, Metadata Managers participants are getting creative about orienting staff toward the opportunities. New York University Libraries created an AI Study Group to share articles and give staff a safe space to experiment with AI tools. The group sets aside time for staff lightning talks about small-scale experiments and explorations with AI tools. A key purpose of this group is to overcome some of the trepidation created by broader discussions of AI, especially about the displacement of cataloger/metadata creator expertise.
Many of the Focus Group members noted that they were “just getting started” and were looking for resources to help them understand AI’s impacts. Several projects emerged from the discussion that members are watching:
- AI for Libraries, Archives, and Museums (AI4LAM) includes resources, a Slack community, and regular community calls to discuss the use of AI. The group also organizes the Fantastic Futures conference (which will be held in Canberra, Australia, in October 2024).
- The Program for Cooperative Cataloging (PCC) Task Group on Strategic Planning for AI and Machine Learning has issued a preliminary report and set of learning resources.
- IFLA’s Artificial Intelligence Special Interest Group
- The ARL/CNI Joint Task Force on Scenario Planning for AI and Machine-Learning Futures, a six-month initiative, has begun releasing the results of polls and conversations about guiding principles for AI. These are available on the ARL website.
- Leading projects from national-level libraries, such as the LC Labs AI Framework (see the OCLC Works in Progress session) and BL Labs Symposium on AI and GLAM data.
Engaging staff curiosity
Metadata Managers are actively looking for opportunities to experiment with AI tools. First steps often come through enterprise-level services that parent institutions are providing. Even though these tools offer generalized LLMs, they can be immediately helpful for common tasks. For example, staff at Cornell University used their campus-provided Microsoft Copilot to build their skills in coding and SQL. Staff at Brandeis have also used AI to help write new XSLT (eXtensible Stylesheet Language Transformations) stylesheets that clean up MODS (Metadata Object Description Schema) records.
Josh Hutchinson at the University of Southern California found a way to make lemonade out of ChatGPT lemons. Several participants noted that LLMs can hallucinate incorrect DOIs, LCCNs, or other identifiers when asked to generate MARC records. Josh had students in a cataloging class use ChatGPT to generate records, then use lessons from class to review and correct mistakes in the records and learn better cataloging practices in the process.
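As an illustration of the kind of review this invites, here is a minimal sketch of a first-pass check on identifiers in a generated record. It is not the method used in the class. The record layout (a plain dictionary with `isbn` and `doi` keys), the function names, and the DOI pattern are all assumptions made for this example. The ISBN-13 check-digit rule is standard, but the DOI test only checks the general shape of the string.

```python
import re

# Assumed DOI shape for this sketch: "10.", a 4-9 digit prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def isbn13_is_valid(isbn: str) -> bool:
    """Return True if the ISBN-13 check digit is consistent."""
    compact = isbn.replace("-", "").replace(" ", "")
    if len(compact) != 13 or not (compact.isascii() and compact.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(compact))
    return total % 10 == 0


def doi_looks_wellformed(doi: str) -> bool:
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(doi.strip()))


def flag_identifiers(record: dict) -> list:
    """Return the names of identifier fields that fail a basic check.

    `record` is a hypothetical dictionary, not a real MARC structure.
    """
    problems = []
    if "isbn" in record and not isbn13_is_valid(record["isbn"]):
        problems.append("isbn")
    if "doi" in record and not doi_looks_wellformed(record["doi"]):
        problems.append("doi")
    return problems


if __name__ == "__main__":
    generated = {"isbn": "978-0-306-40615-8", "doi": "10.1000/182"}
    print(flag_identifiers(generated))
```

A check like this only catches malformed identifiers. A hallucinated identifier that happens to be well formed still has to be looked up against the issuing registry or verified by a cataloger.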
Focus group members expressed interest in the possibilities of AI to help with metadata creation and quality control, both directly and indirectly. One area of success has been breaking down cataloging tasks into targeted areas rather than generating entire MARC records in one shot. This approach has helped limit the hallucinations that arise when chatbots are compelled to fill in the blanks, even with incorrect information. At Princeton, faculty are experimenting with AI to help create tables of contents in records (MARC field 505). Similar work is underway at the National Library of Medicine, where ChatGPT has been used to improve Electronic Cataloging in Publication (ECIP) records. Participants wondered whether AI will introduce more quality issues, especially for vendor-supplied e-resources records. Metadata Managers look forward to learning more about OCLC’s efforts in this area, such as the recent project to reduce duplicates through machine-learning models.
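To make the "targeted task" idea concrete, here is a minimal sketch of what the narrow, deterministic half of a table-of-contents workflow might look like. It assumes that chapter titles have already been extracted (by a person or an AI tool) and only formats them as a basic contents note. The function name and the dictionary layout are hypothetical, and the sketch does not describe the Princeton or National Library of Medicine workflows.

```python
def build_505(titles: list) -> dict:
    """Format chapter titles as a basic MARC 505 contents note.

    Returns a plain dictionary (a hypothetical layout, not a MARC library
    object) with first indicator 0 and the titles joined by " -- " in $a.
    """
    # Collapse stray whitespace and drop empty entries.
    cleaned = [" ".join(title.split()) for title in titles]
    cleaned = [title for title in cleaned if title]
    if not cleaned:
        raise ValueError("no chapter titles supplied")
    note = " -- ".join(cleaned)
    # End the note with a period unless it already ends in punctuation.
    if note[-1] not in ".?!":
        note += "."
    return {"tag": "505", "indicators": ("0", " "), "subfields": {"a": note}}


if __name__ == "__main__":
    field = build_505(["  Carbon ", "Nitrogen", "", "Metals"])
    print(field["subfields"]["a"])
```

Keeping the formatting step separate means the AI tool is only asked for the chapter titles, which a cataloger can compare against the item in hand.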
As part of the LUX project, Yale University is working with an outside vendor to develop a vector database solution for entity reconciliation. This process extracts entities from MARC/BIBFRAME metadata and tests the prototype’s ability to provide reliable entity lookups across different areas. The team is also exploring whether ChatGPT can help ungarble location data from MARC subfields when translated into linked data.
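The general idea behind vector-based entity lookup can be sketched in a few lines. This is a toy illustration, not the LUX prototype: it stands in character-trigram counts for the learned embeddings a real vector database would store, and the identifiers, labels, function names, and threshold are all invented for the example.

```python
import math
from collections import Counter
from typing import Optional, Tuple


def trigram_vector(text: str) -> Counter:
    """Turn a name into a sparse vector of character-trigram counts."""
    normalized = " ".join(text.lower().split())
    padded = f"  {normalized} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reconcile(name: str, authority: dict,
              threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return the closest entity ID and its score, or (None, score)
    when nothing clears the (assumed) similarity threshold."""
    query = trigram_vector(name)
    best_id, best_score = None, 0.0
    for entity_id, label in authority.items():
        score = cosine(query, trigram_vector(label))
        if score > best_score:
            best_id, best_score = entity_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    # Hypothetical identifiers; the labels follow authority-heading style.
    authority = {
        "ent:1": "Twain, Mark, 1835-1910",
        "ent:2": "Shelley, Mary Wollstonecraft, 1797-1851",
    }
    print(reconcile("Twain, Mark", authority))
```

Returning nothing when the best score falls below the threshold matters as much as finding matches, since a confident wrong link is harder to spot later than a missing one.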
Metadata managers are also exploring the potential of AI to solve adjacent metadata problems. At the University of Sydney and University of Pittsburgh, staff are exploring how to augment metadata for special collections using AI-powered tools like Transkribus. At the University of Arizona, members of the ReDATA repository are exploring how to better encourage researchers who submit data sets to provide enriched documentation to assist in metadata creation. The hope is that machine learning/AI tools could help self-depositors create keywords and summaries.
Thinking responsibly
With great power comes great responsibility. Metadata managers recognize that AI has tradeoffs and are incorporating professional ethics into their thoughts about responsible use.
Libraries have been at the forefront of advocating for open access to research, images, and metadata. However, many librarians are considering what this means in the face of large amounts of data consumed by LLMs and other generative AI platforms for commercial uses. Releasing images, texts, or metadata under open-access licenses (like CC0) can attract AI bots that generate high traffic levels and degrade normal library services. Some libraries are exploring how they might mitigate these impacts through technical barriers. Others questioned the ethics of contributing their labor to open data sets like Wikidata, when those data sets can be scraped and monetized in ways that were not previously envisioned. At the same time, participants noted that local constituents also seek access to datasets to pursue AI-related research. This has raised questions about license updates from electronic resource providers that prohibit the use of purchased content for text mining, machine learning, or AI.
Librarians are also sensitive to questions about the copyrights they steward in local repositories. Beyond service impacts, the Metadata Managers are considering what it means to make local repositories and research available for consumption by generative AI tools in responsible ways. The British Library’s Digital Scholarship team has confronted these issues by hosting conversations with authors, publishers, librarians, and intellectual property experts (see Safeguarding Tomorrow: The Impact of AI on Media and Information Industries).
Publishers are also leveraging artificial intelligence to create new works. Joy Panigabutra-Roberts from the University of Tennessee-Knoxville brought several examples of works created using generative AI tools. This has created some debate among catalogers about the status of agents in the context of bibliographic rules. (Currently, the PCC guidance considers LLMs as kinds of software.) On one hand, there is a desire to let users know that a work may be the product of AI. On the other hand, catalogers do not want to elevate an LLM to the status of an author.
These concerns reflect issues that surfaced during the recent OCLC-LIBER Building for the Future series (see OCLC Research’s summary, Imagining library futures using AI and machine learning), but with a particular focus on the concerns that arise when working in metadata spaces.
Next steps for the Metadata Managers Focus Group
Amy Bruckman recently recalled the struggles of the “AI winter” after earlier promises of rapid progress in artificial intelligence fizzled. Today, we bask in the warmth of a new “AI summer” where attention and funding for AI are plentiful. The Metadata Managers community wants to know how best to take advantage of the moment while not diverting attention from other vital tasks. The Planning Group will be evaluating how the Metadata Managers Focus Group can continue to support community efforts through:
- Developing shared use cases and best practices for responsible application of AI to metadata problems, modeled on the OCLC RLP SHARES Best Practices and recommendations in OCLC Research’s Responsible Operations: Data Science, Machine Learning, and AI in Libraries.
- Collaborating on training materials that focus on metadata applications (as opposed to general AI use cases in public service).
- Working towards a shared understanding of which AI tools are useful for specific metadata workflows and tasks.