“Future Proofing” of Cataloging

Jessie Eastland, Moon in Sunrise Sky, Wikimedia Commons CC-BY-SA-3.0

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Melanie Wacker of Columbia, Daniel Lovins of Yale and Roxanne Missingham of Australian National University. Metadata departments not only need to focus on current requirements for their metadata in the library catalog or repositories, but also need to look ahead to future uses of their metadata in emerging services. The work of the PCC Task Group on URIs in MARC and the PCC ISNI Pilot are network-level efforts; local-level examples include involving metadata staff in academic projects, research data, or identity management tasks. As technologies change, there will be new opportunities to unleash the power of the metadata in our legacy records for future, different interactions and uses. Our cataloging heritage equips us to use metadata for revealing collections in new ways beyond our current systems.

Our discussions focused on identifiers, viewed as a transition bridge from legacy and current metadata to future applications. Although few identifiers are now leveraged as fully as they could be, many institutions are adding ISNIs and FAST headings to their catalog records, and ORCIDs and DOIs to records describing materials in institutional repositories. Incentives for researchers to use ORCIDs include facilitating the population of faculty research profiles, peer-to-peer networking, and automatically compiling their lists of publications regardless of their institutional affiliations at the time they were published. Australian National University views ORCIDs as the “glue” that holds together four arms of scholarly work: publishing, repository, library catalog, and researchers. Identifiers from the Library of Congress (e.g., LCCN and id.loc.gov) are commonly used in library catalog records; a number of institutions are also including identifiers for digital collections, photo archives, and archives.

Some institutions have contracted with third parties to add URIs to their MARC records but have discovered that if this is done by string-matching, there can be many false matches. Even if library systems do not yet make use of identifiers, third parties such as Google Books and HathiTrust rely on identifiers for service integration. Embedding geo-coordinates in metadata, or URIs that support API calls to GeoNames, can support map visualizations. FAST includes geographic names that link to their geo-coordinates. The University of Minnesota is contributing to the Big Ten Academic Alliance’s Geospatial Data Project, which supports creating and aggregating metadata describing geospatial data resources and making them discoverable through an open source Geoportal.
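To illustrate why string-matching alone can produce false matches when adding URIs to records, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the headings, the example.org URIs, and the function names are hypothetical and do not describe any vendor's actual process. It assigns a URI only when a normalized name points to exactly one candidate and otherwise leaves the heading for human review.

```python
from collections import defaultdict

# Hypothetical authority entries (heading, URI). The URIs are placeholders,
# not real identifiers from any authority file.
AUTHORITY = [
    ("Smith, John, 1950-", "http://example.org/auth/1"),
    ("Smith, John, 1832-1911", "http://example.org/auth/2"),
    ("Doe, Jane", "http://example.org/auth/3"),
]


def name_key(heading):
    """Normalize a heading to 'surname, forename', dropping dates and case."""
    parts = [p.strip().lower() for p in heading.split(",")]
    return ", ".join(parts[:2])


def build_index(entries):
    """Map each normalized name to every URI that shares it."""
    index = defaultdict(list)
    for heading, uri in entries:
        index[name_key(heading)].append(uri)
    return index


def match_uri(heading, index):
    """Return a URI only when the name is unambiguous; otherwise None."""
    candidates = index.get(name_key(heading), [])
    return candidates[0] if len(candidates) == 1 else None


index = build_index(AUTHORITY)
print(match_uri("Doe, Jane", index))    # one candidate, so a URI is returned
print(match_uri("Smith, John", index))  # two candidates, so no URI is assigned
```

A naive matcher that simply took the first hit for "Smith, John" would attach the wrong person's URI about as often as the right one, which is the kind of false match described above.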

Some metadata managers have been experimenting with creating entities in Wikidata, thereby minting Wikidata identifiers. Wikidata can be viewed as an identifier hub, which aggregates different identifier schemes pointing to the same object. In a recent OCLC Research Works in Progress webinar, Case Studies from Project Passage Focusing on Wikidata’s Multilingual Support, Xiaoli Li compared creating identifiers in Wikidata and the LC/NACO authority file and concluded that Wikidata offers more opportunities for providing richer information with more references and links to related entities.

One appealing use case for identifiers is to bridge the different systems (referred to as an “archipelago of systems”) used across an institution. Although we will likely continue to live in a world of multiple identifiers with varying degrees of overlap, hubs that can show “same as” relationships among different identifiers could support an infrastructure that brings together resources described by different data content standards. We see disciplinary differences in the way that scholars want to interact with and navigate data and outputs, and no one “identifier hub” will ever be comprehensive. A few are considering setting up local “data stores” to aggregate the identifiers and other metadata used across their different systems, such as Harvard’s Library Cloud. The British Library is working on a metadata model that would support the flow of metadata across all its systems.
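As a rough illustration of what a “same as” hub does, the following Python sketch groups identifiers from different schemes that have been asserted to refer to the same entity. The class name, method names, and identifier strings are all hypothetical placeholders; this is one possible approach (a union-find structure), not a description of how Wikidata, Library Cloud, or any other existing hub is implemented.

```python
class IdentifierHub:
    """Groups identifiers that have been asserted to mean the same entity."""

    def __init__(self):
        # Each identifier points to a parent; a root points to itself.
        self._parent = {}

    def _find(self, ident):
        """Return the representative identifier for ident's group."""
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def same_as(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident):
        """List every identifier known to be equivalent to ident."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


hub = IdentifierHub()
# Placeholder identifier strings, not real ISNI, ORCID, or Wikidata values.
hub.same_as("isni:EXAMPLE-1", "orcid:EXAMPLE-1")
hub.same_as("orcid:EXAMPLE-1", "wikidata:EXAMPLE-1")
print(hub.equivalents("wikidata:EXAMPLE-1"))
```

Because equivalence is transitive here, a system that knows only the ORCID can still reach the record keyed by the ISNI, which is the bridging role across an “archipelago of systems” that the discussion describes.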

In the meantime, libraries struggle to weigh the future potential benefits of identifiers against current workloads. Publishers are key players in the metadata workstream, but publisher data does not currently include identifiers. As a promising proof of concept, the British Library is working with five UK publishers to add ISNIs to their metadata. (The BL’s Cataloging-in-Publication records, created before a work’s publication, are disseminated with ISNIs.) The ability to batch-load or algorithmically add identifiers in the future is on metadata managers’ wish list.

Everyone hopes that wider use of identifiers will provide the means for future systems to provide a richer user experience and increased discoverability on the semantic web. Let’s not just “get the web to come to us”: as libraries become more a part of the semantic web, metadata specialists will be freed from having to re-describe things already described elsewhere.

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata, with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.