Metadata for digital objects

That was the topic discussed recently by OCLC Research Library Partners metadata managers. It was initiated by Jonathan LeBreton of Temple, who noted the questions staff raised when describing voluminous image collections, such as: Do we share the metadata even if it would swamp results? What context can be provided economically? What are others doing, both in the data schemas they use and in where they share the metadata?

The discussion revolved around these themes:

Challenges in addressing the sheer volume of digital materials. Managers are making decisions based on staffing, subject expertise, a collection’s importance, and funding. It was suggested that some descriptive metadata, such as dates and location, could be extracted from the technical metadata. We discussed the possibility of crowd-sourcing metadata creation, although experience to date is that a few volunteers are responsible for most contributions, and the successful examples tend to involve transcription, editing OCR’d text, and categorization. (The At a Glance: Sites that Support Social Metadata chart indicates the ones that enhance data either through improved description or subject access.) The context must matter to people for them to volunteer their efforts. (See the OCLC Report, Social Metadata for Libraries, Archives and Museums: Executive Summary.) With the anticipated increase of born-digital and other digitized materials, there’s a greater need for batch and bulk processing.
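As a rough illustration of extracting dates and location from technical metadata in bulk, here is a minimal Python sketch. It assumes the technical metadata has already been read into a dictionary with EXIF-style keys (DateTimeOriginal, GPSLatitude, and so on). The function names and the output field names are hypothetical, chosen only for this example, and not taken from any tool mentioned in the discussion.

```python
from datetime import datetime


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if ref in ("S", "W") else value


def derive_descriptive(tech):
    """Derive a date and a location from an EXIF-style technical metadata dict (assumed layout)."""
    record = {}
    raw_date = tech.get("DateTimeOriginal")
    if raw_date:
        # EXIF timestamps are written as "YYYY:MM:DD HH:MM:SS".
        parsed = datetime.strptime(raw_date, "%Y:%m:%d %H:%M:%S")
        record["date"] = parsed.date().isoformat()
    gps_keys = ("GPSLatitude", "GPSLatitudeRef", "GPSLongitude", "GPSLongitudeRef")
    if all(key in tech for key in gps_keys):
        record["latitude"] = round(
            dms_to_decimal(tech["GPSLatitude"], tech["GPSLatitudeRef"]), 6)
        record["longitude"] = round(
            dms_to_decimal(tech["GPSLongitude"], tech["GPSLongitudeRef"]), 6)
    return record


def derive_batch(items):
    """Apply the same derivation to many files at once; items maps a filename to its metadata dict."""
    return {name: derive_descriptive(tech) for name, tech in items.items()}
```

Files that lack a timestamp or GPS tags simply yield an emptier record, which leaves room for staff or volunteers to fill in the rest.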

Grappling with born-digital materials. Libraries are receiving the digital equivalents of personal papers and using the Forensic Toolkit to “process” these digital collections. Preservation and rights management, in addition to description, are important components, and no commercially available system yet addresses these needs. The Association of Research Libraries is working with the Society of American Archivists to customize SAA’s Digital Archives Specialist (DAS) Program so that ARL library staff can develop the skills needed to manage born-digital materials. OCLC Research has produced several reports in conjunction with its Demystifying Born Digital program of work.

Concerns about “siloization”, or the proliferation of “boutique” collections using different metadata schemas. Metadata is being created in different native systems within an institution, and that metadata is often not loaded into a central catalog or even accessible in the local discovery layer. User-created metadata in institutional repositories may be OAI harvested by OCLC and thus may appear in WorldCat even if not visible in the institution’s local discovery tool. Managers grapple with whether to spend resources on updating such metadata before it is exposed for harvesting. Another challenge is deciding what to include in which discovery layer, and what should be siloed. The numerous repositories within an institution can result in complex metadata flows for discovery, as illustrated by UC San Diego’s Prezi diagram. Some institutions map their various metadata schemas to MODS (Metadata Object Description Schema), but all non-MARC metadata is converted to MARC when loaded into WorldCat.
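To make the idea of mapping a local schema to MODS concrete, here is a minimal Python sketch of a crosswalk from a simple Dublin Core record to MODS XML. The mapping is deliberately simplified for illustration (four elements only) and is not the complete Library of Congress crosswalk. The input layout, a dictionary of lists keyed by Dublin Core element name, and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Return a MODS tag name qualified with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def dc_to_mods(dc):
    """Map a simple Dublin Core dict (element name -> list of values) to a MODS element tree."""
    mods = ET.Element(q("mods"))
    for title in dc.get("title", []):
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = title
    for creator in dc.get("creator", []):
        name = ET.SubElement(mods, q("name"))
        ET.SubElement(name, q("namePart")).text = creator
    for date in dc.get("date", []):
        # Simplification: every dc:date is treated as a creation date.
        origin = ET.SubElement(mods, q("originInfo"))
        ET.SubElement(origin, q("dateCreated")).text = date
    for subject in dc.get("subject", []):
        subject_el = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject_el, q("topic")).text = subject
    return mods


def mods_to_string(mods):
    """Serialize the MODS tree to an XML string."""
    return ET.tostring(mods, encoding="unicode")
```

Dublin Core elements that the sketch does not map are silently dropped, which is exactly the kind of loss a real crosswalk has to document and decide on.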

What are the “essential elements” to provide access across collections? We noted that librarians have been discussing “core” or “essential” metadata elements for decades, starting with Dublin Core and the Program for Cooperative Cataloging’s “BIBCO Standard Record”. Librarians have been entering metadata for the system it was designed for, but ultimately the data moves to another system. Library metadata is no longer confined to a single system: it may be exposed to search engines and viewed alongside a great deal of non-library metadata.

The Library of Congress’ Bibliographic Framework Initiative portends a future where all metadata will be “non-MARC” and we will rely more on linked data URIs in place of metadata text strings. How can we use the promise of that future to get to where we need to be?