Presenting metadata from different sources in discovery layers

Bento box display from the National Library of Australia’s Trove service

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Daniel Lovins of Yale, Erin Grant of the University of Washington, and Jennifer Baxmeyer of Princeton. Most libraries have implemented “discovery layers” (e.g., Blacklight, Encore, Primo, Summon, VuFind, WorldCat Discovery) which channel metadata not just from local library systems, but also from archives, institutional repositories, digital collections and exhibitions, etc. Every time a new data source is added, the metadata need to be mapped to a common target schema and normalized with the other data sources to ensure a comprehensive view of all resources available at the institution while providing a consistent end-user experience.

Metadata from sources not included in discovery layers: Some have experimented with including descriptions from Wikipedia or Wikidata, finding aids, institutional repositories, digitized collections, or full-text collections. Many pulled out the Wikipedia or Wikidata descriptions over concerns about their quality and their overshadowing of scholarly sources. Full text also skews search results. Although finding aids are not usually included (the hierarchical data structure of Encoded Archival Description is not well supported by discovery layers), a common practice is to include a link to the finding aid in the MARC collection-level record. Similarly, digitized collections, including those from museums and exhibitions, may not be included, but a link to the collection is often included in the MARC records describing them. Metadata for resources with access restrictions are usually excluded. Authority data is usually excluded, but may be part of the displays of library catalog data.

Not everyone includes metadata from their institutional repositories; those that do find that the same item may be displayed multiple times, either because of the way “collections” are represented in the repository or because of duplicate feeds from aggregators that include the same work. Generally, research data metadata is not included.
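
One way to catch such duplicates before they reach the index is to collapse records that share a comparison key. The sketch below is only an illustration: the record fields (`doi`, `title`, `year`) and the choice of key are assumptions, not a description of any participant's workflow, and a record carrying a DOI will not match a copy of the same work that lacks one.

```python
import re


def match_key(record):
    """Build a comparison key: the DOI when present, else title plus year."""
    doi = record.get("doi")
    if doi:
        return ("doi", doi.strip().lower())
    # Hypothetical fallback: lowercase the title and replace punctuation runs with spaces.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = match_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Keeping the first record seen means the order of the feeds decides which copy survives, so the preferred source would be loaded first.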

Adding new sources: Different sources have their own schemas, which need to be mapped to provide a “one stop searching” experience for end-users. The general trend is that the more cultural heritage materials libraries include, the more they use Dublin Core as the common target schema, while bibliographic environments tend towards using Metadata Object Description Schema (MODS). Decisions about new data sources are often made across library departments, usually including metadata units. Rutgers University has a process where anyone who wishes to propose adding a new digital collection must first complete a form that includes the proposed schema or controlled vocabulary; the form is vetted by the Digital Data Curator and the person overseeing digital projects. OCLC’s Digital Collection Gateway provides a way to add institutions’ new digital content to WorldCat.org and other discovery layers.
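
As a rough illustration of the mapping step, the sketch below crosswalks records from two hypothetical sources onto a few Dublin Core elements. The source names, the repository field names, and the `CROSSWALKS` table are assumptions made for the example (the archive fields borrow EAD element names); a real crosswalk would be far larger and would be agreed with the metadata unit.

```python
# Hypothetical crosswalks: source field name -> Dublin Core element.
CROSSWALKS = {
    "repository": {"name": "title", "author": "creator", "issued": "date"},
    "archive": {"unittitle": "title", "origination": "creator", "unitdate": "date"},
}


def to_dublin_core(source, record):
    """Map one source record onto the shared target elements.

    Fields with no mapping are dropped; values are whitespace-normalized.
    """
    mapping = CROSSWALKS[source]
    target = {}
    for field, value in record.items():
        element = mapping.get(field)
        if element is None:
            continue  # unmapped field: not carried into the index
        target.setdefault(element, []).append(" ".join(str(value).split()))
    return target


if __name__ == "__main__":
    sample = {"unittitle": "  Field  notebooks ", "unitdate": "1921", "extent": "3 boxes"}
    print(to_dublin_core("archive", sample))
```

Each target element holds a list because Dublin Core elements are repeatable, so several source fields can feed the same element.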

Relying on “bento box displays”, such as the one featured here from the National Library of Australia’s Trove service, allows showing results for materials that do not map well to the existing discovery indexes.

Our last discussion occurred after campuses had closed and most librarians had moved to working from home because of the COVID-19 pandemic. The pandemic pushed libraries to add new electronic sources to their discovery layers to support online instruction and research. HathiTrust members in the United States have taken advantage of its Emergency Temporary Access Service, which makes it possible for them to offer lawful access to digital materials that correspond to physical books they hold during this period of disruption. The HathiTrust members batch-add links to the HathiTrust digital copies to the MARC records in their catalogs, using the OCLC record number. The University of Minnesota reported that it can now provide digital access through this service to about half of its physical collection. In Australia, some universities have negotiated permissions from publishers and rights holders to offer electronic copies of their works; some have also been able to increase or temporarily eliminate the cap on making electronic textbooks available. OCLC has leveraged its partnerships with publishers to provide extended and, in many cases, free access to e-resources in its WorldCat knowledge base (see Chip Nilges’ March 2020 Next blog post).
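
The batch-add step can be pictured roughly as follows. This is a simplified sketch, not the members' actual procedure: records are plain dictionaries keyed by MARC tag rather than real MARC records, the `links` lookup table and the example.org URL are placeholders, and the only MARC conventions relied on are that OCLC numbers commonly appear in field 035 with an `(OCoLC)` prefix and that electronic access links go in field 856.

```python
import re

# Matches values such as "(OCoLC)12345" or "(OCoLC)ocm00012345".
OCLC_PATTERN = re.compile(r"\(OCoLC\)\D*0*(\d+)")


def oclc_numbers(record):
    """Extract OCLC numbers, without prefixes or leading zeros, from 035 values."""
    numbers = []
    for value in record.get("035", []):
        match = OCLC_PATTERN.search(value)
        if match:
            numbers.append(match.group(1))
    return numbers


def add_links(records, links):
    """Append a URL to each record's 856 list when its OCLC number matches.

    `links` is a hypothetical lookup table: OCLC number -> URL.
    Returns the number of records updated.
    """
    updated = 0
    for record in records:
        for number in oclc_numbers(record):
            url = links.get(number)
            if url and url not in record.setdefault("856", []):
                record["856"].append(url)
                updated += 1
                break
    return updated


if __name__ == "__main__":
    catalog = [{"035": ["(OCoLC)ocm00012345"]}, {"035": ["(OCoLC)999"]}]
    print(add_links(catalog, {"12345": "https://example.org/item/1"}), catalog)
```

Checking for an existing link before appending keeps the job safe to rerun, which matters for a temporary service whose link list changes over time.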

Challenges: Retrieval of metadata describing resources in completely different systems is “super challenging.” Names and concepts can vary greatly. Many resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work require technical services expertise. AI-augmented services such as Yewno have a role to play: these are tools that can ingest unstructured data and generate a knowledge graph with consistent labels for topics, persons, and places. Some libraries are experimenting with developing centralized data stores of reconciled metadata (perhaps produced with machine learning and OpenRefine) to populate the discovery layer, which could improve retrieval of related items in their museums, archives, art galleries, and libraries. These experiments include exploring the use of a local Wikibase instance to store the reconciled metadata, including variant terms and relationships expressed in authority files.
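
At its simplest, reconciling access points means matching each variant form to one preferred label. The sketch below assumes a small in-memory authority table with illustrative entries and exact matching after normalization; it does not use OpenRefine, Wikibase, or any machine learning, and real reconciliation would also need fuzzy matching and human review.

```python
import unicodedata

# Illustrative authority entries: preferred label -> known variant forms.
AUTHORITY = {
    "Twain, Mark, 1835-1910": ["Mark Twain", "Clemens, Samuel Langhorne"],
    "Brontë, Charlotte, 1816-1855": ["Charlotte Bronte", "Bell, Currer"],
}


def normalize(label):
    """Casefold, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    kept = [c if c.isalnum() else " " for c in decomposed if not unicodedata.combining(c)]
    return " ".join("".join(kept).casefold().split())


def build_index(authority):
    """Map every normalized preferred or variant form to its preferred label."""
    index = {}
    for preferred, variants in authority.items():
        for form in [preferred, *variants]:
            index[normalize(form)] = preferred
    return index


def reconcile(label, index):
    """Return the preferred label, or None when the form is unknown."""
    return index.get(normalize(label))
```

Returning `None` for unknown forms, rather than guessing, leaves unmatched access points visible for later review.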

On the other hand, metadata managers pointed to the danger of collapsing the distinctions between different thesauri. Mixing vocabularies may have unintended consequences. A diversity of thesauri may have value in providing equitable access to everything. The biggest challenge is to empower the user to see the thesaurus designed specifically for the purpose at hand. In a perfect world, the user would understand the distinct value of each thesaurus and be able to toggle to the one most relevant to the specific use case, while also having the results displayed from a comprehensive search.

Presenting metadata from different sources in discovery layers is a continuing struggle between “authority control” and “reconciling access points”, between “consistency” and “acknowledging diversity.”

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.

3 Comments on “Presenting metadata from different sources in discovery layers”

Andrew Padilla says:

April 18, 2020 at 7:52 am

Hi Karen,

Thanks for the article titled “Presenting metadata from different sources in discovery layers”.

At the end of the article you write:

Presenting metadata from different sources in discovery layers is a continuing struggle between “authority control” and “reconciling access points”, between “consistency” and “acknowledging diversity.”

Can you elaborate the terms in quotes above? What do they mean?

Thank you,
1. Karen Smith-Yoshimura says:
  
  April 19, 2020 at 6:09 pm
  
  Hi, Andrew. “Authority control” refers to the desire to use a single, distinct, preferred form of names and subjects for everything that appears in the catalog. Since different sources can use different terms for the same “thing”, work is needed to reconcile the differences. The result is the tension between providing consistency among the entities and topics described while acknowledging that different communities and disciplines have their own preferred terminology.
  1. Andrew Padilla says:
    
    April 24, 2020 at 10:45 pm
    
    thanks for the clarification Karen.
    
    in another article post here titled “knowledge organization systems” you stated the following:
    
    “Rather than one ‘global domain,’ perhaps the library community could provide added value by adding bridges from the metadata in library domain databases to other domains”
    
    This statement is very much aligned with “domain driven design” philosophy in the software development world. I am curious if this thought was inspired by some familiarity with this or happened to be a result of convergent thinking? I believe they would refer to maintaining multiple domains as “bounded contexts” and your reference to bridging would be called “context mapping” in their terminology.
    
    great posts ! learning much here!
