Hanging Together

the OCLC Research blog

Digitization / Linked Data / Metadata / Research Library Partnership

Sharing digital collections workflows

November 2, 2016 (updated February 15, 2024) - by Karen Smith-Yoshimura

[Graphic: targets for sharing digital collections metadata]

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Roxanne Missingham of Australian National University and Dawn Hale of Johns Hopkins University. Originally entitled “workflows associated with sharing digital collections (both born-digital and digitized)”, the topic arose because institutions are increasingly sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove and WorldCat. (The graphic above shows some of the targets identified by OCLC Research Library Partners.) Such aggregations can also help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.

Workflows for sharing metadata are often highly manual and can involve significant reworking of data through retyping and cumbersome spreadsheets, processes that delay access to digital content. The resources these practices consume also hinder innovation and the development of scholarly research.

Given the variety of sources for digital collections’ metadata, even within the same institution, we should not be surprised that a number of different metadata schemas are used, including Dublin Core, Encoded Archival Description (EAD), Resource Description Framework (RDF), Metadata Authority Description Schema (MADS), Metadata Object Description Schema (MODS), Metadata Encoding and Transmission Standard (METS), Text Encoding Initiative (TEI) as well as locally customized schemas. Libraries often rely on crosswalks to massage the metadata from their databases into a schema acceptable to the aggregator, which means losing any information in the source data that cannot be mapped. Increased exposure of one’s digital collections in a national or international aggregation is important enough to invest in this effort, and the metadata will usually include a pointer to the original source containing the more detailed information.
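To make the crosswalk trade-off concrete, here is a minimal sketch in Python. It assumes a flattened, MODS-like source record and a simple Dublin Core target; the field names, the mapping table, the sample record and the repository URL are all hypothetical and are not taken from any published crosswalk. The sketch shows the two effects described above: source fields with no target element are dropped, and the shared record carries a pointer back to the fuller description.

```python
# Hypothetical crosswalk from a local, MODS-like record to simple Dublin Core.
# The field names and the mapping are illustrative assumptions only.
CROSSWALK = {
    "titleInfo.title": "dc:title",
    "name.namePart": "dc:creator",
    "originInfo.dateIssued": "dc:date",
    "abstract": "dc:description",
}


def apply_crosswalk(record, source_url):
    """Return (target_record, unmapped_fields) for one source record."""
    target = {}
    unmapped = []
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            # No equivalent in the target schema: this information is lost.
            unmapped.append(field)
            continue
        target.setdefault(element, []).append(value)
    # Pointer back to the more detailed description in the source system.
    target["dc:identifier"] = [source_url]
    return target, sorted(unmapped)


# Made-up sample record and URL.
source = {
    "titleInfo.title": "Harbor survey map",
    "name.namePart": "Example, Ada",
    "originInfo.dateIssued": "1887",
    "physicalDescription.extent": "1 sheet ; 40 x 60 cm",
    "note.conservation": "Tear repaired 2015",
}

shared, lost = apply_crosswalk(source, "https://repository.example.edu/items/123")
print(shared)
print("Not mapped:", lost)
```

Reporting the unmapped fields, rather than dropping them silently, is one way a contributor could judge how much detail a given aggregator's schema costs them.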

Focus group members thought it unlikely that a single set of “best practices” could cover the entire range of potential aggregations, as each can differ in terms of audience, scope, context, purpose and functionality. However, developing best practices for a given target is more feasible. For example, OCLC Research Library Partners Pennsylvania State and Temple Universities have collaborated with other institutions in Pennsylvania to develop Pennsylvania’s DPLA Metadata Guidelines, “Requirements, Recommendations and Best Practices for Preparing Metadata”, which also draw on guidelines prepared by other DPLA hubs, service providers and other information professionals.

Some of the key challenges in sharing and re-using metadata describing digital collections:

  • Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ equally reasonable contention that they cannot support the particular needs of a wide range of aggregators. Similarly, aggregators are beginning to strongly encourage or require specific metadata elements and values to support functions they want to offer, whereas contributors do not have the programs or resources to supply such information for existing digital resources.
  • Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when or by whom.
  • Rights to exposed digital collections may not be easily shareable. Metadata is not only “descriptive”; it also includes administrative, technical or preservation information, including the terms for sharing. This information may be difficult to share. Some data must be embargoed for a period of time. OCLC Research held a seminar in 2010, “Undue Diligence: Seeking Low-risk Strategies for Making Collections of Unpublished Materials More Accessible”, which resulted in the document Well-intentioned practice for putting digitized collections of unpublished materials online, endorsed by the Society of American Archivists the following year.
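The second challenge above, not knowing which data elements were updated, when or by whom, can be illustrated with a small Python sketch. It assumes each record is available as a flat dictionary in both the source system and the aggregation; the function names, field names and sample values are hypothetical. It compares the two copies field by field and attaches who-and-when details to each difference, so a correction made downstream could be reviewed and applied at the source once.

```python
from datetime import datetime, timezone


def diff_records(source, aggregated):
    """List fields whose values differ between the source and aggregated copies."""
    changes = []
    for field in sorted(set(source) | set(aggregated)):
        old, new = source.get(field), aggregated.get(field)
        if old != new:
            changes.append(
                {"field": field, "source_value": old, "aggregated_value": new}
            )
    return changes


def log_changes(changes, changed_by, when=None):
    """Attach who-and-when details to each detected difference."""
    when = when or datetime.now(timezone.utc)
    return [
        dict(change, changed_by=changed_by, changed_at=when.isoformat())
        for change in changes
    ]


# Made-up example: the aggregator fixed a spelling and added a type value.
source_copy = {"title": "Harbour survey mpa", "date": "1887"}
aggregated_copy = {"title": "Harbour survey map", "date": "1887", "type": "image"}

changes = diff_records(source_copy, aggregated_copy)
change_log = log_changes(
    changes,
    changed_by="aggregator-cleanup",
    when=datetime(2016, 11, 2, tzinfo=timezone.utc),
)
for entry in change_log:
    print(entry)
```

A log like this only helps if it travels back to the contributor, which is the workflow gap the focus group identified.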

Although most digital collections are not yet exposed as linked data, a number of the OCLC Research Library Partners expect to publish some digital collections as linked data within the next year or two. The potential of using persistent identifiers to link to the most current version of a digital object, entity or term is very promising, and would mitigate problems associated with correcting data among different databases. It also raises a number of questions, including:

  • How could we bundle together “statements” associated with a specific collection to provide the needed context? Would it suffice to include attributes that a digital object is a “part of” a collection?
  • How would aggregators determine which statements were appropriate to ingest or fetch for their given audience or purpose?
  • How could ontologies be aligned, especially when the same objects could be described in statements using different models?
  • How could consumers determine the provenance of a given statement for its trustworthiness and authority?
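
Two of these questions, bundling statements by collection and judging a statement's provenance, can be explored with a toy model. The Python sketch below uses plain tuples rather than a real RDF library, and every identifier in it (the `ex:` names, the sample titles, the choice of `dcterms:isPartOf` as the "part of" attribute) is an assumption made for illustration. It gathers all statements about objects that are part of one collection, then keeps only those asserted by a source the consumer trusts.

```python
from collections import namedtuple

# A statement plus the party that asserted it. All identifiers are made up.
Statement = namedtuple("Statement", "subject predicate obj asserted_by")

IS_PART_OF = "dcterms:isPartOf"

statements = [
    Statement("ex:item1", "dcterms:title", "Harbor survey map", "ex:libraryA"),
    Statement("ex:item1", IS_PART_OF, "ex:mapsCollection", "ex:libraryA"),
    Statement("ex:item2", "dcterms:title", "Ship manifest", "ex:libraryA"),
    Statement("ex:item2", IS_PART_OF, "ex:shippingCollection", "ex:libraryA"),
    Statement("ex:item1", "dcterms:date", "1887", "ex:aggregatorB"),
]


def bundle_for_collection(statements, collection):
    """Collect every statement about objects that are part of the collection."""
    members = {
        s.subject
        for s in statements
        if s.predicate == IS_PART_OF and s.obj == collection
    }
    return [s for s in statements if s.subject in members]


def from_trusted(statements, trusted_sources):
    """Keep only statements asserted by a source the consumer trusts."""
    return [s for s in statements if s.asserted_by in trusted_sources]


bundle = bundle_for_collection(statements, "ex:mapsCollection")
trusted_bundle = from_trusted(bundle, {"ex:libraryA"})
for s in trusted_bundle:
    print(s.subject, s.predicate, s.obj)
```

The sketch sidesteps the harder questions raised above: it assumes everyone uses the same "part of" attribute and that trust is a simple allow-list, neither of which holds when ontologies differ.
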

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.

© 2024 OCLC || ISSN 2771-4802