Hanging Together

the OCLC Research blog

Best Practices for Web Archiving Metadata: Watch This Space!

April 5, 2017 (updated September 4, 2020) - by Jackie Dooley

Some of you may recall that back in 2015 we surveyed our OCLC Research Library Partners to determine their top challenges with web archiving, and the need for guidance on metadata practices emerged as #1. In response, early in 2016 we established a Web Archiving Metadata Working Group (WAM) to develop best practices for metadata. The group did extensive background research over the past year, and we’re now on a fast track to publish three reports in the next several months. In the meantime, you can read a substantial overview of the project in this article published last Friday in the online Journal of Western Archives.

The first two reports will underpin the best practices: one surveys the tools available for capturing websites, with a focus on their metadata extraction capabilities; the other reviews the literature on the metadata needs of web archives users.

The best practice guidelines will be in the third report. In addition to defining and interpreting a set of data elements, the report will articulate differences between bibliographic and archival standards; contrast approaches to description of individual websites and collections; and include both a literature review focused on metadata issues and crosswalks to related standards.

WAM established several principles to underpin the best practices. They are intended to …

  • … address the needs of users of archived websites as determined by our literature review
  • … be community-neutral, standards-neutral, and output-neutral; in other words, applicable to any context in which metadata for archived websites is needed
  • … consist of a relatively lean set of data elements, with the scope of each defined (i.e., a data dictionary)
  • … interpret each element for description of archived websites, which, unlike books or serials or published audiovisual media, have no conventions for representing elements such as creators, dates, or extent
  • … be upward-compatible with standards that have far deeper data element sets, including RDA, MARC, DACS, EAD, and MODS

We are in the process of finalizing the set of data elements and have adopted the following so far:

  • title
  • creator
  • contributor
  • date
  • description
  • extent
  • identifier
  • language
  • subject
  • genre

These may seem both obvious and straightforward, but most need definition and interpretation for the web context. One example: what types of date are both feasible to determine and important to include, and how can their meaning be made clear? Additional elements under consideration include geographic coverage, publisher, rights, access, source of description, URL, and collector (or should the latter be owner? or repository? or location?). We’ve eliminated from consideration several that don’t have specific applicability to websites, including audience and statement of responsibility.
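To make the idea of a lean element set a little more concrete, here is a minimal sketch in Python of how a record built from the ten elements adopted so far might be represented. It is not part of the WAM guidelines. The class names (`ArchivedWebsiteRecord`, `TypedDate`), the choice of which elements repeat, the `date_type` label, and the sample values are all assumptions made for illustration. Only the element names come from the list above.

```python
from dataclasses import dataclass, field, asdict

# The ten element names adopted so far, in the order listed in the post.
ADOPTED_ELEMENTS = (
    "title", "creator", "contributor", "date", "description",
    "extent", "identifier", "language", "subject", "genre",
)


@dataclass
class TypedDate:
    """A date plus a label saying what the date means (hypothetical structure)."""
    value: str      # illustrative value, e.g. "2017-04-05"
    date_type: str  # illustrative label, e.g. "captured"; not a WAM vocabulary


@dataclass
class ArchivedWebsiteRecord:
    """Illustrative record; which elements repeat is an assumption."""
    title: str
    creator: list[str] = field(default_factory=list)
    contributor: list[str] = field(default_factory=list)
    date: list[TypedDate] = field(default_factory=list)
    description: str = ""
    extent: str = ""
    identifier: list[str] = field(default_factory=list)
    language: list[str] = field(default_factory=list)
    subject: list[str] = field(default_factory=list)
    genre: list[str] = field(default_factory=list)


def missing_elements(record: ArchivedWebsiteRecord) -> list[str]:
    """Return the adopted element names that are still empty in the record."""
    data = asdict(record)
    return [name for name in ADOPTED_ELEMENTS if not data[name]]


# Made-up sample values, used only to exercise the sketch.
record = ArchivedWebsiteRecord(
    title="Example site",
    creator=["Example organization"],
    date=[TypedDate("2017-04-05", "captured")],
    language=["eng"],
)
print(missing_elements(record))
```

Pairing each date with an explicit label is one possible way to address the question raised above about making the meaning of a date clear. The guidelines themselves may settle on a different approach.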

We’ll be circulating the draft best practices widely across the library and archives community and are hoping to hear from many who are struggling to describe websites and collections. Our aim is to promulgate best practices that will encourage use of metadata that is both meaningful and useful to users of these resources.

Stay tuned!

 

Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.

© 2024 OCLC || ISSN 2771-4802