
Hanging Together

the OCLC Research blog


Best Practices for Web Archiving Metadata: Watch This Space!

April 5, 2017 (updated September 4, 2020), by Jackie Dooley

Some of you may recall that back in 2015 we surveyed our OCLC Research Library Partners to determine their top challenges with web archiving, and the need for guidance on metadata practices emerged as #1. In response, early in 2016 we established a Web Archiving Metadata Working Group (WAM) to develop best practices for metadata. The group did extensive background research over the past year, and we’re now on a fast track to publish three reports in the next several months. In the meantime, you can read a substantial overview of the project in this article published last Friday in the online Journal of Western Archives.

The first two reports will underpin the best practices: one reviews the tools available for capturing websites, with a focus on their metadata extraction capabilities, and the other reviews the literature on the metadata needs of web archive users.

The best practice guidelines will be in the third report. In addition to defining and interpreting a set of data elements, the report will articulate differences between bibliographic and archival standards; contrast approaches to description of individual websites and collections; and include both a literature review focused on metadata issues and crosswalks to related standards.

WAM established several principles to underpin the best practices. They are intended to …

  • … address the needs of users of archived websites as determined by our literature review
  • … be community-neutral, standards-neutral, and output-neutral; in other words, applicable to any context in which metadata for archived websites is needed
  • … consist of a relatively lean set of data elements, with the scope of each defined (i.e., a data dictionary)
  • … interpret each element for description of archived websites, which, unlike books or serials or published audiovisual media, have no conventions for representing elements such as creators, dates, or extent
  • … be upward-compatible with standards that have far deeper data element sets, including RDA, MARC, DACS, EAD, and MODS

We are in the process of finalizing the set of data elements and have adopted the following so far:

  • title
  • creator
  • contributor
  • date
  • description
  • extent
  • identifier
  • language
  • subject
  • genre

These may seem both obvious and straightforward, but most need definition and interpretation for the web context. One example: what types of date are both feasible to determine and important to include, and how can their meaning be made clear? Additional elements under consideration include geographic coverage, publisher, rights, access, source of description, URL, and collector (or should the latter be owner? or repository? or location?). We’ve eliminated from consideration several that don’t have specific applicability to websites, including audience and statement of responsibility.
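To make the idea of a lean, data-dictionary-style element set a little more concrete, here is a minimal Python sketch of a record that holds the ten elements adopted so far. Everything beyond the element names is an assumption for illustration only: the class names, the field types, the `missing_elements` helper, and the date-type label "capture" are hypothetical and are not WAM's definitions. The qualified date simply shows one possible way to make a date's meaning explicit.

```python
from dataclasses import dataclass, field, fields


@dataclass
class QualifiedDate:
    # Hypothetical structure: pairs a date value with a label saying what kind of date it is.
    date_type: str  # e.g. "capture" (an assumed label, not a WAM term)
    value: str      # ISO 8601 string, e.g. "2017-04-05"


@dataclass
class ArchivedWebsiteRecord:
    # One field per data element adopted so far; types are illustrative assumptions.
    title: str = ""
    creator: list[str] = field(default_factory=list)
    contributor: list[str] = field(default_factory=list)
    date: list[QualifiedDate] = field(default_factory=list)
    description: str = ""
    extent: str = ""
    identifier: str = ""
    language: list[str] = field(default_factory=list)
    subject: list[str] = field(default_factory=list)
    genre: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Return, in declaration order, the names of elements that have no value yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Usage with a hypothetical archived site that so far has only a title and a capture date.
record = ArchivedWebsiteRecord(
    title="Example site",
    date=[QualifiedDate("capture", "2017-04-05")],
)
print(record.missing_elements())
```

Typing each date rather than storing a bare string is one way to keep, for example, a capture date from being mistaken for the date the site's content was created.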

We’ll be circulating the draft best practices widely across the library and archives community and are hoping to hear from many who are struggling to describe websites and collections. Our aim is to promulgate best practices that will encourage use of metadata that is both meaningful and useful to users of these resources.

Stay tuned!


Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.


Hanging Together is the blog of OCLC Research.

© 2026 OCLC || ISSN 2771-4802