Hanging Together

the OCLC Research blog


Best Practices for Web Archiving Metadata: Watch This Space!

April 5, 2017 / September 4, 2020 - by Jackie Dooley

Some of you may recall that back in 2015 we surveyed our OCLC Research Library Partners to determine their top challenges with web archiving, and the need for guidance on metadata practices emerged as #1. In response, early in 2016 we established a Web Archiving Metadata Working Group (WAM) to develop best practices for metadata. The group did extensive background research over the past year, and we’re now on a fast track to publish three reports in the next several months. In the meantime, you can read a substantial overview of the project in this article published last Friday in the online Journal of Western Archives.

The first two reports will underpin the best practices. One surveys the tools available for capturing websites, with a focus on their metadata extraction capabilities; the other reviews the literature on the metadata needs of web archives users.

The best practice guidelines will be in the third report. In addition to defining and interpreting a set of data elements, the report will articulate differences between bibliographic and archival standards; contrast approaches to description of individual websites and collections; and include both a literature review focused on metadata issues and crosswalks to related standards.

WAM established several principles to underpin the best practices. They are intended to …

  • … address the needs of users of archived websites as determined by our literature review
  • … be community-neutral, standards-neutral, and output-neutral; in other words, applicable to any context in which metadata for archived websites is needed
  • … consist of a relatively lean set of data elements, with the scope of each defined (i.e., a data dictionary)
  • … interpret each element for description of archived websites, which, unlike books or serials or published audiovisual media, have no conventions for representing elements such as creators, dates, or extent
  • … be upward-compatible with standards that have far deeper data element sets, including RDA, MARC, DACS, EAD, and MODS

We are in the process of finalizing the set of data elements and have adopted the following so far:

  • title
  • creator
  • contributor
  • date
  • description
  • extent
  • identifier
  • language
  • subject
  • genre
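
As a rough illustration of how lean this element set is, the ten names above can be written down as a simple record type. This is only a sketch: the element names come from the list, but the class name, the field types, the choice of which elements repeat, and the helper function are all assumptions made for the example, not decisions of the working group.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ArchivedWebsiteRecord:
    """Hypothetical record holding the ten elements adopted so far.

    Types and repeatability are assumptions for illustration only.
    """
    title: str = ""
    creator: List[str] = field(default_factory=list)
    contributor: List[str] = field(default_factory=list)
    date: List[str] = field(default_factory=list)
    description: str = ""
    extent: str = ""
    identifier: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)
    subject: List[str] = field(default_factory=list)
    genre: List[str] = field(default_factory=list)


# Element names in the order they are declared above.
ELEMENT_NAMES = [f.name for f in fields(ArchivedWebsiteRecord)]


def missing_elements(record: ArchivedWebsiteRecord) -> List[str]:
    """Return the names of elements that have no value yet (hypothetical helper)."""
    return [name for name in ELEMENT_NAMES if not getattr(record, name)]


record = ArchivedWebsiteRecord(title="Example archived website")
print(missing_elements(record))
```

A record with only a title reports the other nine elements as missing, which is one simple way a describer could see what still needs attention.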

These may seem both obvious and straightforward, but most need definition and interpretation for the web context. One example: what types of date are both feasible to determine and important to include, and how can their meaning be made clear? Additional elements under consideration include geographic coverage, publisher, rights, access, source of description, URL, and collector (or should the latter be owner? or repository? or location?). We’ve eliminated from consideration several that don’t have specific applicability to websites, including audience and statement of responsibility.
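
The date question can be made concrete with a small sketch in which every date value carries an explicit label saying what it means. The label used here ("captured") and the class and method names are hypothetical, chosen only for the example; the working group has not yet said which types of date it will recommend.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class QualifiedDate:
    """A date or date range with a label stating its meaning (illustrative only)."""
    kind: str                    # hypothetical label, e.g. "captured"
    start: date
    end: Optional[date] = None   # None means a single date rather than a range

    def display(self) -> str:
        """Render the label followed by an ISO 8601 date or start/end range."""
        span = self.start.isoformat()
        if self.end is not None and self.end != self.start:
            span += "/" + self.end.isoformat()
        return f"{self.kind}: {span}"


capture_span = QualifiedDate("captured", date(2016, 1, 1), date(2016, 12, 31))
print(capture_span.display())
```

Pairing each value with a label keeps an unqualified year from being read as, say, a creation date when it is really the span over which the site was captured.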

We’ll be circulating the draft best practices widely across the library and archives community and are hoping to hear from many who are struggling to describe websites and collections. Our aim is to promulgate best practices that will encourage use of metadata that is both meaningful and useful to users of these resources.

Stay tuned!

 

Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.

www.oclc.org/research/people/dooley.html