
Hanging Together

the OCLC Research blog


Best Practices for Web Archiving Metadata: Watch This Space!

April 5, 2017 (updated September 4, 2020) - by Jackie Dooley

Some of you may recall that back in 2015 we surveyed our OCLC Research Library Partners to determine their top challenges with web archiving, and the need for guidance on metadata practices emerged as #1. In response, early in 2016 we established a Web Archiving Metadata Working Group (WAM) to develop best practices for metadata. The group did extensive background research over the past year, and we're now on a fast track to publish three reports in the next several months. In the meantime, you can read a substantial overview of the project in an article published last Friday in the online Journal of Western Archives.

The first two reports will underpin the best practices: one will survey tools available for capturing websites, with a focus on their metadata extraction capabilities; the other will review the literature on the metadata needs of web archives users.

The best practice guidelines will be in the third report. In addition to defining and interpreting a set of data elements, the report will articulate differences between bibliographic and archival standards; contrast approaches to description of individual websites and collections; and include both a literature review focused on metadata issues and crosswalks to related standards.

WAM established several principles to underpin the best practices. They are intended to …

  • … address the needs of users of archived websites as determined by our literature review
  • … be community-neutral, standards-neutral, and output-neutral; in other words, applicable to any context in which metadata for archived websites is needed
  • … consist of a relatively lean set of data elements, with the scope of each defined (i.e., a data dictionary)
  • … interpret each element for description of archived websites, which, unlike books or serials or published audiovisual media, have no conventions for representing elements such as creators, dates, or extent
  • … be upward-compatible with standards that have far deeper data element sets, including RDA, MARC, DACS, EAD, and MODS

We are in the process of finalizing the set of data elements and have adopted the following so far:

  • title
  • creator
  • contributor
  • date
  • description
  • extent
  • identifier
  • language
  • subject
  • genre

These may seem both obvious and straightforward, but most need definition and interpretation for the web context. One example: what types of date are both feasible to determine and important to include, and how can their meaning be made clear? Additional elements under consideration include geographic coverage, publisher, rights, access, source of description, URL, and collector (or should the latter be owner? or repository? or location?). We’ve eliminated from consideration several that don’t have specific applicability to websites, including audience and statement of responsibility.
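To make the adopted elements and the date question concrete, here is a small data dictionary and record type. It is a hypothetical Python 3.9+ sketch, not WAM's schema: the scope notes, the two date types (`capture` for when a crawler harvested the site, `content` for a date the site asserts about itself), and the names `DATA_DICTIONARY`, `QualifiedDate`, `WebsiteRecord`, `capture_span`, and `undocumented_elements` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from datetime import date

# Hypothetical data dictionary for the ten adopted elements.
# Element names come from the post; scope notes are illustrative placeholders,
# not the working group's definitions.
DATA_DICTIONARY = {
    "title": "Name of the archived website or collection",
    "creator": "Entity primarily responsible for the website's content",
    "contributor": "Other entity that contributed to the content",
    "date": "Date(s) associated with the resource, each with a stated type",
    "description": "Free-text summary of content and scope",
    "extent": "Size or scope of what was captured",
    "identifier": "Identifier for the archived website or collection",
    "language": "Language(s) of the content",
    "subject": "Topic(s) of the content",
    "genre": "Form or type of website",
}

# Hypothetical controlled list of date types (an assumption, not WAM's list).
DATE_TYPES = {"capture", "content"}


@dataclass(frozen=True)
class QualifiedDate:
    value: date
    date_type: str

    def __post_init__(self) -> None:
        # Reject dates whose meaning would be unclear to users.
        if self.date_type not in DATE_TYPES:
            raise ValueError(f"unknown date type: {self.date_type!r}")

    def label(self) -> str:
        # Display form that states what kind of date this is.
        return f"{self.value.isoformat()} ({self.date_type})"


@dataclass
class WebsiteRecord:
    # Repeatable elements are modeled as lists; only title is required here.
    title: str
    creator: list[str] = field(default_factory=list)
    contributor: list[str] = field(default_factory=list)
    date: list[QualifiedDate] = field(default_factory=list)
    description: str = ""
    extent: str = ""
    identifier: list[str] = field(default_factory=list)
    language: list[str] = field(default_factory=list)
    subject: list[str] = field(default_factory=list)
    genre: list[str] = field(default_factory=list)

    def capture_span(self) -> str:
        """Summarize capture dates as one date or an ISO 8601 start/end interval."""
        captures = [d.value for d in self.date if d.date_type == "capture"]
        if not captures:
            return ""
        first, last = min(captures), max(captures)
        if first == last:
            return first.isoformat()
        return f"{first.isoformat()}/{last.isoformat()}"


def undocumented_elements() -> set[str]:
    """Return record fields that lack a scope note in the data dictionary."""
    return {f.name for f in fields(WebsiteRecord)} - set(DATA_DICTIONARY)


if __name__ == "__main__":
    record = WebsiteRecord(
        title="Example organization website",  # hypothetical sample data
        date=[
            QualifiedDate(date(2016, 5, 2), "capture"),
            QualifiedDate(date(2015, 1, 10), "capture"),
        ],
    )
    print(record.capture_span())    # 2015-01-10/2016-05-02
    print(undocumented_elements())  # set()
```

Attaching a type to every date, rather than storing bare dates, is one way to address the question raised above: a user can tell a crawl date from a date the site asserts about itself, and many captures can be summarized as a single span.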

We’ll be circulating the draft best practices widely across the library and archives community and are hoping to hear from many who are struggling to describe websites and collections. Our aim is to promulgate best practices that will encourage use of metadata that is both meaningful and useful to users of these resources.

Stay tuned!


Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.


© 2020 OCLC || ISSN 2771-4802