Hanging Together

the OCLC Research blog

Archives and Special Collections / Metadata

Best Practices for Web Archiving Metadata: Watch This Space!

April 5, 2017 - by Jackie Dooley

Some of you may recall that back in 2015 we surveyed our OCLC Research Library Partners to determine their top challenges with web archiving, and the need for guidance on metadata practices emerged as #1. In response, early in 2016 we established a Web Archiving Metadata Working Group (WAM) to develop best practices for metadata. The group did extensive background research over the past year, and we’re now on a fast track to publish three reports in the next several months. In the meantime, you can read a substantial overview of the project in this article published last Friday in the online Journal of Western Archives.

The first two reports will underpin the best practices: one surveys the tools available for capturing websites, with a focus on their metadata extraction capabilities; the other reviews the literature on the metadata needs of web archives users.

The best practice guidelines will be in the third report. In addition to defining and interpreting a set of data elements, the report will articulate differences between bibliographic and archival standards; contrast approaches to description of individual websites and collections; and include both a literature review focused on metadata issues and crosswalks to related standards.

WAM established several principles to underpin the best practices. They are intended to …

  • … address the needs of users of archived websites as determined by our literature review
  • … be community-neutral, standards-neutral, and output-neutral; in other words, applicable to any context in which metadata for archived websites is needed
  • … consist of a relatively lean set of data elements, with the scope of each defined (i.e., a data dictionary)
  • … interpret each element for description of archived websites, which, unlike books or serials or published audiovisual media, have no conventions for representing elements such as creators, dates, or extent
  • … be upward-compatible with standards that have far deeper data element sets, including RDA, MARC, DACS, EAD, and MODS

We are in the process of finalizing the set of data elements and have adopted the following so far:

  • title
  • creator
  • contributor
  • date
  • description
  • extent
  • identifier
  • language
  • subject
  • genre

These may seem both obvious and straightforward, but most need definition and interpretation for the web context. One example: what types of date are both feasible to determine and important to include, and how can their meaning be made clear? Additional elements under consideration include geographic coverage, publisher, rights, access, source of description, URL, and collector (or should the latter be owner? or repository? or location?). We’ve eliminated from consideration several that don’t have specific applicability to websites, including audience and statement of responsibility.
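To make the idea of a lean element set more concrete, here is a minimal Python sketch of a record built from the ten elements adopted so far. Everything beyond the element names is an assumption for illustration only: the class name `ArchivedWebsiteRecord`, the field types, which elements repeat, and the idea of labelling each date with its type are not taken from the WAM draft.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class ArchivedWebsiteRecord:
    """Hypothetical record holding the ten elements adopted so far."""

    title: str
    creator: List[str] = field(default_factory=list)
    contributor: List[str] = field(default_factory=list)
    # Assumption: each date is stored under a label (e.g. "captured")
    # so that its meaning is explicit to the user.
    date: Dict[str, str] = field(default_factory=dict)
    description: str = ""
    extent: str = ""
    identifier: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)
    subject: List[str] = field(default_factory=list)
    genre: List[str] = field(default_factory=list)


def element_names() -> List[str]:
    """Return the element names in the order they are declared."""
    return [f.name for f in fields(ArchivedWebsiteRecord)]


# Illustrative values only; none of these come from a real collection.
record = ArchivedWebsiteRecord(
    title="Example Campaign Website",
    creator=["Example Organization"],
    date={"captured": "2016-11-01/2016-11-30"},
    language=["eng"],
    genre=["website"],
)
```

Labelling the date is one possible answer to the question of how to make a date's meaning clear; a real data dictionary would also need to define the scope of each element.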

We’ll be circulating the draft best practices widely across the library and archives community and are hoping to hear from many who are struggling to describe websites and collections. Our aim is to promulgate best practices that will encourage use of metadata that is both meaningful and useful to users of these resources.

Stay tuned!

 

Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.

http://www.oclc.org/research/people/dooley.html