Hanging Together

the OCLC Research blog

Slam bam WAM: Wrangling best practices for web archiving metadata

August 23, 2016 / September 4, 2020 - by Jackie Dooley

The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM, of course) was launched last January and has been working hard–really hard–ever since. Twenty-five members from Partner libraries and archives have dug in to address the challenge of devising best practices for describing websites–which are, it turns out, very odd critters compared to other types of material for which descriptive standards and guidelines already exist. In addition, user needs and behaviors are quite different from those we’re familiar with.

Our plan at the outset: do an extensive literature review on both user needs and existing metadata practices in the web context, study relevant descriptive standards and institution-specific web archiving metadata guidelines, engage the community along the way to confirm the need for this work and obtain feedback, and, ultimately, issue two reports: the first on user needs and behaviors specific to archived web content, the second outlining best practices for metadata. The heart of the latter will be a set of recommended data elements accompanied by definitions and the types of content that each should contain.

At this juncture we’ve drawn several general conclusions:

  • Descriptive standards don’t address the unique characteristics of websites.
  • Local metadata guidelines have little in common with each other.
  • It’ll therefore be challenging to sort it all out and arrive at recommended best practices that will serve the needs of users of archived websites.

We’ve reviewed nine sets of institution-specific guidelines. The table below shows the most common data elements, some of which are defined very differently from one institution to another. Only three appear in all nine guidelines: creator/contributor, title, and description.

Collection name/title      Language
Creator/contributor        Publisher
Date of capture            Rights/access conditions
Date of content            Subject
Description                Title
Genre                      URL
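
To make the list concrete, here is a minimal sketch of a record that carries the twelve elements in the table, with a check for the three that appear in all nine guidelines. The class name, field names, helper function, and sample values are all assumptions made up for illustration; they are not a schema proposed by the working group or taken from any of the guidelines reviewed.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ArchivedWebsiteRecord:
    """Hypothetical record with one field per common element in the table."""
    collection_name: Optional[str] = None
    creator_contributor: Optional[str] = None
    date_of_capture: Optional[str] = None
    date_of_content: Optional[str] = None
    description: Optional[str] = None
    genre: Optional[str] = None
    language: Optional[str] = None
    publisher: Optional[str] = None
    rights_access: Optional[str] = None
    subject: Optional[str] = None
    title: Optional[str] = None
    url: Optional[str] = None


# The three elements the post says appear in all nine guidelines.
SHARED_BY_ALL = ("creator_contributor", "title", "description")


def missing_shared_elements(record):
    """Return the names of the three shared elements that are empty."""
    return [name for name in SHARED_BY_ALL if not getattr(record, name)]


# Invented sample values, for illustration only.
record = ArchivedWebsiteRecord(
    title="Example Society website",
    creator_contributor="Example Society",
    url="https://example.org/",
)
print(missing_shared_elements(record))
```

Every field is an optional string here because the guidelines define these elements so differently; a real schema would have to settle the questions below (what counts as a publisher, which date, which URL) before it could be any stricter.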

Our basic questions: Which types of content are the most important to include in metadata records describing websites? And which generic data elements should be designated for each of these concepts?

Here are some of the specific issues we’ve come across:

  • Website creator/owner: Is this the publisher? Creator? Subject? All three?
  • Publisher: Does a website have a publisher? If so, is it the harvesting institution or the creator/owner of the live site?
  • Title: Should it be transcribed verbatim from the head of the home page? Or edited to clarify the nature/scope of the site? Should acronyms be spelled out? Should the title begin with, e.g., “Website of the …”?
  • Dates: Beginning/end of the site’s existence? Date of capture by a repository? Content? Copyright?
  • Extent: How should this be expressed? “1 online resource”? “6.25 Gb”? “approximately 300 websites”?
  • Host institution: Is the institution that harvests and hosts the site the repository? Creator? Publisher? Selector?
  • Provenance: In the web context, does provenance refer to the site owner? The repository that harvests and hosts the site? Ways in which the site has evolved?
  • Appraisal: Does this mean the reason why the site warrants being archived? The collection of a set of sites as named by the harvesting institution? The scope of the parts of the site that were harvested?
  • Format: Is it important to be clear that the resource is a website? If so, how best to do this?
  • URL: Which URLs should be linked to? Seed? Access? Landing page?
  • MARC 21 record type: When coded in the MARC 21 format, should a website be considered a continuing resource? Integrating resource? Electronic resource? Textual publication? Mixed material? Manuscript?

We’re getting fairly close to completing our literature review and guidelines analysis, at which point we’ll turn to determining the scope and substance of the best practices report. In addition to defining a set of data elements, it’ll be important to set the problem in context and explain how our analysis has led to the conclusions we draw.

So stay tuned! We’ll be sending out a draft for community review and are hoping to publish both reports within the next six months. In the meantime, please send your own local guidelines, as well as pointers to a few sample records, to me at dooleyj@oclc.org. Help us make sure we get it right!

Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.

© 2024 OCLC || ISSN 2771-4802