Descriptive Metadata for Web Archiving: Read the reports!

What’s the most widely shared, top-priority web archiving issue across the OCLC Research Library Partnership? We conducted a survey two years ago to explore web archiving needs across the Partnership, and the lack of appropriate descriptive metadata guidelines for archived websites came out on top.

In response, we established the OCLC Research Library Partnership Web Archiving Working Group and recently published three reports as the outcomes of the group’s work.

Our preliminary research confirmed that guidelines would indeed be helpful to encourage consistency of practice. Although a variety of library and archival standards were in use, none addressed the array of conundrums presented by this type of resource, such as these:

Should a website owner be identified as the author (or creator), publisher, and/or subject? What about the institution that archives a site or collection of sites?
Which types of date are the most meaningful to users? Those designating the start and/or end dates of a site’s existence? The dates that sites in a collection were captured? Dates reflected in the content? The date shown on a page? And how can the meaning of any particular date be made clear?
How best should the extent of the resource be expressed to be both meaningful to users and efficient for busy metadata creators? Is the RDA default value “1 online resource” meaningful, or would a statement that includes “website” be an improvement?
Would it be useful to blend characteristics of archival and bibliographic description in descriptions of archived sites or collections? Many institutions already do so.
Which URL(s) should be included in a descriptive record?
Do existing approaches take into account the needs of users?

These and many other questions arose as we studied the relevant standards, compiled institutional guidelines, and examined numerous extant bibliographic and archival records in multiple discovery environments, including ArchiveGrid, WorldCat, and Archive-It. In every context, we found wide variation in both the data elements chosen and the nature of their content. We concluded that web-specific recommendations for descriptive metadata would be helpful, and active outreach to various specialist communities confirmed this.

Our recommended data elements and content guidelines are described in this report, together with introductory text in which we describe the characteristics of bibliographic and archival description, address issues particular to live and archived websites, and discuss aspects of collection-level and item-level approaches.

We also realized that it would be key to keep the needs and perspectives of users top of mind as we did our research. But what are their needs? We therefore compiled and abstracted more than sixty readings from the web archiving literature that address descriptive metadata issues (at least in part). These are summarized in a second report, which includes an introductory narrative summarizing our analysis and an abstract of each reading.

We also studied other types of metadata that pertain to digital resources, including technical and administrative metadata in addition to descriptive, and found that significant gray areas exist among them. We heard from colleagues that they needed a basic understanding of existing web harvesting tools to understand their metadata-related functionality. To address this need, we analyzed eleven tools in a third report, with particular attention to the extent to which descriptive metadata can be extracted.

Two more posts this week will go into more depth about each of the reports. As always, we would be delighted to have your feedback on the work.

Jackie Dooley

Jackie Dooley retired from OCLC in 2018. She led OCLC Research projects to inform and improve archives and special collections practice.