Encoded Archival Standards: a view from ArchiveGrid and NAFAN research

On August 3, Chela Weber and I had an opportunity to participate in the joint annual (and this year virtual) meeting of the Society of American Archivists Encoded Archival Standards (EAS) Section and the Technical Subcommittee on Encoded Archival Standards (TS-EAS).

*Stefan Vukotic, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons*

The session featured a panel and a lively discussion on encoded archival standards in the context of aggregators of archival description. Participants included representatives from 13 archival aggregations across the United States as well as from the United Kingdom and Europe.

Chela and I had an opportunity to share information about ArchiveGrid, but also to share perspectives formed by research connected to the Building a National Finding Aid Network (NAFAN) project.

There is a recording of the session, but here we’ve summarized our answers to some of the questions raised by the session moderator. We’ve edited some of the questions for clarity.

What standards and formats does ArchiveGrid accept and/or utilize?

ArchiveGrid supports EAD 2002, EAD3, MARC, PDF, and HTML. A feature that distinguished ArchiveGrid from other aggregators on the panel is that EAD and other non-MARC formats are a very small part of what is represented in ArchiveGrid. Of the 7.2 million collection descriptions in ArchiveGrid, over 215,000, or about 3%, are EAD XML. Only two institutions contribute EAD3, but together they account for 8,103 of those finding aids. The next largest format is HTML, with 200K documents. Additionally, there are 7K PDF documents.
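To make those proportions concrete, here is a minimal Python sketch that turns the rounded counts quoted above into percentage shares. The figures are the approximations given in this post ("over 215,000", "200K", "7K"), not exact index counts. Treating everything else as MARC is an assumption made for illustration, and the function name `format_shares` is hypothetical.

```python
# Rounded figures quoted in this post; these are approximations, not exact counts.
TOTAL_DESCRIPTIONS = 7_200_000
NON_MARC_COUNTS = {
    "EAD XML": 215_000,
    "HTML": 200_000,
    "PDF": 7_000,
}


def format_shares(total, counts):
    """Return each format's share of the total as a percentage, rounded to 0.1."""
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    # Assumption: whatever is not EAD, HTML, or PDF is a MARC record.
    remainder = total - sum(counts.values())
    shares["MARC (inferred remainder)"] = round(100 * remainder / total, 1)
    return shares


shares = format_shares(TOTAL_DESCRIPTIONS, NON_MARC_COUNTS)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under these assumptions, the three non-MARC formats together come to well under a tenth of the index, which is consistent with the point that ArchiveGrid is overwhelmingly composed of MARC records.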

How well do encoded archival standards (EAS) support ArchiveGrid service goals?

It is important to start by articulating ArchiveGrid's goals. First and foremost, it is a discovery service that seeks to quickly connect researchers on the open web with archival collections of interest. We expect (and see) that most traffic comes from a web search, and that end users land on a collection description; very few people come to the ArchiveGrid homepage. All of the many formats for collection description, including EAD, are indexed and made available for discovery.

For those who conduct a search on ArchiveGrid, there is the ability to narrow search results by facets. Those facets are built on MARC records, for two reasons. First, so much of ArchiveGrid is composed of MARC records. Second, the encoding in MARC records is much more consistent than in the EAD records. A 2013 study of the EAD in ArchiveGrid revealed that data was lacking in the key elements needed to support advanced search and filtering; this was confirmed in a more recent analysis. EAD elements are, however, generally sufficient to populate data in the results display.

How labor-intensive is the process for contributing archival description to ArchiveGrid?

The MARC records are filtered from WorldCat and appear automatically if an institution is contributing to WorldCat. The selection criteria for what is filtered from WorldCat are described on the ArchiveGrid site. Finding aids in EAD, HTML, or PDF are harvested directly from a contributor’s website; more information about contribution is detailed online, but there is usually a brief email exchange with an OCLC Research staff member. This harvesting works until an institution moves its finding aids; when that happens, the institution can contact us and we will update the harvester.

What barriers have you encountered in getting contributors to participate?

ArchiveGrid relies on collection descriptions being in WorldCat, or in a location from which we can harvest them. We do not index finding aids that are in a format other than EAD, HTML, MARC, or PDF. We cannot index finding aids that are dynamically generated.

A notable ArchiveGrid fact: over time, we are harvesting less EAD and more HTML due to the uptake of ArchivesSpace; at institutions that have implemented ArchivesSpace, the trend is to generate HTML versions of finding aids which are accessible online instead of EAD XML files. When institutions adopt ArchivesSpace, they need to arrange for us to harvest their HTML finding aids.

From your research, what barriers do archivists face when contributing to aggregation sites?

One of the strands of our NAFAN research has been investigating the enabling and constraining factors that archivists face when contributing to archival aggregations. We have learned quite a bit on this topic, so look for future blog posts and more formally published research. Factors that prove to be barriers include complying with aggregator data requirements, keeping records up to date, and cumbersome workflows for submitting records in the first place.

How well do encoded archival standards (EAS) support archivists?

Our NAFAN research also looked at enabling and constraining factors in creating archival description. Again, this is work that is very much in progress. Among our findings is that EAD is a barrier to creating archival description for many archives, especially those without content management systems and/or those contending with managing portions of their archival description in a range of legacy, unstructured, or semi-structured data formats.

What is the typical user’s purpose for going to archival aggregator sites? What can you say about end user needs?

Our end user research for NAFAN is still in progress, but our pop-up survey in 2021 showed that a broad range of users is finding and using aggregators. This includes retirees, genealogists, academics, and other professionals. In the pop-up survey, we found that the highest participation from a specific profession was from librarians and archivists. Users come to these sites for a variety of personal, professional, and academic purposes.

It was great to be part of this conversation – those who look after and steward archival aggregations are a smart, informed and caring group of people and we encourage you to give the recording a listen. You can track our NAFAN research here on HangingTogether or sign up to receive NAFAN project updates.

Merrilee Proffitt

Merrilee Proffitt is Senior Manager for the OCLC RLP. She provides community development skills and expert support to institutions within the OCLC Research Library Partnership.