As part of our ongoing effort to highlight work that we’re doing in our new Archives and Special Collections program, I thought I’d talk a little about our current efforts mining data out of WorldCat.
In the Define the State of Holdings and Description for Archives Project (which I often refer to as our data mining project) we are looking at archival descriptive practice as represented in WorldCat. As a first step, this will be an analysis of 1.7 million MARC records under archival control. Right now, we are assessing how well these records match up with the recommendations for single level minimum and single level optimum in DACS. This will be a little tricky, because DACS is a data content standard, and MARC is a data format standard: DACS says what information a description should contain, while MARC says how that information is encoded, so the presence of a field tells us only part of the story. We'd like to go beyond reporting on field/indicator/subfield usage. Through sampling content, we may be able to say something about the characteristics of the data as well.
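To give a rough sense of what the field-usage side of this looks like, here is a minimal sketch in Python. It is not the project's actual code or crosswalk: the `REQUIRED_TAGS` mapping from descriptive elements to MARC tags is an illustrative assumption, the function names are made up for this example, and each record is simplified to a plain dictionary of tag to values rather than real MARC.

```python
from collections import Counter

# Illustrative assumption: a few descriptive elements mapped to MARC tags
# that could carry them. This is NOT the project's real DACS crosswalk.
REQUIRED_TAGS = {
    "title": {"245"},
    "extent": {"300"},
    "creator": {"100", "110", "111"},
    "scope_and_content": {"520"},
    "access_conditions": {"506"},
    "language": {"546", "041"},
}


def field_usage(records):
    """Count how many records contain each tag at least once."""
    usage = Counter()
    for record in records:
        usage.update(set(record))  # set() so repeated fields count once per record
    return usage


def missing_elements(record, required=REQUIRED_TAGS):
    """Return the element names for which the record has none of the mapped tags."""
    present = set(record)
    return sorted(name for name, tags in required.items() if not tags & present)


# Hypothetical sample data: each record is a dict of MARC tag -> list of values.
sample = [
    {"245": ["Smith family papers"], "300": ["3 linear feet"],
     "520": ["Letters and diaries."], "100": ["Smith, Jane"]},
    {"245": ["Photograph collection"], "506": ["Open for research."],
     "546": ["English"]},
]

usage = field_usage(sample)
gaps = [missing_elements(record) for record in sample]
```

A tally like this only shows whether a field is present. It says nothing about whether the content of that field is any good, which is why sampling the content matters.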
We are far from done (in fact, we've only recently started), but we should have results to share quite soon, including information about date of creation, encoding levels, geographic location of materials, and more. I will be reporting on preliminary findings and giving an overview of the project at the PACSCL conference "Something Old for Something New", and I'll report back on this blog in early December.
Further down the line, I would like to see us do a similar analysis on EAD records represented in ArchiveGrid, but this will be tougher (due to the nature of EAD) and perhaps less impactful (since there are far fewer EAD records than MARC records).
I should also note that we are open to other investigations, so if you have research questions, let’s hear them!