Here at OCLC Research we look at a lot of bibliographic data, usually in the aggregate, after processing the 350+ million WorldCat records on our Hadoop cluster. One such instance is my “MARC Usage in WorldCat” project, where for the last four years I’ve reported on the occurrences of MARC elements and, in some cases, even the contents of particular subfields.
So as I was turning the crank on the 2016 data, I happened to notice an anomaly. MARC field 265 was made obsolete for some formats in 1983, and for the remainder in 1993. In 2013 there were 354,628 occurrences out of about 289 million records. In 2014 the number fell to a gratifying 158,465 out of 311 million records, and then disappeared entirely in the 2015 reporting. This was all due to work by our WorldCat Quality Control team, as they completed the conversion that they announced they would do in August 2014.
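To put those counts in proportion, here is a minimal Python sketch that turns them into occurrences per million records. The function name `rate_per_million` is made up for illustration, and the record totals are the rounded figures quoted above, so the results are approximate.

```python
def rate_per_million(occurrences, total_records):
    """Occurrences of a field per one million records."""
    return occurrences / total_records * 1_000_000

# Counts of field 265 and approximate record totals, as quoted in the post.
usage = {
    2013: (354_628, 289_000_000),
    2014: (158_465, 311_000_000),
}

for year, (occurrences, total) in sorted(usage.items()):
    rate = rate_per_million(occurrences, total)
    print(f"{year}: about {rate:.1f} occurrences per million records")
```

By this rough measure the field went from about 1,227 occurrences per million records in 2013 to about 510 per million in 2014, before vanishing in 2015.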
Now here is where it gets good. It’s back. We found 68 occurrences in 2016, with 3,616 holdings attached to these records. As soon as I alerted our Quality Control team, they swung into action, determined who the offending party was, notified them to please stop, and worked to clean up the existing instances. They are also working to put checks into place so these are caught on ingest.
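As a rough illustration of what a check on ingest might look like, the Python sketch below flags any record that carries an obsolete tag. It is not a description of the Quality Control team's actual process: the function name `find_obsolete_fields`, the `OBSOLETE_TAGS` set, and the representation of a record as a plain list of tag strings are all assumptions made for the example.

```python
# Hypothetical list of obsolete tags; only 265 is taken from the post.
OBSOLETE_TAGS = {"265"}

def find_obsolete_fields(record_tags):
    """Return the obsolete tags present in a record, sorted and de-duplicated."""
    return sorted(set(record_tags) & OBSOLETE_TAGS)

# A made-up record, reduced to the tags of its fields.
incoming = ["001", "245", "260", "265", "650"]

flagged = find_obsolete_fields(incoming)
if flagged:
    print("Record needs review, obsolete fields found:", ", ".join(flagged))
```

A real check would read the tags from parsed MARC records rather than a hand-built list, but the set intersection is the core of the idea.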
Zombie metadata isn’t nearly as frightening as “The Walking Dead”, although for librarians it’s close. But it’s nice to know that when we find errors like this we have staff and procedures in place to take care of them.
Photo by Mike Mozart, CC-BY 2.0
Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.