This October it will be 15 years since I wrote the oft-cited “MARC Must Die” column for Library Journal. At the time I took a lot of heat for that sentiment (and speaking engagements to defend it) but over time the profession has pretty much come around to realizing that it is time to do something different.
When I joined OCLC in 2007, and discovered the kind of computing power available to me as well as the single largest aggregation of library metadata in the world, I was like a kid in a candy store. What, pray tell, is actually recorded in MARC, I wondered? And now I could find out.
So in 2013 I began the “MARC Usage in WorldCat” project to reveal exactly that. I would, for particular subfields, report exactly what is contained therein, with no filtering and no normalization. This is because I felt that if we were going to move to some new way of recording and sharing our data we needed to know what we have to work with.
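To make the idea of an unfiltered, unnormalized report concrete, here is a minimal sketch in Python. It is not the project's actual code: the record layout (a dictionary keyed by strings such as "040$b"), the sample values, and the function name `tally_subfield` are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical, simplified records: each maps a "tag$subfield" key
# to the list of raw values found in that record. Real MARC records
# are more involved; this layout is an assumption for illustration.
records = [
    {"040$b": ["eng"]},
    {"040$b": ["eng"], "041$a": ["fre"]},
    {"040$b": ["Eng"]},
    {"040$b": ["eng "]},
    {"041$a": ["fre", "ger"]},
]


def tally_subfield(records, key):
    """Count every raw value of one subfield, with no filtering or normalization."""
    counts = Counter()
    for record in records:
        for value in record.get(key, []):
            counts[value] += 1
    return counts


report = tally_subfield(records, "040$b")

# Print the report, most frequent value first. repr() makes stray
# whitespace visible, which matters when nothing is normalized.
for value, count in report.most_common():
    print(f"{count}\t{value!r}")
```

Because the values are counted exactly as found, "eng", "Eng", and "eng " each get their own line in the report, which is the point of the exercise.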
And often what we have to work with is not…well, standard. The same concept is often recorded using many different strings, which means that somewhere down the line translation and normalization must happen. And of course typographical errors complicate that work, as they do any other automated procedure we may seek to undertake. But that’s the state of play, and only by knowing what we have to work with, in reality, will we be properly armed to make the right decisions.
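As a rough illustration of what that later normalization step might look like, the following Python sketch groups variant strings under a shared key. The sample values, the function names, and the normalization rule itself (collapse whitespace, ignore case, drop trailing periods) are assumptions chosen for the example, not a description of any real cleanup process.

```python
from collections import defaultdict


def normalize(value):
    """Collapse whitespace, ignore case, and drop trailing periods (an assumed rule)."""
    return " ".join(value.split()).casefold().rstrip(".")


def group_variants(values):
    """Map each normalized key to the set of raw strings that reduce to it."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize(value)].add(value)
    return dict(groups)


# Hypothetical raw values for one subfield, including a typo ("enh").
raw_values = ["eng", "Eng", "eng ", "ENG.", "enh", "fre"]
groups = group_variants(raw_values)

for key, variants in sorted(groups.items()):
    print(f"{key}: {sorted(variants)}")
```

Note that a simple rule like this folds the case and punctuation variants together but leaves the typo "enh" in its own group; catching that kind of error takes a controlled list of valid values or a human eye.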
It has been interesting to see changes over time as well. Some fields are new, and their use has climbed over the years. Others have been deprecated and have declined in use. Once I even detected a subfield that had gone into decline after being made obsolete and then oddly shot back up again. My colleagues traced the problem to one particular record contributor and asked them to desist from using it.
As we move to a linked data world we will need to do a lot of computational maneuvering to create links where none previously existed. This work is complicated by a diverse, non-standard, and error-riddled past. That’s why we need to know exactly what we are dealing with, and this project aims to make it possible to move into the future from a position of knowledge and understanding. As always, if you would like to see a report on a particular subfield just let me know.
Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.
A white paper re Linked Data written by a colleague
https://drive.google.com/file/d/0B1IKJYVwLwHyX1VnblJFZ3EtS1U/view?usp=sharing
Yes, I’ve seen it, thanks. He posted it to the BIBFRAME list.
Thanks for continuing with this project, Roy. Fabulously useful data!