The inspiration for my title comes from Lorcan Dempsey, who some years ago, before I joined him at OCLC, put a name to the unease I had been feeling about the state of library metadata. In a Library Journal column I had bemoaned the fact that not only was it impossible for library users to limit a search to items available online in full, but we could not even implement such a feature.
Lorcan responded to that column, citing the “‘murky bucket syndrome’ that affects any large bibliographic database – we cannot entirely, unambiguously slice and dice the database because of historic data entry and cataloging practices that…were not oriented toward our new needs.” I’ll say. Also, around that time my soon-to-be colleagues at OCLC Research wrote a paper about some related work they had done: “Mining for Digital Resources: Identifying and Characterizing Digital Materials in WorldCat”.
Later, while still at the California Digital Library, I investigated this more deeply, which resulted in an informal report called “Trouble in Online Paradise: An Analysis of MARC 856 Usage at One Institution”. Basically, I took 1,000,000 MARC records from UC Berkeley, pulled out all of the 856 (Electronic Location and Access) fields (about 20,000 at the time), and analyzed them. That work is still on my prototype server, so you can play around with it if you want.
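For anyone curious to try a similar tally on their own records, here is a minimal sketch in Python using only the standard library. It is not the code behind the report. It assumes MARC 21 records in ISO 2709 transmission format with UTF-8 field data, and every function name in it (`iter_records`, `fields`, `subfields`, `tally_856`, `build_record`) is hypothetical. It counts 856 fields by second indicator, which MARC 21 uses to say whether a link points to the resource itself (0), a version of it (1), or a related resource (2). That distinction is exactly what you would need to limit a search to full text, if only it were recorded consistently.

```python
"""Minimal sketch: tally MARC 856 fields by second indicator (stdlib only)."""
from collections import Counter
from collections.abc import Iterator

RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def iter_records(data: bytes) -> Iterator[bytes]:
    """Split a file of ISO 2709 records using each leader's record length (positions 00-04)."""
    pos = 0
    while pos + 24 <= len(data):
        length = int(data[pos:pos + 5].decode("ascii"))
        if length < 24:  # malformed leader; stop rather than loop forever
            break
        yield data[pos:pos + length]
        pos += length


def fields(record: bytes, wanted_tag: str) -> Iterator[bytes]:
    """Yield the raw data of every field with the given tag, minus its terminator."""
    base = int(record[12:17].decode("ascii"))  # leader 12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries; drop the directory's terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        if tag == wanted_tag:
            raw = record[base + start:base + start + length]
            yield raw.rstrip(FIELD_TERMINATOR)


def subfields(field: bytes) -> tuple[str, list[tuple[str, str]]]:
    """Split a data field into its two indicators and (code, value) subfield pairs."""
    indicators = field[:2].decode("ascii")
    pairs = []
    for part in field[2:].split(SUBFIELD_DELIMITER)[1:]:
        if part:
            pairs.append((chr(part[0]), part[1:].decode("utf-8", errors="replace")))
    return indicators, pairs


def tally_856(data: bytes) -> Counter:
    """Count 856 fields by second indicator (' ', '0', '1', '2', '8')."""
    counts: Counter = Counter()
    for record in iter_records(data):
        for field in fields(record, "856"):
            indicators, _ = subfields(field)
            counts[indicators[1]] += 1
    return counts


def build_record(field_list: list[tuple[str, bytes]]) -> bytes:
    """Hypothetical helper that assembles a tiny record so the sketch runs without a file."""
    directory, body = b"", b""
    for tag, content in field_list:
        content += FIELD_TERMINATOR
        # Directory entry: tag (3), field length (4), start offset from base (5).
        directory += f"{tag}{len(content):04d}{len(body):05d}".encode("ascii")
        body += content
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


if __name__ == "__main__":
    sample = build_record([
        ("245", b"10\x1faA sample title"),
        ("856", b"40\x1fuhttp://example.org/full-text"),
        ("856", b"42\x1fuhttp://example.org/toc"),
    ])
    print(tally_856(sample + sample))
```

For real data, a maintained parser such as pymarc would be the sturdier choice; the hand-rolled version is only meant to make the leader, directory, and subfield structure visible.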