ISBNs in WorldCat

Recently a question came up on the BIBFRAME list about ISBNs, and how many of them were in MARC records. This is just the kind of question that OCLC Research is uniquely placed to answer, so I quickly wrote some simple Perl code to run as a Hadoop streaming job to find out. It was …

“Cataloging Unchained”

Lorcan Dempsey (VP of Research at OCLC) has long said that we need to “make our data work harder.” And for years that is exactly what OCLC Research has been doing. So when I was asked to speak on data mining at the OCLC Europe, Middle East, and Africa Regional Council Meeting in Strasbourg, France, …

Top Corporate Names in WorldCat

As I explained earlier, I have been doing some investigations into how MARC has been used over the last several decades. Curious about the contents of the 110 $a (corporate names), I parsed it and the top 30 headings are listed below. Keep in mind a few things, however: Entities can be put together in …

Two Huge Linked Data Announcements

This week we have announced two major initiatives that are now providing significant library linked data resources to the world. First was the announcement yesterday that all of the 23rd Edition of the Dewey Decimal Classification has been released on the web as linked data. From the announcement: All assignable classes from DDC 23, the …

Five Easy Pieces

I seem to have acquired an obsession. This obsession manifests itself in various ways, but one clear way is that I can’t seem to stop thinking about some of the findings from my colleague’s work that resulted in the publication Implications of MARC Tag Usage on Library Metadata Practices. Chief among them, in my view, …

FAST on the street

I’m pleased to say that today OCLC Research released FAST (Faceted Application of Subject Terminology) as linked data under an Open Data Commons Attribution license. FAST has been a multi-year project of OCLC Research in collaboration with the Library of Congress. The FAST authority file is an enumerative, faceted subject heading schema derived from the …
