Recently a question came up on the BIBFRAME list about ISBNs, and how many of them were in MARC records. This is just the kind of question that OCLC Research is uniquely placed to answer, so I quickly wrote some simple Perl code to run as a Hadoop streaming job to find out.
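For readers who have not written a Hadoop streaming job, here is a minimal sketch of the idea in Python rather than the Perl I actually used. It is not the original code. It assumes, purely for illustration, that each input line holds one record serialized as MARC-in-JSON; the function names (`count_020a`, `map_lines`, `reduce_lines`) are hypothetical. The mapper emits the number of 020 $a subfields found in each record, and the reducer tallies how many records share each count.

```python
import json
import sys
from collections import Counter


def count_020a(record):
    """Return the number of 020 $a subfields in one MARC-in-JSON record."""
    total = 0
    for field in record.get("fields", []):
        for tag, value in field.items():
            # Control fields hold plain strings; only data fields have subfields.
            if tag != "020" or not isinstance(value, dict):
                continue
            for subfield in value.get("subfields", []):
                if "a" in subfield:
                    total += 1
    return total


def map_lines(lines):
    """Mapper: emit 'count<TAB>1' for every record (assumed one per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield f"{count_020a(json.loads(line))}\t1"


def reduce_lines(lines):
    """Reducer: sum the 1s emitted for each per-record count."""
    totals = Counter()
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        totals[int(key)] += int(value)
    return dict(totals)


if __name__ == "__main__":
    # Hadoop streaming pipes records through stdin/stdout, so the same
    # script can serve as mapper ("map") or reducer ("reduce").
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "map":
        for out in map_lines(sys.stdin):
            print(out)
    elif mode == "reduce":
        for n, records in sorted(reduce_lines(sys.stdin).items()):
            print(f"{n}\t{records}")
```

Dividing each tally by the total number of records would give the percentage of WorldCat that carries that many ISBNs.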
Getting an answer was remarkably quick and easy, although I had to edit and re-run the code when I discovered a flaw in my logic. This is, sadly, all too frequently the case. But not too much later I had my result:
| Occurrences | # per Record | Percent of WorldCat |
These are all of the occurrences of a 020 $a in WorldCat as of 1 May 2013. [Added for clarification: the prior sentence describes exactly what is being counted. That is, I am not (yet) distinguishing 10-digit from 13-digit ISBNs; therefore, many of the records with 2 ISBNs may in fact simply have both versions.] A few observations:
- Many items in WorldCat were published before the invention of the ISBN.
- Many items in WorldCat are not ISBN-appropriate (e.g., unpublished materials).
- ISBNs are therefore problematic as identifiers except for a narrow slice of materials (mainly printed books published since the mid-1960s).
A much better identifier for many purposes is, I assert, the OCLC number.
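On the 10-digit versus 13-digit caveat above: I have not run this analysis, but one way to check whether a record's two ISBNs are really the same number is to convert every 10-digit form to its 13-digit equivalent before comparing. The sketch below is an illustration only; the helper names are hypothetical, it assumes the ISBN is the first whitespace-delimited token of the 020 $a, and it does not validate the incoming check digit.

```python
def isbn10_to_isbn13(isbn10):
    """Convert a 10-character ISBN to its 13-digit form with the 978 prefix."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        raise ValueError(f"not a 10-character ISBN: {isbn10!r}")
    # Drop the old check character and prepend the 978 prefix.
    core = "978" + chars[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


def normalize_isbn(value):
    """Reduce an 020 $a value to a comparable form (13 digits when possible)."""
    parts = value.split()
    if not parts:
        return ""
    # Assumption: qualifiers such as "(pbk.)" follow the number itself.
    token = parts[0].replace("-", "").upper()
    if len(token) == 10 and token[:9].isdigit():
        return isbn10_to_isbn13(token)
    return token


def distinct_isbns(values):
    """Return the set of distinct ISBNs after normalization."""
    return {normalize_isbn(v) for v in values if v.strip()}
```

A record whose two 020 $a values collapse to a single entry in `distinct_isbns` would be carrying both versions of one ISBN rather than two different ISBNs.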
Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.