ISBNs in WorldCat - Hanging Together

Recently a question came up on the BIBFRAME list about ISBNs, and how many of them were in MARC records. This is just the kind of question that OCLC Research is uniquely placed to answer, so I quickly wrote some simple Perl code to run as a Hadoop streaming job to find out.
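
For illustration only, here is a minimal sketch of the kind of tally such a streaming job performs. It is written in Python rather than the Perl actually used, and it assumes each input line holds one record serialized as MARC-in-JSON; the real input format and code are not shown in this post, so the function names (`count_020a`, `mapper`, `reducer`) are hypothetical.

```python
import json
from collections import Counter


def count_020a(record):
    """Return the number of 020 $a subfields in one MARC-in-JSON record."""
    total = 0
    for field in record.get("fields", []):
        for tag, value in field.items():
            # Control fields hold plain strings; only data fields carry subfields.
            if tag != "020" or not isinstance(value, dict):
                continue
            for subfield in value.get("subfields", []):
                if "a" in subfield:
                    total += 1
    return total


def mapper(lines):
    """Yield 'count<TAB>1' per record, the usual streaming key/value shape."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield f"{count_020a(json.loads(line))}\t1"


def reducer(lines):
    """Sum the 1s for each key, giving the number of records per 020 $a count."""
    totals = Counter()
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        totals[int(key)] += int(value)
    return totals
```

In a streaming job the mapper would read records from standard input and print each yielded line, and the reducer would read the mapper output the same way. Records with no 020 $a are emitted under the key 0, which is what makes the "0 per record" row possible.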

Getting the answer was remarkably quick and easy, although I had to edit and re-run the code when I discovered a flaw in my logic. This is, sadly, all too frequently the case. But not too much later I had my result:

Records	020 $a per Record	Percent of WorldCat
230444194	0	77.71%
55668178	2	18.77%
4766652	1	1.61%
3708352	4	1.25%
616623	3	0.21%
411230	6	0.14%
125715	8	0.04%
65796	5	0.02%
45304	10	0.02%
30155	12	0.01%

These are all of the occurrences of a 020 $a in WorldCat as of 1 May 2013 [Added for clarification: the prior sentence describes exactly what is being counted. That is, I am not (yet) examining ISBNs for 10-digit vs. 13-digit; therefore, many of the records with 2 ISBNs may in fact simply have both versions]. A few observations:

Many items in WorldCat were published before the invention of the ISBN.
Many items in WorldCat are not ISBN-appropriate (e.g., unpublished materials).
ISBNs are therefore problematic as identifiers except for a narrow slice of materials (mainly printed books published since the mid-1960s).

A much better identifier for many purposes is, I assert, the OCLC number.
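
The clarification above notes that a record with two 020 $a values may simply hold the 10-digit and 13-digit forms of one ISBN. A rough sketch of how such pairs could be collapsed before counting follows. The function names are hypothetical, and the handling of qualifiers such as "(pbk.)" is an assumption about how 020 $a values are typically entered. The conversion itself is the standard one: prefix 978 to the first nine digits and recompute the check digit with alternating weights of 1 and 3.

```python
import re


def to_isbn13(raw):
    """Normalize a raw 020 $a value to a 13-digit ISBN string, or None."""
    # Keep only the leading ISBN-like token, dropping qualifiers like "(pbk.)".
    match = re.match(r"[0-9Xx-]+", raw.strip())
    if not match:
        return None
    isbn = match.group(0).replace("-", "").upper()
    if len(isbn) == 13 and isbn.isdigit():
        return isbn
    if len(isbn) == 10 and isbn[:9].isdigit():
        # Drop the ISBN-10 check digit, add the 978 prefix, recompute the check.
        body = "978" + isbn[:9]
        weighted = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
        return body + str((10 - weighted % 10) % 10)
    return None


def distinct_isbns(values):
    """Return the set of distinct ISBNs among a record's 020 $a values."""
    return {isbn for isbn in map(to_isbn13, values) if isbn is not None}


print(distinct_isbns(["0306406152 (pbk.)", "978-0-306-40615-7"]))
```

This sketch does not validate the check digits of its input, so a mistyped ISBN is passed through rather than rejected. ISBNs beginning with 979 have no 10-digit equivalent and are returned unchanged.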

Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.


6 Comments on “ISBNs in WorldCat”

Celia White says:

May 31, 2013 at 10:06 am

There are a fair number of reused ISBNs, too, I’m afraid (same ISBN pulls up 2 or more different titles, though usually from the same publisher).

Roy Tennant says:

May 28, 2013 at 9:00 am

Paul: And your suggested alternative is…what?

paul says:

May 28, 2013 at 12:18 am

The trouble with the OCLC number is that it is a bit OCLC-centric.

Eric Hellman says:

May 24, 2013 at 12:51 pm

So the reason there are so few singleton ISBNs is probably that the missing member of the 10/13 pair is automatically generated somewhere. The singletons are probably 979-* ISBNs or records that have not seen the autogeneration process.

Roy says:

May 24, 2013 at 11:45 am

Right, I was quite clear about the limitations of this analysis: “These are all of the occurrences of a 020 $a in WorldCat as of 1 May 2013.” In other words, I did not care whether an ISBN was 10-digit or 13-digit, because that really wasn’t the point of the analysis for me; the point was how few records had an ISBN at all. But I could redo the analysis, which would show, as I think you are suggesting, that the records that have 2 ISBNs really have one, in both its 10-digit and 13-digit forms.

Eric Hellman says:

May 24, 2013 at 11:39 am

This is a very odd result. I conjecture that you are looking at both 10- and 13-digit ISBNs, and that you count 13-digit ISBNs as different from the corresponding 10-digit ISBNs. Those should not be considered different. I predict that if you look at 13-digit ISBNs only, the distribution would be more normal.

Comments are closed.