OCLC Control Numbers in the wild

A few weeks ago, Jim posted about OCLC Control Numbers and their public domain status. In that posting, Jim wrote, of the OCN, “It’s an important element in linked library data that helps in the creation and maintenance of work sets and provides a mechanism to disambiguate authors and titles.” He went on to detail the numerous ways that the OCN has been “widely used within the broad system of information that flows among libraries, national information agencies, commercial information providers and organizations that supply consumers with book and journal-oriented services.”

While that’s all well and good (and true), I wanted to give some specific details on how the OCN is used outside what most of us consider the normal channels of book information: in the English-language Wikipedia and in the ambitious Wikidata project. These details are based on counts that Max and I did a while ago, but I think they are current enough to make my point, which is that the OCN is recognized as valuable outside the library and publishing domain.

Wikipedia relies pretty heavily on a number of templates — one such template is the Authority Control Template, which Max has written about before. Another template, which I know you’ve all seen before, is the Infobox Book template.

Infobox Book For Alice Munro’s Too Much Happiness
This template, like most Wikipedia templates, contains what we would immediately recognize as metadata. This example, for Alice Munro’s Too Much Happiness, includes the author, date and country of publication, ISBN, and our friend the OCN. The OCN has been part of this template for a long time and, like the ISBN, helps to disambiguate works from one another. In Wikipedia, as in library catalogs, disambiguation is important, which is why the Wiki community values trusted identifiers like the OCN. And the OCN really comes in handy when there is no ISBN, as is the case with books published before ISBNs came into use around 1970.

Not every Wikipedia article about a book has this template, but many do, so counting it is a good way to estimate how many books have Wikipedia articles about them. A few months ago, I ran a count on a dump of Wikipedia (using some jazzy scripts that Max wrote) and found 29,673 instances of Infobox Book. Those templates contained 23,304 ISBNs and 15,226 OCNs. Let’s hear it for identifiers!
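For anyone who wants to try a similar count, here is a rough sketch in Python. It is not the script Max wrote. It assumes each page’s wikitext is already available as a string (reading the XML dump itself is left out), assumes the infobox uses parameters named isbn and oclc, and the function names are hypothetical. Because it matches loosely, its numbers would differ somewhat from ours.

```python
import re
from collections import Counter

# Opening of an Infobox book transclusion, matched case-insensitively for
# simplicity. The lookahead requires "|" or "}" after the name, which
# excludes longer template names such as "Infobox book series".
INFOBOX_START = re.compile(r"\{\{\s*infobox[ _]+book\s*(?=[|}])", re.IGNORECASE)

# A named parameter such as "| oclc = 12345"; captures the name and raw value.
PARAM = re.compile(r"\|\s*(isbn|oclc)\s*=\s*([^|}\n]*)", re.IGNORECASE)


def template_body(text: str, start: int) -> str:
    """Return the template starting at `start`, up to its matching '}}'."""
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return text[start:i]
        else:
            i += 1
    return text[start:]  # unterminated template: take the rest of the page


def count_identifiers(pages) -> Counter:
    """Count Infobox book instances and their non-empty isbn/oclc values."""
    counts = Counter()
    for wikitext in pages:
        for match in INFOBOX_START.finditer(wikitext):
            counts["infobox"] += 1
            body = template_body(wikitext, match.start())
            for name, value in PARAM.findall(body):
                if value.strip():  # skip parameters left blank
                    counts[name.lower()] += 1
    return counts
```

Counting only non-empty values matters here, since many infoboxes carry an isbn or oclc line with nothing filled in.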

Max did a count of identifiers in the newer Wikidata and found that, of around 14 million Wikidata items, 28,741 were books. 5,403 Wikidata items have an ISBN-13 associated with them, and 12,262 have OCNs. Why is the number of ISBNs so low? Because Wikidata has a slot for ISBN-13 only; contributors are expected to convert any ISBN-10s to ISBN-13s, but the numbers suggest that many have not. Identifiers are of even greater importance in Wikidata than in Wikipedia, since Wikidata is all about metadata.
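Converting an ISBN-10 to an ISBN-13 takes a little more than adding digits: the 978 prefix goes on the front, the old check digit is dropped, and a new one is calculated with the EAN-13 weights. Here is a minimal sketch in Python; the function name is my own, for illustration only.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to an ISBN-13, returned as digits without hyphens."""
    digits = isbn10.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10 or not digits[:9].isdigit() or digits[9] not in "0123456789X":
        raise ValueError(f"not an ISBN-10: {isbn10!r}")

    # Validate the ISBN-10 check digit: weights 10..1, sum divisible by 11,
    # with 'X' standing for 10.
    values = [int(c) for c in digits[:9]]
    values.append(10 if digits[9] == "X" else int(digits[9]))
    if sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 != 0:
        raise ValueError(f"bad ISBN-10 check digit: {isbn10!r}")

    # Prefix 978 and drop the old check digit.
    core = "978" + digits[:9]
    # EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

For example, isbn10_to_isbn13("0-306-40615-2") returns "9780306406157"; the check digit changes from 2 to 7, which is why simply padding the old number would not work.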

So there’s a look at how the humble OCN is being used, even outside the library and publishing context.

As a sidenote, there are several different flavors of Infobox Book, and one of them, I recently learned, is Infobox Dr Who Book. Go figure.