Today the University of Cambridge released the final dataset from its COMET (Cambridge Open METadata) project. It contains more than 600,000 records derived from OCLC’s WorldCat, available both as MARC 21 and as RDF triples under an Open Data Commons Attribution License (ODC-BY). This dataset, like all those released previously, has been enriched with links to the FAST subject and VIAF name authority services provided by OCLC. This is the final step in the project and brings the total number of bibliographic records released to more than 3,600,000. OCLC Research was a formal partner in the project, which was officially announced in February 2011.

While this JISC-supported project formally ended some time ago, this final dataset release is noteworthy because of the license regime that has been applied. One of the goals set by the Cambridge University Library team was to release data derived from WorldCat in a fashion that was compliant with the rights and responsibilities of the cooperative. In that spirit they engaged OCLC in a discussion about the type of license that would be suitable and asked whether OCLC had a recommendation. We didn’t at the start of the project, but by the end we had engaged in enough other conversations and done enough investigation to recommend the Open Data Commons Attribution License with an explicit reference to the community norms embodied in the document WorldCat Rights and Responsibilities for the OCLC Cooperative (WCRR).

It was quite clear to us that many libraries would be engaging in data experiments similar to this Cambridge project and that OCLC would be obliged to make a recommendation that could be viewed as a best practice by the members of the cooperative. Whatever recommendation we made needed to be consistent with the expectations of semantic web practitioners both in and out of the library community. That meant a standard license created by a neutral body operating globally that would be both widely used and generally understood. For a variety of reasons we settled on ODC-BY. It is a license that provides for attribution as set out in the WCRR document. Moreover, from an intellectual property perspective, it reflects the difference between the rights over a database as a whole, such as those OCLC claims over WorldCat, and the rights over the contents of a database – the record data in WorldCat, for example.

We were very pleased that Cambridge, particularly the project principal, Ed Chamberlain (an arcadia@cambridge Fellow), was willing to work with us to establish a low-overhead implementation of the license as part of this final dataset release. OCLC Research and OhioLINK recently released datasets used in the OhioLINK collection and circulation analysis project under the same ODC-BY license. That project and the COMET effort gave us real-world experience in implementing the license and an opportunity for the policy discussions that will result in a consistent recommendation to OCLC members wanting to honor the community norms expressed in the WCRR.

Ed and the project team, including the indefatigable Hugh Taylor, head of Collection Development and Description at the Cambridge University Library, with whom I’ve worked across many years, produced a project with very interesting results, offered sensible ongoing commentary, and openly shared their experiences as they struggled with the specifics of the data and the vexed nature of library catalog ownership.

It’s worth reading Ed’s COMET blog, particularly the final entry, which summarizes what he learned and offers advice, e.g., “‘Enliven’ linked RDF data”.

And for those who have not seen it yet, Hugh’s document describing the problems inherent in understanding the origin of a MARC-encoded bibliographic record is a must-read. He made a heroic attempt to sort out the origins of Cambridge records, with fascinating results. His analysis makes clear that most large library catalogs were created by collecting and combining whatever ingredients were at hand. And in this hobo stew the profile of rights under contract and license is complex and unclear. I was gratified to see that the conditions surrounding the WorldCat-derived data are quite clear compared with those attached to the range of other records and the vendors from whom they were sourced.

Congratulations to the COMET team. Working with them helped us to understand what kind of advice OCLC members want regarding the release of their catalog data and took us a long way towards a standard recommendation on a responsible and consistent licensing regime for cooperatively-sourced bibliographic data.

