Archive for October, 2011

The tail of the COMET (Project)

Thursday, October 27th, 2011 by Jim

1962 Mercury Comet Coupe

Today the University of Cambridge released the final dataset from its COMET (Cambridge Open METadata) project. The final dataset contains more than 600,000 records derived from OCLC’s WorldCat, available as both MARC 21 and RDF triples under an Open Data Commons Attribution License (ODC-BY). All the previous datasets released, as well as this one, have been enriched with links to the FAST subject and VIAF name authority services provided by OCLC. This is the final step in the project and brings the total bibliographic records released to more than 3,600,000. OCLC Research was a formal partner in the project, which was officially announced in February 2011.

While this JISC-supported project formally ended some time ago, this final dataset release is noteworthy because of the license regime that has been applied. One of the goals set by the Cambridge University Library team was to release data derived from WorldCat in a fashion that was compliant with the rights and responsibilities of the cooperative. In that spirit they engaged OCLC in a discussion about the type of license that would be suitable and wondered whether OCLC had a recommendation. We didn’t at the start of the project, but by the end we had engaged in enough other conversations and done enough investigation to recommend the Open Data Commons Attribution license with an explicit reference to the community norms embodied in the document WorldCat Rights and Responsibilities for the OCLC Cooperative (WCRR).

It was quite clear to us that many libraries would be engaging in data experiments similar to this Cambridge project and OCLC would be obliged to make a recommendation that could be viewed as a best practice by the members of the cooperative. Whatever recommendation we made needed to be consistent with the expectations of semantic web practitioners both in and out of the library community. That meant a standard license created by a neutral body operating globally that would be both widely used and generally understood. For a variety of reasons we settled on ODC-BY. It is a license that provides for attribution as set out in the WCRR document. Moreover from an intellectual property perspective it reflects the difference between the rights over a database as a whole, such as OCLC claims over WorldCat, and the rights over the contents of a database – the record data in WorldCat for example.

We were very pleased that Cambridge, particularly the project principal, Ed Chamberlain (an arcadia@cambridge Fellow) was willing to work with us to establish a low overhead implementation of the license as part of this final dataset release. OCLC Research and OhioLINK recently released datasets used in the OhioLINK collection and circulation analysis project under the same ODC-BY license. That project and the COMET project effort gave us real-world experience in the license implementation and an opportunity for the policy discussions that will result in a consistent recommendation to OCLC members wanting to honor the community norms expressed in the WCRR.

Ed and the project team, including the indefatigable Hugh Taylor, head of Collection Development and Description at the Cambridge University Library, with whom I’ve worked across many years, produced a project with very interesting results and sensible ongoing commentary, and openly shared their experiences as they struggled with the specifics of the data and the vexed nature of library catalog ownership.

It’s worth reading Ed’s COMET blog, particularly the final entry summarizing what he learned and offering advice, e.g. “‘Enliven’ linked RDF data”.

And for those who have not seen it yet, Hugh’s document describing problems inherent in understanding the origin of a MARC-encoded bibliographic record must be read. He made an heroic attempt to sort out the origins of Cambridge records with fascinating results. His analysis makes clear that most large library catalogs were created by collecting and combining whatever ingredients were at hand. And in this hobo stew the profile of rights under contract and license is complex and unclear. I was gratified to see that the conditions surrounding the WorldCat-derived data are quite clear relative to the range of records and vendors from whom they were sourced.

Congratulations to the COMET team. Working with them helped us to understand what kind of advice OCLC members want regarding the release of their catalog data and took us a long way towards a standard recommendation on a responsible and consistent licensing regime for cooperatively-sourced bibliographic data.

The photo is by Randy von Liski. Good stuff.

The value of the OCLC Research Library Partnership

Thursday, October 20th, 2011 by Merrilee

What is the value of the OCLC Research Library Partnership? Don’t take it from me; hear it directly from library thought leaders, who describe the strengths of the OCLC Research Library Partnership and discuss how libraries benefit from participating. Fortunately we had a camera rolling and their thoughts are captured in this video.

Thanks to Ron Brashers, Rich Szary, Paul Constantine, Isabel Holloway, Mary Augusta Thomas, Paul McCarthy, Suzanne Thorin and David Farneth for sharing their views!

You can watch other video offerings on the OCLC Research YouTube Channel.

Check out the sandbox: ArchiveGrid

Wednesday, October 19th, 2011 by Merrilee

Last month, Jim blogged about the new and experimental ArchiveGrid. If you meant to take a look and forgot, I’ll remind you again — go take a look. And plan to join us for a free webinar on ArchiveGrid on November 3rd. Details and a link to register are available here.

From the blurb:

ArchiveGrid, an OCLC discovery service that provides access to detailed archival collection descriptions, is transitioning into a free service. Its interface has a new look and will make finding primary source materials held in archival collections worldwide even easier for researchers. In this short and lively webinar, you’ll hear about ArchiveGrid’s history and its new beta discovery system in development in OCLC Research, plus have the chance to talk with the ArchiveGrid team about future research and development plans. Join software engineer Bruce Washburn, research assistant Ellen Ast and others from OCLC Research for this demonstration and a discussion where you are welcome to share your thoughts.

Summarized: rough and ready text conversion methodologies

Monday, October 10th, 2011 by Merrilee

A few weeks ago I put out a request (here and on some listservs) asking how institutions are converting finding aids or other metadata from paper to electronic. This post summarizes the responses.

Several institutions are starting the process the same way: they are using a sheet feeder to scan documents, saving the resulting file as a PDF, and then using Adobe Acrobat Pro 9 to OCR the text. One respondent reported that the “OCR capability on the full Adobe version is really quite good for typewritten documents. I was pretty surprised.”

From there, many institutions are choosing to go on to mark up the resulting document in EAD, after spell checking, “sense checking,” removing white space, etc. EAD markup is done using a variety of templates. Sometimes institutions use a combination of templates: Word or Open Office for the “wordy bits” of the finding aid (<did>, <bioghist>, <scopecontent>, etc.) and Excel templates for the container list or information contained in the <dsc> (many of these tools are described in our 2010 report, Over, Under, Around, and Through: Getting Around Barriers to EAD Implementation).
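To make the spreadsheet-to-<dsc> step a little more concrete, here is a minimal Python sketch of the general idea. It is not any respondent's actual template or tool: the column names (box, folder, title, date) and the function name container_list_to_dsc are assumptions made for illustration, the container list is assumed to have been exported from Excel as CSV, and the output is a simplified fragment that has not been validated against the EAD DTD or schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def container_list_to_dsc(csv_text):
    """Build a simplified EAD <dsc> element from CSV rows.

    Assumes hypothetical column headers: box, folder, title, date.
    """
    dsc = ET.Element("dsc")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One <c01> component per spreadsheet row, described at the file level.
        c01 = ET.SubElement(dsc, "c01", level="file")
        did = ET.SubElement(c01, "did")
        for kind in ("box", "folder"):
            value = (row.get(kind) or "").strip()
            if value:
                container = ET.SubElement(did, "container", type=kind)
                container.text = value
        # Collapse stray runs of white space left over from OCR or keying.
        ET.SubElement(did, "unittitle").text = " ".join(row["title"].split())
        date = (row.get("date") or "").strip()
        if date:
            ET.SubElement(did, "unitdate").text = date
    return dsc


# Made-up sample rows, standing in for a CSV export of an Excel container list.
sample = """box,folder,title,date
1,1,Correspondence,  1901-1905
1,2,"Diaries   and notebooks",
"""

dsc = container_list_to_dsc(sample)
print(ET.tostring(dsc, encoding="unicode"))
```

The resulting fragment would still need to be pasted into a full finding aid alongside the “wordy bits” and checked by hand, which is roughly the division of labor the combined Word and Excel templates imply.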

Other institutions have had students key information from paper documents (usually short finding aids) directly into an EAD template.

Another approach, taken by the Louisiana Research Center at Tulane University, is nicely described by Eira Tansey’s poster, presented at the 2011 Society of American Archivists meeting. Here Tulane approaches their hidden description problem in stages, first making a basic MARC record available along with a PDF of the finding aid (allowing for basic discovery), and then moving fuller descriptions into Archon with the help of a vendor.

Speaking of handy tools, I was also pointed to Adrianna Del Collo’s nifty tools for preparing text to be imported into the Archivists Toolkit (see the links on this page).

I should note that none of the institutions that self-reported were what I would consider to be small institutions; indeed, they are all ARLs. However, I do think that the nearly uniform use of sheet feeders and Adobe Acrobat is an encouraging development for institutions hoping to undertake text conversion.

Do you have a different technique to report? Doing something similar? Please do email or leave a comment below!

Day of Digital Archives blog

Thursday, October 6th, 2011 by Ricky

Jackie and I have been poring over the engaging posts on the Day of Digital Archives blog today, impressed with the many views into the work lives of those attempting to preserve content that was born digital or that has become digital. It’s really a wonderful compilation of the varieties of challenges these endeavors present. Do check it out — and why not start with this one, where Jackie describes the project we are embarking upon.

Social metadata for LAMs

Monday, October 3rd, 2011 by Karen

Metadata helps users locate resources that meet their specific needs. But metadata also helps us to understand the data we find and helps us to evaluate what we should spend our time on. Traditionally, staff at libraries, archives, and museums (LAMs) create metadata for the content they manage. However, social metadata (content contributed by users) is evolving as a way to both augment and recontextualize the content and metadata created by LAMs.

The cultural heritage organizations in the OCLC Research Library Partnership are eager to expand their reach into user communities and to take advantage of users’ expertise to enrich their descriptive metadata. In 2009 and 2010, a 21-member Social Metadata Working Group from five countries reviewed 76 sites of most relevance to libraries, archives, and museums that supported such social media features as tagging, comments, reviews, images, videos, ratings, recommendations, lists, links to related articles, etc. The working group analyzed the results of a survey sent to site managers and discussed the factors that contribute to successful—and not so successful—use of social metadata. The working group considered issues related to assessment, content, policies, technology, and vocabularies. Central to the working group’s interest was how to take advantage of the array of potential user contributions that would improve and deepen their users’ experiences.

Our first of three reports, Social Metadata for Libraries, Archives, and Museums, Part 1: Site Reviews, provides an environmental scan of sites and third-party hosted social media sites relevant to libraries, archives, and museums. It summarizes the results of our review, captured in the “At a Glance: Sites that Support Social Metadata” spreadsheet, and more detailed reviews of 24 representative sites. Cyndi Shein, assistant archivist at the Getty Research Institute, wrote the section on LAMs’ use of third-party sites and blogs. The second report is an analysis of the results from a survey of site managers conducted from October to November 2009. The third report provides recommendations on social metadata features most relevant to libraries, archives, and museums and factors contributing to success and an annotated list of all the resources the working group consulted.

As with many OCLC Research publications, this report was written to help meet the needs of the OCLC Research Library Partnership. The Partnership not only inspires but also underwrites this type of work, so many thanks to the institutions who both contribute to and support our work!

We look forward to hearing your feedback!