Archive for the 'Architecture and standards' Category

More From the “Murky Bucket”

Thursday, March 3rd, 2011 by Roy

The inspiration for my title comes from Lorcan Dempsey, who some years ago, before I joined him at OCLC, put a name to the unease I had been feeling about the state of library metadata. In a Library Journal column I had bemoaned the fact that not only was it impossible for library users to limit a search to items available online in full, it was impossible for us to even implement such a feature.

Lorcan responded to that column, citing the “‘murky bucket syndrome’ that affects any large bibliographic database—we cannot entirely, unambiguously slice and dice the database because of historic data entry and cataloging practices that…were not oriented toward our new needs.” I’ll say. Also, around that time my soon-to-be colleagues at OCLC Research wrote a paper about some related work they had done: “Mining for Digital Resources: Identifying and Characterizing Digital Materials in WorldCat”.

Later I did a deeper investigation into this while still at the California Digital Library, from which came an informal report called “Trouble in Online Paradise: An Analysis of MARC 856 Usage at One Institution”. Basically, I took 1,000,000 MARC records from UC Berkeley, pulled out all of the 856 fields (about 20,000 at the time), and analyzed them. Since I have that work on my prototype server, you can still play around with it if you want.
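As a rough illustration of what tallying 856 fields can look like (this is not the code behind the report), the Python sketch below counts fields by their second indicator and by the host named in subfield $u. The sample data, the tuple layout, and the `summarize` helper are all assumptions made up for this example; the second-indicator meanings follow the MARC 21 definition of field 856.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, simplified 856 fields: (second indicator, {subfield code: value}).
# Real MARC records would need a MARC parser; this only illustrates the tallying.
SAMPLE_856_FIELDS = [
    ("0", {"u": "http://www.example.org/fulltext/123"}),
    ("1", {"u": "http://books.example.com/scan/456", "3": "Table of contents"}),
    ("2", {"u": "http://www.example.org/finding-aid/789"}),
    (" ", {"z": "Available to campus users"}),
]

# Meaning of the 856 second indicator in MARC 21.
SECOND_INDICATOR = {
    " ": "no information provided",
    "0": "resource",
    "1": "version of resource",
    "2": "related resource",
    "8": "no display constant generated",
}


def summarize(fields):
    """Count 856 fields by relationship and by URL host, and count fields with no $u."""
    relationships = Counter()
    hosts = Counter()
    missing_url = 0
    for ind2, subfields in fields:
        relationships[SECOND_INDICATOR.get(ind2, "undefined")] += 1
        url = subfields.get("u")
        if url is None:
            missing_url += 1
        else:
            hosts[urlparse(url).netloc] += 1
    return relationships, hosts, missing_url


relationships, hosts, missing_url = summarize(SAMPLE_856_FIELDS)
```

Even in this toy sample, only the field coded "0" claims to point at the resource itself, which is the kind of ambiguity that makes a reliable "online in full" limit hard to build.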


Monster Mash

Tuesday, September 21st, 2010 by Roy

Tomorrow Bruce Washburn and I leave from the San Mateo office of OCLC Research to help run the WorldCat Mashathon in Boston (well, Cambridge, really, but you could toss a rock across the river and hit Boston). I really enjoy these events, since they are a couple of days of helping library programmers learn about OCLC Web Services, with a good chunk of time set aside to play with them. We’ll have all day Thursday and Friday to devote to learning and playing, time that can be difficult to come by when you are under pressure to deliver at your place of employment.

Previous Mashathons have yielded a number of new mashups, many of which have ended up in our Application Gallery. Previous attendees have also integrated a number of service improvements in their local systems using these APIs. Mashers are not limited to OCLC APIs by any means. We take pains to point out a list of library-related APIs that I maintain over on my TechEssence.info site. Any API is fair game. Or linked data, or what have you. Whatever developers can use to improve their local services is fine with us.

So why did I title this post “Monster Mash”? Why, I’ll be there…why else?

Economics of Scholarly Production: Supplemental Materials

Wednesday, August 25th, 2010 by Constance

At the Spring CNI Taskforce meeting last April, Karen Wetzel (Standards Program Manager at NISO) announced a new piece of work related to “supplemental materials” in journal articles. In the scientific literature, it is not uncommon for articles to be accompanied by a secondary set of figures, data, and documentation of experimental protocols that isn’t considered part of the core content. Karen reported that thought-leaders from a variety of sectors had expressed concerns about the expense that publishers incur in managing this material, as well as the additional work that it creates for editorial staff and authors. Libraries were included in a long list of potential stakeholders, as potential curators of this supplemental material.

A central concern is that scholarly citation and reuse of this kind of supporting material is limited by the absence of identifiers, bibliographic metadata, etc.

Breaking Open the ILS Silos

Friday, August 20th, 2010 by Roy

In 2007-2008, the Digital Library Federation (DLF) convened a Task Group to recommend standard interfaces for integrating the data and services of the Integrated Library System (ILS) with new applications supporting user discovery. The group produced a report with recommendations in December 2008. After that, not much happened.

In February 2010, at the Code4Lib Conference, Karen Coombs (the OCLC Developer Network manager) and I brought together some of the people who had been on that task group, as well as other interested parties who were at the conference. At this ad hoc meeting we agreed that we were ready to take this work to the next stage, which we felt was to actually create a middleware layer that we could collaboratively maintain.

Next-Gen Harvesting

Thursday, February 4th, 2010 by Roy

Metadata harvesting (collecting metadata from others and aggregating it in a collection) is not new. Although there are any number of ways to do this, OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is often used and has been around for years. It defines a small set of actions that allow anyone to discover what sets of metadata are available for harvesting from a digital repository, see which metadata formats are offered, and select and download those records. Thousands of repositories worldwide support it, sometimes even unknowingly, because many repository applications such as DSpace and EPrints come with OAI-PMH support out of the box.
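To make that "small set of actions" concrete, here is a minimal Python sketch that builds request URLs for three of the OAI-PMH verbs: ListSets, ListMetadataFormats, and ListRecords. The base URL, the set name "maps", and the `oai_request` helper are assumptions invented for illustration; the verb and argument names (`metadataPrefix`, `set`, `from`) come from the OAI-PMH specification. The sketch only constructs URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def oai_request(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    for key, value in params.items():
        # "from" is a reserved word in Python, so callers pass "from_" instead.
        query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# Which sets of metadata does the repository offer?
list_sets = oai_request(BASE_URL, "ListSets")

# Which metadata formats can it deliver?
list_formats = oai_request(BASE_URL, "ListMetadataFormats")

# Download Dublin Core records from one (hypothetical) set, changed since a given date.
list_records = oai_request(
    BASE_URL,
    "ListRecords",
    metadataPrefix="oai_dc",
    set="maps",
    from_="2010-01-01",
)
```

The `from` argument is what lets a harvester ask only for records added or changed since its last visit, which is one way an aggregator can keep its copies from going stale.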

This has led to a world in which there are metadata aggregators and even aggregators of aggregators. It has also led to potential confusion and difficulty. Records that are picked up from their “native” location and indexed and displayed elsewhere may not be depicted as the creator of that metadata intended. They also may not be refreshed in a timely fashion, thereby potentially leading to records that are out-of-date persisting in various corners of the Internet.

This is why when my colleagues on the services side of the house announced the WorldCat Digital Collection Gateway I sat up and took notice. This heralds a new world in which those being harvested can exert some control over not only how frequently their records are updated, but also how those records are depicted in the aggregation — in this case, WorldCat. Through a simple web-based interface, you can provide your OAI-PMH base URL, have the Gateway test harvest some records, view how those records would display in WorldCat, and change the mapping if you wish. Another benefit is that your records will then appear in all of the places WorldCat is syndicated.

A pilot project to test the Digital Collection Gateway was just announced, beginning March 1, and we are seeking volunteers to try it out and provide feedback. During the pilot you will be asked to:

  • Attend a two-hour webinar reviewing the use of the Gateway
  • Upload a minimum of 500 metadata records to WorldCat
  • Offer feedback and input on your experience with the Gateway to our support and product teams so we can improve the tool and workflows

If you would like to help us create a next-generation harvesting infrastructure, in which you control your metadata more than ever before, email us at oaister@oclc.org.

ORCID and ISNI: Author, Swineherd, Taxman, Alcohol Researcher

Saturday, January 30th, 2010 by Jim

At recent meetings I attended in Washington, D.C., there was significant hallway discussion about the Open Researcher Contributor Identification (ORCID) initiative. Given the science orientation of the meetings, this initiative to resolve the problem of name ambiguity and attribution in scholarly publication was particularly welcomed. As you’ll see if you visit the ORCID site, this is early days for this pre-competitive multi-publisher effort whose goal is to establish

“an open, independent registry that is adopted and embraced as the industry’s de facto standard.” Their mission is “to resolve the systemic name ambiguity, by means of assigning unique identifiers linkable to an individual’s research output, to enhance the scientific discovery process and improve the efficiency of funding and collaboration.”

The effort was convened by Thomson Reuters and Nature Publishing not long ago, with the first meeting held in November 2009. The roster of participants is impressive, and the continued involvement of Elsevier made those with whom I talked hopeful that this would be as successful an effort as CrossRef has been. A recent editorial in Nature, “Credit where credit is due” (pdf), is quite to the point about the implications of success.

My colleagues, Thom Hickey and Janifer Gatenby, have been involved. OCLC has much to contribute here given Thom’s leadership of the Virtual International Authority File (VIAF) effort and Janifer’s in the development of the International Standard Name Identifier (ISNI). The scope of ORCID is narrower than ISNI as the latter is intended for the identification of “identities used publicly by parties involved throughout the media content industries in the creation, production, management, and content distribution chains.” This goes across all fields of creative activity not just science. As Janifer said,

“ISNI could become a cross domain identifier so that a researcher who also plays in a rock band (and wants it known that he is one and the same) can be identified.”


The Straight Dope on OAIster

Monday, September 21st, 2009 by Roy

As many of you are probably aware, OCLC and the University of Michigan announced last January that OCLC was taking over the OAIster aggregation of metadata harvested from OAI-compliant repositories. The University of Michigan was no longer able to support it, and was looking for assistance in sustaining this valuable community resource. As Kat Hagedorn remarked in regard to our agreement, “Hosting anything of this size quickly got out of hand for UM Libraries, and it took us a long time to realize it. Besides, greater access for more folks? Sounds win-win to me, as long as it’s continuously freely available.” [reported by Dorothea Salo]

I have heard lots of questions since we started contacting contributors with the most recent phase of the transfer plan, so the purpose of this post is to bring everyone up to date on why we are doing this, where things are, and what we hope to accomplish in the future.

Context for Metasearch

Friday, August 28th, 2009 by Jennifer

Last Friday the Encoded Archival Context (EAC) standard for archival authorities was released to the international community for review. Warning: an EAC record is not your grandmother’s MARC authority record. EAC is a companion standard to Encoded Archival Description (EAD), yet now seems to be useful well beyond the world of archives.

Managing collections archivally requires archivists to create comprehensive descriptions of corporate bodies, persons and families. Who would know better the context of records and creators than the archivists with the stuff in their hands? And who knew that this contextual information would be exactly what folks want to share when Networking Names [pdf]? With EAC we can link the creators, the context and the stuff. EAC goes one step further, facilitating the exchange of authoritative contextual information across many domains.

It turns out EAC is useful infrastructure for metasearch. At our RLG Annual Meeting, Warwick Cathrow demonstrated The National Library of Australia’s prototype “one-search” service. Here one can discover everything – pictures, books, archives, newspaper articles, music, etc. – by and about a creator. The Australians have used EAC to collate dispersed, silo-ed information. (Just search the Christian name “Nellie” and watch it go! Hats off to Basil Dewhurst and his team.)

Networking names

Friday, May 1st, 2009 by Karen

Our Networking Names report has just been published! I was pleased to see this morning a number of tweets announcing it – or echoing other tweets.

I blogged last November about names touching everything soon after the Networking Names Advisory Group met together at the Met. The fifteen members of the advisory group have spent the time since refining fourteen use case scenarios, those that they were most knowledgeable about – academic libraries and scholars, archivists and archival users, and institutional repositories. These use case scenarios envisioned how different communities could benefit from aggregating information about persons and organizations, corporate and government bodies, and families, and making it available on a network level.  From the use case scenarios we derived the functions and attributes of what would be needed for a “cooperative Identities Hub”.

Some of the components of a cooperative Identities Hub exist or are being developed. We wanted to articulate the characteristics of a gateway to all forms of names authorized or used in other contexts without preferring one form of name over another and that would use social networking to tap expertise in all communities. We envisioned a switch for users or their machine applications to extract relevant information for re-use in their own contexts and enable contributions from different sources.  These are objectives we can all strive towards.

We’re looking at ways to amplify this work. Feel free to post your comments or reactions here in the meantime.

I am deeply grateful to all the RLG Partner staff who contributed to the report – a very talented group to work with: Grace Agnew (Rutgers), Laura Akerman (Emory), Genevieve Clavel (Swiss National Library), Joan Cobb (Getty Research Institute), Michele Crump (U. Florida), Amanda Hill (U. Manchester/UK Names Project), Deborah Kempe (Frick), Amy Lucker (New York University), Dennis Meissner (Minnesota Historical Society), Suzanne Pilsk (Smithsonian), Michael Rush (Yale), Jon Shaw (U. Pennsylvania), Laura Smart (CalTech), Daniel Starr (Metropolitan Museum of Art), Bob Wolven (Columbia).

Analysis Methodology for Museum Data

Wednesday, April 29th, 2009 by Günter

In a previous post, I shared some background about the data analysis phase of our Museum Data Exchange Mellon grant, and posted some of the questions our museum participants wanted to have answered. In the meantime, we have created a spreadsheet [pdf] which captures our ideas to date of what questions we may want to ask of the 850K CDWA Lite XML records from 9 museums. Note that the methodology captured by this spreadsheet lays out a landscape of possibilities – it is not a definitive checklist of all the questions we will answer as part of this project. Only as we get deeper into the analysis will we know which questions are actually tractable with the tools we have at hand. I’d appreciate any thoughts on additional lines of inquiry we could pursue with our analysis, or other observations!
