Next-Gen Harvesting

Metadata harvesting (collecting metadata from others and aggregating it in a collection) is not new. Although there are many ways to do it, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is often used and has been around for years. It defines a small set of requests that allow anyone to discover which sets of metadata a digital repository offers for harvesting and which metadata formats are available, and then to select and download those records. Thousands of repositories worldwide support it, sometimes without their operators even knowing, because many repository applications such as DSpace and EPrints come with OAI-PMH support out of the box.
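
To make those requests concrete, here is a minimal Python sketch of how a harvester might build OAI-PMH request URLs and read the set names out of a ListSets response. The verb names, the `metadataPrefix` and `set` arguments, and the XML namespace come from OAI-PMH 2.0; the base URL, the sample response, and the helper names `build_request_url` and `list_set_specs` are assumptions made up for illustration, and the sketch does no network access, error handling, or paging.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
)

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request_url(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def list_set_specs(response_xml):
    """Extract the setSpec values from a ListSets response (hypothetical helper)."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.findall(".//oai:setSpec", OAI_NS)]


# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.org/oai"

# Hand-written, trimmed sample of a ListSets response; real responses carry more elements.
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Maps</setName></set>
    <set><setSpec>photos</setSpec><setName>Photographs</setName></set>
  </ListSets>
</OAI-PMH>
"""

# Step 1: ask which sets exist. Step 2: request the records of one set.
print(build_request_url(BASE_URL, "ListSets"))
print(list_set_specs(SAMPLE_LIST_SETS))
print(build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="maps"))
```

A real harvester would fetch each URL over HTTP and keep following the resumption token that large responses return; both are left out here to keep the sketch short.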

This has led to a world in which there are metadata aggregators and even aggregators of aggregators. It has also led to potential confusion and difficulty. Records that are picked up from their “native” location and indexed and displayed elsewhere may not be depicted as the creator of that metadata intended. They also may not be refreshed in a timely fashion, so out-of-date records can persist in various corners of the Internet.

This is why, when my colleagues on the services side of the house announced the WorldCat Digital Collection Gateway, I sat up and took notice. It heralds a new world in which those being harvested can exert some control over not only how frequently their records are updated, but also how those records are depicted in the aggregation (in this case, WorldCat). Through a simple web-based interface, you can provide your OAI-PMH base URL, have the Gateway test-harvest some records, view how those records would display in WorldCat, and change the mapping if you wish. Another benefit is that your records will then appear in all of the places where WorldCat is syndicated.
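
The idea of changing the mapping can be pictured with a small Python sketch. It is purely illustrative and is not the Gateway's actual interface or data model: the field names, display labels, and the `map_record` helper are all assumptions, loosely modeled on simple Dublin Core elements.

```python
# Hypothetical default mapping from source metadata fields to display labels.
DEFAULT_MAPPING = {
    "title": "Title",
    "creator": "Author",
    "date": "Date",
}


def map_record(record, mapping):
    """Return display label -> values, dropping source fields that have no mapping."""
    display = {}
    for source_field, values in record.items():
        label = mapping.get(source_field)
        if label is None:
            continue  # unmapped fields are simply not shown
        display.setdefault(label, []).extend(values)
    return display


# Hypothetical harvested record, with each field holding a list of values.
record = {
    "title": ["Flora of the Andes"],
    "creator": ["A. Botanist"],
    "contributor": ["B. Illustrator"],
    "rights": ["Public domain"],
}

# With the default mapping, the contributor and rights fields are not displayed.
print(map_record(record, DEFAULT_MAPPING))

# A data provider who wants contributors listed as authors adjusts the mapping.
custom_mapping = {**DEFAULT_MAPPING, "contributor": "Author"}
print(map_record(record, custom_mapping))
```

The point of the sketch is only that the party being harvested, not the aggregator, decides which source field feeds which displayed field.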

A pilot project to test the Digital Collection Gateway, beginning March 1, was just announced, and we are seeking volunteers to try it out and provide feedback. During the pilot you will be asked to:

  • Attend a two-hour webinar reviewing the use of the Gateway
  • Upload a minimum of 500 metadata records to WorldCat
  • Offer feedback and input on your experience with the Gateway to our support and product teams so we can improve the tool and workflows

If you would like to help us create a next-generation harvesting infrastructure, in which you control your metadata more than ever before, email us at

4 Comments on “Next-Gen Harvesting”

  1. It amazes me how libraries need control over how their precious metadata records are displayed in third-party systems. I heard this at a recent Europeana meeting too.

    I have no say in how Google summarizes my home page. Do libraries? Do Google Book libraries control how Google displays their books? What is this about?

  2. I’ve been using OAIster for our OAI stuff, and am now in the pilot for the WorldCat web app. It looks very promising, and everything we do at BHL (Biodiversity Heritage Library) is meant to promote open standards.
