The Straight Dope on OAIster

As many of you are probably aware, OCLC and the University of Michigan announced last January that OCLC was taking over the OAIster aggregation of metadata harvested from OAI-compliant repositories. The University of Michigan was no longer able to support it, and was looking for assistance in sustaining this valuable community resource. As Kat Hagedorn remarked regarding our agreement, “Hosting anything of this size quickly got out of hand for UM Libraries, and it took us a long time to realize it. Besides, greater access for more folks? Sounds win-win to me, as long as it’s continuously freely available.” [reported by Dorothea Salo]

I have heard lots of questions since we started contacting contributors with the most recent phase of the transfer plan, so the purpose of this post is to bring everyone up to date on why we are doing this, where things are, and what we hope to accomplish in the future.

  • OCLC wanted to do whatever we could to ensure sustainability of this aggregation when the University of Michigan realized they needed assistance. We believe, as do many others, that OAIster provides a useful aggregation of millions of records representing millions of open access papers, journal articles, and other items with useful academic content. As a global non-profit library cooperative, we felt we were the logical organization to support and maintain this service on behalf of the community at large. OCLC is committed to building on the success of OAIster by identifying open archive collections of interest to libraries and researchers, and ensuring that open archive collections will be freely discoverable and accessible to information seekers worldwide.
  • We continue to collaborate with the University of Michigan during this transition period. The University of Michigan has been tremendously generous in their time and expertise as we take over this complicated and difficult process.
  • Starting in October, the records will be freely discoverable along with all the other content in WorldCat.org. However, it will not be possible to limit a search to OAIster records alone.
  • In FirstSearch, OAIster records can either be searched along with other FirstSearch databases, or selected to search alone. OAIster records have been searchable in FirstSearch since January 2009.
  • Contributors of OAIster records can receive free access to the OAIster aggregation in FirstSearch by request. Contributors were recently contacted to offer them such access and many have already responded that they would like to have such access.
  • Only data providers that request that we not harvest their records will be removed from the aggregation. We feel strongly that one of the main benefits of OAIster has been the aggregation of records from the vast majority of repositories worldwide. Therefore, unless a repository denies us permission to harvest their records, we will seek to include them.
  • No money was exchanged in this transfer and OCLC is not making any money on the OAIster aggregation. OAIster records were added to FirstSearch at no extra charge to FirstSearch subscribers, and of course there is no charge for searching WorldCat.org, where they are also exposed. Rather than boosting revenue, in fact, OCLC is committed to making an investment in the kind of large-scale harvesting operation that OAIster represents.
  • Harvesting is hard. As anyone who has done this work will tell you, harvesting records using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is far from simple. There are all kinds of difficulties, not the least of which is the uneven support of the protocol by the wide variety of repository platforms. Community awareness of these problems led to the formation of an NSDL- and DLF-sponsored working group that produced a web site devoted to “Best Practices for OAI Data Provider Implementations and Shareable Metadata”. Since this is a difficult process, we may not get everything right from the beginning, but with help from the University of Michigan during this transition we’re hopeful that we can not only reach, but eventually exceed, what has gone before.
  • We are exploring options for machine access. Z39.50 access to OAIster is available to FirstSearch subscribers now, and we are considering whether additional options should be supported. The University of Michigan did not offer an OAI-PMH or Web Services interface, although they did offer an rsync option. Learning the needs of the community will help inform what we do in this area.
  • We are seeking to provide long-term scalability for this service and we ask for the cooperation of data providers. Something that is likely not widely known is that the University of Michigan would perform specialized processing of the retrieved records because of standards noncompliance by some data providers. In order to sustain this service over the long haul, we will need to work with data providers to reduce the number of exceptions to standard procedures.
  • We are forming an advisory board to provide us with essential advice. We know that this is an ongoing service that will require further development and support, and so we seek the advice of those knowledgeable and experienced within the community to make sure we get it as right as we can on behalf of our member institutions and the broader community of users.
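
For readers who have not done this kind of harvesting, here is a minimal sketch of what a single OAI-PMH ListRecords harvest looks like. It is only an illustration of the protocol, not a description of how OCLC or the University of Michigan actually harvest. The function names and the example repository URL in the usage note are made up for this post, and the sketch assumes a repository that follows the protocol faithfully, which, as noted above, is often exactly what goes wrong in practice.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    The protocol treats resumptionToken as an exclusive argument: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, is_deleted), ...], next_token_or_None)."""
    root = ET.fromstring(xml_text)

    # This sketch treats every <error> element as fatal. A real harvester
    # would probably handle some codes (e.g. noRecordsMatch) more gently.
    errors = [e.get("code") for e in root.findall("oai:error", OAI_NS)]
    if errors:
        raise RuntimeError("OAI-PMH error(s): " + ", ".join(str(c) for c in errors))

    records = []
    for header in root.findall("oai:ListRecords/oai:record/oai:header", OAI_NS):
        identifier = (header.findtext("oai:identifier", default="", namespaces=OAI_NS) or "").strip()
        # Deleted records are flagged with status="deleted" on the header.
        records.append((identifier, header.get("status") == "deleted"))

    # An absent or empty resumptionToken means the list is complete.
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, (token or None)


def fetch_over_http(url):
    """Fetch one response page as text (assumes a UTF-8 response)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest(base_url, fetch=fetch_over_http, metadata_prefix="oai_dc"):
    """Yield (identifier, is_deleted) pairs, following resumption tokens."""
    token = None
    while True:
        url = build_list_records_url(base_url, metadata_prefix, token)
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            break
```

Calling something like `for identifier, is_deleted in harvest("https://repository.example.org/oai"): ...` (a placeholder URL) would walk the whole record list one page at a time. Even this toy version hints at where the trouble comes from: it has no retries, no throttling, no incremental harvesting by date, and no tolerance for malformed XML or nonstandard responses, and each of those is a place where a data provider can require special handling.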

We believe that we are uniquely positioned to maintain a production aggregation service of this scale in the service of information seekers worldwide. We welcome the advice and assistance of the OAI community in making this service as useful as possible for those seeking access to valuable academic content.

About Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.

12 Comments

  1. Pingback: OCLC stampft OAIster ein. Was tun? » Infobib

  2. The Terms & Conditions (https://www3.oclc.org/app/oaister/oaister_terms.htm) are very problematic, to say the least. OCLC seems to be asking for very broad license to reuse (including by selling) not just metadata, but *any* kind of data made available to them- images, audio, video, the works. I doubt many repositories are in a position to grant these terms – much of our content (including metadata) that is not PD or institutionally copyrighted is made available to us by permission, and we cannot grant permission for the types of downstream uses, particularly commercial ones, sought by this agreement.

  3. One of the benefits of OAIster has been an open-access HTTP search interface. The proposal by OCLC to provide 1) free access as part of WorldCat, but with no ability to limit to OAIster and 2) free access for contributors to search via FirstSearch, does not meet that open-access need. Please continue to provide an open-access HTTP search interface for the OAIster database alone.

  4. Two comments:

    There is a fair bit of my work gathered through OAIster, metadata (including abstracts) for my work and that of other authors. OCLC does NOT have my permission to distribute this via a subscription database (FirstSearch, WorldCat).

    Many of the publishers who allow self-archiving do so on the basis that the work will be distributed only via a not-for-profit open access archive.

    My sympathies to the University of Michigan on the need to find funding to support open access; my recommendation is to join the Compact on Open Access Publishing Equity, encourage every other library to join, and work to expand the scope.

  5. Peter: We’re working on this, as you’re aware from the discussion on the oai-implementers listserv.

    Andrew: It’s useful to hear this kind of need, and I’m glad you brought it up. I’d be interested to hear specific points on why a separate database is useful.

    Heather:
    1. WorldCat.org is a freely available database. What about it makes it not an openly accessible archive?
    2. I’m not certain what you mean by “need to find funding.” I don’t care what kind of tool or service you’re discussing, and whether it’s for open access or not, there is always a cost. UM ponied up the cost for the development and maintenance of OAIster, and now OCLC will be doing so.
    3. Let’s not forget that OAIster contains more than just open-access materials. It’s important to differentiate between “open access” and “openly accessible”.

  6. Pingback: hangingtogether.org » Blog Archive » Clarification on OCLC/OAIster Transfer

  7. Thanks for the reply Kat.

    A separate database is useful because of the types of resources it aids discovery of, how we market it (“search scholarly digital resources”) and these not being swamped by other less relevant WorldCat content.

    Of course, the fashionable answer to that is “give the searcher as much as possible and let them decide what is useful” (what I call swamping) and “Andrew, that’s old silo-type thinking” etc. etc.
    But I’m not alone, viz. Google / Google Scholar

  8. I’d like to comment on Andrew’s second point (“free access for contributors to search via FirstSearch, does not meet that open-access need”) and the mention that OCLC is “exploring options for machine access”.

    OAIster *did* at least have an SRU interface for machine search and retrieval via BibClass-formatted XML. With the move to WorldCat, I assume this will go away, or at a minimum be replaced by OCLC-subscriber-only machine access (e.g., FirstSearch or the WorldCat Search API)?

    This would represent a *major* step away from open access to the metadata, particularly for institutions and projects that aren’t OCLC members (and ironically, in many cases the same original data providers that OAIster is harvesting) who will now lose the ability to do machine search/retrieval.

    Obviously this relates closely to the OCLC WorldCat Record Use Policy and the activities of the review board. But, metadata sharing is a two-way street: we need a way to get XML records back out. (PS. The WorldCat Basic API is a good step, but *please*, formatted citations? Just give us XML, thanks.)

  9. Pingback: Library Intelligencer » Textbook Revolution

  10. Posted on behalf of Joanna Meakins:

    We at the National Library of Australia have been using OAIster’s rsync option to provide a free public interface to the full set of OAIster records in our prototype http://sbdsproto.nla.gov.au (along with other information). Although we are no longer able to update our copy of OAIster records, we are looking forward to collaborating with OCLC on an option for machine access – preferably the same rsync option as supported by University of Michigan.

  11. Roy wrote:

    “The University of Michigan did not offer an OAI-PMH or Web Services interface, although they did offer an rsync option. Learning the needs of the community will help inform what we do in this area.”

    Index Data makes the OAIster data freely available to anyone for any purpose via a Z39.50 server at masterkey.indexdata.com, and we will continue to do so if OCLC provides us free and unencumbered access to the OAIster metadata, as the University of Michigan had done until recently. We will gladly ingest the data using whatever protocol is most convenient for OCLC: FTP, OAI-PMH, rsync, etc.

  12. I am very happy with this integration. What it means is that the bulk of my institution’s digital collections are now accessible through our WorldCat.org-based Local system.

    I am interested in how OAIster resources are updated (from OAI-PMH data providers) in WorldCat.org. I have stumbled across a few bad resource links.

    Re. OAIster, above: “We are exploring options for machine access.” This is great news! In practical terms, some type of web services support is needed.
