What is actually happening out there in terms of institutional data repositories?

There is an awful lot of talk about academic libraries providing data curation services for their researchers. It turns out that in most cases the service amounts to training and advice rather than actual data management. However, institutions without data repositories are likely thinking about implementing one, so we thought it would be helpful to hear from the few that have already done so. [If you are one of those pioneers and did not get a chance to fill out the survey, feel free to describe your repository program as a comment to this post.]

OCLC Research conducted an unscientific survey about data repositories from May 19 to July 16, 2015. Initially the survey was sent to twelve institutions that were believed to have a data repository, and they were asked to identify other institutions with data repositories. In total, 31 institutions were invited to take the survey. Twenty-two filled it out, and two of those indicated that they do not have a data repository. The following summarizes the twenty responses from institutions with data repositories.

TECHNICAL DETAILS. Eight of the institutions run a stand-alone data repository and twelve run a combined institutional repository and data repository. Six of the sites run DSpace, six run Hydra/Fedora systems, four have locally developed systems, and one each runs Rosetta, Dataverse, SobekCM, and HUBzero.

PRESERVATION. All but one provide integrity checks. Seventeen keep offsite backup copies. Twelve provide format migration. Ten put master files in a dark archive. Two volunteered that they provide DOI generation.

SERVICES. Three institutions reported that they accept deposits from researchers not associated with their institution. One is part of a consortial arrangement and one is part of a network. Seven have their data or metadata harvested by other data repositories. When researchers deposit their data in an external repository, ten will include the datasets in their own repository and one includes just the metadata in their repository. All of them provide public access to data. Fourteen restrict or limit access when appropriate.

FUNDING. When asked about funding sources, eighteen reported that the library’s base budget covered at least some of the expenses. Seven said that was their only source of funding. Seven reported getting fees from researchers and four reported getting fees from departments. Five get institutional funding specifically for data management. Four get money from the IT budget. Only one institution reported getting direct funds from grant-funded projects and only one reported getting indirect funds from grant-funded projects. None reported getting fees from users, having an endowment, or having had grant funding to develop the repository.

While technical, preservation, and service issues can be challenging, I suspect that for some time funding will be the biggest obstacle to providing this important service in support of the university research mission.

[Many thanks to Amanda Rinehart, Data Management Services Librarian at The Ohio State University, for help with the creation of the survey]

8 Comments on “What is actually happening out there in terms of institutional data repositories?”

  1. Hi Ricky,

    This is very interesting. Any chance we can find out which institutions answered the survey? Or was anonymity a condition of publication of the summary results?

    Thanks.

    1. We did say “Any publication or reports that result from this survey will only identify institutions after additional review and permission is granted from the participant.” Due to the nature of the survey (see reply below), we have no further plans for it.

  2. Does OCLC plan to publish a report on this survey? Is the raw data from the survey available/accessible?

    1. The survey was so brief, informal, and unscientific that the “findings” in the blog post are about all that can be wrung from it.

  3. I also think that one of the issues is a lack of expertise. Many librarians who would like to curate data have no training, and the IT departments of libraries are not set up with people who are used to handling data as assets. I’d love to see a follow-up survey looking at what training librarians and IT practitioners are obtaining and from where. It’s not enough to attend a workshop; there has to be hands-on experience with many different file formats and an in-depth understanding of the requirements for long-term usability.
    Just my 2 cents.

    1. I was the head of IT when we were charged to begin development of our IR. I found the advice, experience, and training that our archivists provided to be the single most important introduction to requirements, best practices, etc., that I could have hoped for. They have continued to be critical to development throughout the process and I’m forever grateful to them…far better than any formal training.
