There is an awful lot of talk about academic libraries providing data curation services for their researchers. It turns out that in most cases that service amounts to training and advice, but not actual data management services. However, institutions without data repositories are likely thinking about implementing one. We thought it would be helpful to hear from those few who have implemented data repositories. [If you are one of those pioneers and did not get a chance to fill out the survey, feel free to describe your repository program as a comment to this post.]
OCLC Research conducted an unscientific survey about data repositories from May 19 to July 16, 2015. Initially the survey was sent to twelve institutions that were believed to have a data repository. They were asked to identify other institutions with data repositories. In total, thirty-one institutions were invited to take the survey. Twenty-two filled out the survey, and two of those indicated that they do not have a data repository. The following summarizes the twenty responses from institutions with data repositories.
TECHNICAL DETAILS. Eight of the institutions run a stand-alone data repository and twelve run a combined institutional repository and data repository. Six of the sites run DSpace, six run Hydra/Fedora systems, four have locally developed systems, and one each runs Rosetta, Dataverse, SobekCM, and HUBzero.
PRESERVATION. All but one provide integrity checks. Seventeen keep offsite backup copies. Twelve provide format migration. Ten put master files in a dark archive. Two volunteered that they provide DOI generation.
SERVICES. Three institutions reported that they accept deposits from researchers not associated with their institution. One is part of a consortial arrangement and one is part of a network. Seven have their data or metadata harvested by other data repositories. When researchers deposit their data in an external repository, ten will include the datasets in their own repository and one includes just the metadata in their repository. All of them provide public access to data. Fourteen restrict or limit access when appropriate.
FUNDING. When asked about funding sources, eighteen reported that the library’s base budget covered at least some of the expenses. Seven said that was their only source of funding. Seven reported getting fees from researchers and four reported getting fees from departments. Five get institutional funding specifically for data management. Four get money from the IT budget. Only one institution reported getting direct funds from grant-funded projects and only one reported getting indirect funds from grant-funded projects. None reported getting fees from users, having an endowment, or having had grant funding to develop the repository.
While technical, preservation, and service issues can be challenging, I suspect that for some time funding will be the biggest obstacle to providing this important service in support of the university research mission.
[Many thanks to Amanda Rinehart, Data Management Services Librarian at The Ohio State University, for help with the creation of the survey.]
Ricky Erway, Senior Program Officer at OCLC Research, worked with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation. Ricky left OCLC in 2015.