Data Management and Curation in 21st Century Archives - Part 1

I attended the 79th Annual Meeting of the Society of American Archivists (SAA) last month in Cleveland, Ohio, and was invited to participate on the Research Libraries Roundtable panel on Data Management and Curation in 21st Century Archives. Dan Noonan, e-Records/Digital Resources Archivist, moderated the discussion. Wendy Hagenmaier, Digital Collections Archivist, Georgia Tech Library, and Sammie Morris, Director, Archives and Special Collections & University Archivist, Purdue University Libraries, joined me on the panel. Among the three of us there was a nice variety of perspectives given our different experiences and interests.

It was a great panel, so I decided to discuss it in two parts. In this part, Managing and Curating Data with Reuse in Mind, I summarize key points from my presentation. In Part 2, I will highlight key points from Wendy and Sammie’s presentations that made an impression on me.

Managing and Curating Data with Reuse in Mind

I was excited to be invited to participate in a panel discussion on Data Management and Curation in 21st Century Archives at SAA, given that my perspective is not that of an archivist. I’ve been studying data reuse in academic communities, and more recently I’ve been examining libraries’ role in e-research and data on their campuses.

Given my experiences and interests, my goal in participating on the panel was to convince archivists to bring their expertise to the table with an eye toward satisfying, perhaps even delighting, data reusers. I believe that centering conversations about data management and curation in 21st century archives on the needs of data reusers serves to inform the preservation of data’s meaning as well as other archival practices, particularly the partnerships archivists form, the questions they ask, and the activities they pursue.

Preservation of data’s meaning

When we think about preserving the meaning of research data, the goal is that someone not involved in the study can come along and make sense of the data. It’s no surprise that contextual information about how data are collected is critical.

For instance, a zoologist uses field notes to sort out whether a wolf might have been a dog or coyote hybrid. A social scientist references the instructions and layout of a survey to understand differences between survey responses. An archaeologist thinks artifacts are meaningless in the absence of information about where they came from and how they were acquired and excavated.

While the need for data collection information is obvious, what is often surprising to some is the level of contextual detail reusers want about it and the additional kinds of context they seek, including information about the data producer, data repository, data analysis, digitization and curation, preservation, and prior reuse.

Questions asked: It’s not just about context

Ask data reusers what contextual information they need as well as why they need it and where they go to get it. What we have heard has enlightened us about disciplinary attitudes and practices. We have learned what constitutes data quality and how it contributes to their decision making and satisfaction. Our understanding of data quality has become more nuanced and given us something tangible to work toward given its importance in data management and curation.

A zoologist deciding whether to combine data from different studies needs to know if the definitions for a concept hold across the two different data sources. We call this need to evaluate whether and how data from different studies can be integrated ease of operation. A social scientist determining whether data are relevant given research objectives looks at how variables are defined and measured. An archaeologist relies on information about data producers to judge whether their data are credible.

When asking researchers to talk about how they reused others’ data, we’ve learned that it’s not just about capturing context so researchers can understand data. Our findings show researchers judge others’ data in various ways to decide if the data are worthy of reuse. We need to know more about these judgments. If we are going to curate and preserve data to be reusable, we need to have a better sense of what reusable means.

Partnerships formed: It’s bigger than the archive

Looking at data management and curation from a reuser’s perspective also might influence the partnerships archivists form. It’s bigger than the archive. Archivists cannot go it alone. Our work shows how actions in one part of data’s lifecycle influence other parts.

How data producers collect, record, and document their data impacts repository staff and data reusers. For instance, we found archaeologists collecting data in the field had systems to identify tooth wear, but there were no guidelines for documenting tooth wear. Consequently, they recorded it in different ways, which affected repository staff’s data processing time and reusers’ understanding. We’ve also found instances where repository staff’s actions motivated data producers to share and impacted the satisfaction of data reusers, and where data reusers influenced repository policy and data producers’ future actions.

While we’ve revealed interdependencies in an attempt to improve data sharing, management, and reuse experiences, we’ve only looked at three roles. Of those roles, we’ve only considered one that sits between data producers and data reusers – repository staff (i.e., the data curator). We know there are more – archivists, librarians, technologists, compliance officers, administrative staff, etc.

When asked what facilitates research data services, two-thirds of librarians mentioned communication, coordination, and collaboration with people from other units on their campus as a means to define, develop, and deliver services, pool expertise, and outline roles and responsibilities. Our research suggests that the key will be managing these stakeholders’ interdependencies through data’s lifecycle by identifying pain points and supportive actions that will move things forward.

Activities pursued: It’s always about designated communities

Lastly, incorporating data reusers’ perspectives and practices into conversations about data management and curation might influence the activities archivists pursue. It’s always about the designated community of users. We witnessed this in our interviews with staff at three data repositories – Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan Museum of Zoology, and Open Context. Findings showed staff dealt with six types of change in data repositories, one of which was responding to their user communities.

At each repository, staff were found to adjust their processes and procedures to accommodate the developing needs of their users. The museum developed new specimen preparation, preservation, and loan procedures when DNA testing became available. ICPSR staff were deciding when and how they could meet demand for new data formats such as video. Rather than design the Open Context website to be “Flickry” and collaborative, staff decided on a more straightforward publication platform because archaeologists wanted something more professional that they could cite on their CVs.

In our roles, whether archivists, librarians, technologists, researchers, etc., we need to think about how we can talk, listen, observe, learn from, teach, and delight data reusers. We should strive to ensure our actions encompass the audience we are trying to reach.

Are any of you actively engaged with your scholarly communities to understand data management, curation, and reuse from their perspective? If so, please comment or respond to this post and tell us about your experiences – How have you done it? What have you learned? What have they learned? What challenges remain?

Ixchel M. Faniel

Ixchel M. Faniel is a Senior Research Scientist at OCLC. She conducts user and library studies related to research data management, reuse, and curation practices and online information behavior.