Elusive Quality

We talk a lot about data curation, but rarely about data quality. How do researchers determine if a dataset is appropriate for their intended purposes? They may need to know how the data was gathered (sometimes including the sensor equipment used and how it was calibrated), the degree of accuracy of the data, what null elements mean, what subsequent changes have been made to the data, and all sorts of provenance information.

The University of North Carolina invited about 20 people from a variety of communities to an NSF-funded workshop titled Curating for Quality: Ensuring Data Quality to Enable New Science. The final report has just been published. Its appendices contain the white papers prepared in advance of the workshop, including one that Brian Lavoie and I wrote, titled The Economics of Data Integrity, which is on page 53 of the report.

The most useful outcomes of the workshop came from the group’s brainstorming of projects that would advance the discussion. We settled on eight that seemed actionable and fleshed them out a bit. We were encouraged to pursue the projects that moved us, either by working informally with like-minded individuals or by making a proposal to NSF. There’s no reason, however, that others couldn’t take up any of these ideas as well.

For those of you in a hurry, the Conclusion and Call to Action on pages 17 and 18 of the report sum up the issues quite nicely.