The logistical details of publishing the tools we have produced as part of the Museum Data Exchange Mellon grant continue to unfold more slowly than I had hoped, but I am now fairly confident that the applications will be announced for download at some point next week – more when it actually happens!
In the meantime, the focus of our activity with the museum partners has moved from creating tools to analyzing the data they’ve shared while using them. We now have data from six institutions that have allowed us to harvest CDWA Lite XML records created with and shared through a combination of COBOAT and OAICatMuseum 1.0 (again, more as we release the tools), plus records from three additional museums that had other means of creating and sharing CDWA Lite XML at their disposal. In total, about 850K records are now sitting behind a firewall on an OCLC Research server, awaiting data analysis.
Our next big question is: how can we evaluate the data the museums have shared? While all of it uses the same data structure (CDWA Lite XML), all participants are aware that the rules for populating that structure with data content may vary considerably from institution to institution. Cataloging Cultural Objects is becoming a household name, but a good bit of the data shared probably predates the emergence of this data content standard, let alone its local implementation. What are the right questions to ask to give the participating museums a sense of how well their records play with each other, both within the institutional data set and across the aggregate resource?
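To make the idea of "evaluating" a little more concrete, here is a minimal sketch of one very simple measure: for each element name, the share of records in which that element actually carries a value. This is only an illustration, not our methodology. The two sample records are made up for the example, and the function names (`local_name`, `populated_elements`, `fill_rates`) are hypothetical helpers, not part of COBOAT or OAICatMuseum.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample records for illustration only; not data from the project.
SAMPLE_RECORDS = [
    """<cdwalite:cdwalite xmlns:cdwalite="http://www.getty.edu/CDWA/CDWALite">
         <cdwalite:descriptiveMetadata>
           <cdwalite:objectWorkTypeWrap>
             <cdwalite:objectWorkType>painting</cdwalite:objectWorkType>
           </cdwalite:objectWorkTypeWrap>
           <cdwalite:displayCreator>Unknown artist</cdwalite:displayCreator>
         </cdwalite:descriptiveMetadata>
       </cdwalite:cdwalite>""",
    """<cdwalite:cdwalite xmlns:cdwalite="http://www.getty.edu/CDWA/CDWALite">
         <cdwalite:descriptiveMetadata>
           <cdwalite:objectWorkTypeWrap>
             <cdwalite:objectWorkType>Painting</cdwalite:objectWorkType>
           </cdwalite:objectWorkTypeWrap>
           <cdwalite:displayCreator/>
         </cdwalite:descriptiveMetadata>
       </cdwalite:cdwalite>""",
]


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def populated_elements(record_xml):
    """Return the names of elements in one record that contain non-blank text."""
    root = ET.fromstring(record_xml)
    return {
        local_name(el.tag)
        for el in root.iter()
        if el.text and el.text.strip()
    }


def fill_rates(records):
    """Map each element name to the fraction of records where it has a value."""
    counts = Counter()
    for record in records:
        counts.update(populated_elements(record))
    total = len(records)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, rate in sorted(fill_rates(SAMPLE_RECORDS).items()):
        print(f"{name}: {rate:.0%}")
```

Counting by local element name keeps the sketch independent of namespace prefixes. Wrapper elements that hold only child elements are deliberately left out, since they contain no text of their own. A count like this says nothing about whether the values themselves are consistent ("painting" versus "Painting", for instance), which is a separate question.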
Here are some of the questions the museum partners surfaced in a conference call last month:
Questions about the institutional data set:
Questions about the aggregate data set:
For both areas, they wanted to have an answer to the following question: What strategies can we use to work around the inevitable inconsistencies in the data?
The project team on the OCLC Research side (Ralph LeVan, Bruce Washburn and I) took their questions, and attempted to formalize them into something we call our “data analysis methodology”. I’ll share this document, which amplifies the questions museum participants asked and puts them into a framework, in a blog posting in the coming weeks.
In the meantime, if you have a question you’d like to ask of 850K CDWA Lite records, please leave a comment, and we’ll see whether we can fold it into our methodology!