Museum Data Exchange: Asking the right questions

The logistical details of publishing the tools we have produced as part of the Museum Data Exchange Mellon grant continue to unfold in a slower fashion than I had hoped, but I am now fairly confident that you will find applications for download announced at some point next week – more when it actually happens!

In the meantime, the focus of our activity with the museum partners has moved from creating tools to analyzing the data they’ve shared while using them. We now have data from six institutions who have allowed us to harvest CDWA Lite XML records created with and shared through a combination of COBOAT and OAICatMuseum 1.0 (again, more as we release the tools), plus records from three additional museums who had other means of creating and sharing CDWA Lite XML at their disposal. A total of about 850K records are now sitting behind a firewall on an OCLC Research server, awaiting data analysis.

Our next big question is: how can we evaluate the data the museums have shared? While all of the data uses the same data structure (CDWA Lite XML), all participants are aware that the rules for populating that structure with data content may vary considerably from institution to institution. Cataloging Cultural Objects (CCO) is becoming a household name, but a good bit of the data shared probably predates the emergence of this data content standard, let alone its local implementation. What are the right questions to ask to give the participating museums a sense of how well their records play with each other, both within the institutional data set and across the aggregate resource?

Here are some of the questions the museum partners surfaced in a conference call last month:

Questions about the institutional data set:

Evaluate the data against the required fields of CDWA Lite – are the required fields present?

Evaluate the data against the data content standard CCO – what is the state of CCO compliance of the data?

Evaluate the consistency of the data – are the same terms used to denote the same concept (e.g. is each artist represented by one name and one name only throughout the data)?

Evaluate the use and effectiveness of published vocabularies – who uses which vocabularies, and does their use demonstrably aid retrieval?
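
To make the first of these questions concrete, here is a minimal sketch of a required-field check on a single record. It is not the project's actual tooling: the `REQUIRED` list is an illustrative subset of element names that would need to be confirmed against the CDWA Lite specification, the sample record and its namespace URI are invented, and `missing_required` is a hypothetical helper. Elements are matched by local name, so the check does not depend on how a contributor declared the namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative subset of element names, not the
# authoritative list of required CDWA Lite elements.
REQUIRED = ["objectWorkType", "title", "displayCreator", "recordID"]


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def missing_required(record_xml, required=REQUIRED):
    """Return the required element names that are absent or empty."""
    root = ET.fromstring(record_xml)
    populated = {
        local_name(el.tag)
        for el in root.iter()
        if (el.text or "").strip()  # whitespace-only counts as empty
    }
    return [name for name in required if name not in populated]


# Invented sample record; the namespace URI is a placeholder.
SAMPLE = """<cdwalite xmlns="urn:example:cdwalite">
  <objectWorkType>painting</objectWorkType>
  <title>Untitled</title>
  <displayCreator> </displayCreator>
</cdwalite>"""

print(missing_required(SAMPLE))  # ['displayCreator', 'recordID']
```

Treating a whitespace-only element as missing is a design choice: a field that is present but empty does not help anyone find the record.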

Questions about the aggregate data set:

Evaluate the “usefulness” of the aggregate collection – do queries return meaningful results?

Evaluate the variations in institutional practice – how is my cataloging different from the other institution’s cataloging?

Provide a baseline of CDWA Lite fields present in all records – which fields are used by all institutions?

Evaluate the impact of the lack of subject data in the aggregate collection
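
The baseline question lends itself to a simple set intersection. The sketch below is one possible reading of it, under stated assumptions: the institution names and records are invented, the records are stripped down to a few elements, and `fields_used` and `baseline_fields` are hypothetical helpers rather than anything the project has built.

```python
import xml.etree.ElementTree as ET


def fields_used(record_xml):
    """Local names of all elements in a record that carry text."""
    root = ET.fromstring(record_xml)
    return {
        el.tag.rsplit("}", 1)[-1]
        for el in root.iter()
        if (el.text or "").strip()
    }


def baseline_fields(records_by_institution):
    """Fields populated in every record of every institution."""
    common = None
    for records in records_by_institution.values():
        for record_xml in records:
            used = fields_used(record_xml)
            common = used if common is None else common & used
    return common or set()


# Invented data: two hypothetical contributors with tiny records.
RECORDS = {
    "museum_a": [
        "<r><title>Vase</title><displayCreator>Unknown</displayCreator></r>",
        "<r><title>Bowl</title><recordID>7</recordID></r>",
    ],
    "museum_b": [
        "<r><title>Portrait</title><recordID>12</recordID></r>",
    ],
}

print(sorted(baseline_fields(RECORDS)))  # ['title']
```

A strict intersection is unforgiving, since one sparse record removes a field from the baseline. Counting how many records populate each field would probably tell a richer story.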

For both areas, they wanted to have an answer to the following question: What strategies can we use to work around the inevitable inconsistencies in the data?
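
One possible strategy, sketched here purely as an illustration, is to group values under a normalized key so that trivially different spellings of the same name surface together. The sample names are invented, `name_key` and `variant_groups` are hypothetical helpers, and the normalization rules (ignore accents, case, punctuation and word order) are assumptions that would need tuning against real data.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Build a crude matching key for a personal name."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that accented letters match plain ones.
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace punctuation with spaces, lowercase, then ignore word order.
    cleaned = "".join(c if c.isalnum() else " " for c in plain.lower())
    return " ".join(sorted(cleaned.split()))


def variant_groups(names):
    """Map each key to its distinct spellings, where there are several."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


# Invented sample values for a creator name field.
NAMES = [
    "Dürer, Albrecht",
    "Albrecht Dürer",
    "albrecht durer",
    "Rembrandt van Rijn",
]

for key, variants in variant_groups(NAMES).items():
    print(key, "->", variants)
```

A key like this only catches superficial variation. It would not connect an abbreviated or alternate form of a name to the full form, which is where matching against a published name authority would have to take over.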

The project team on the OCLC Research side (Ralph LeVan, Bruce Washburn and I) took their questions and attempted to formalize them into something we call our “data analysis methodology”. This document amplifies the questions the museum participants asked and puts them into a framework; I’ll share it in a blog posting in the coming weeks.

In the meantime, if you have a question you’d like to ask of 850K CDWA Lite records, please leave a comment, and we’ll see whether we can fold it into our methodology!

Günter Waibel
