That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Naun Chew of Cornell and Stephen Hearn of the University of Minnesota. Focus group members manage a wide variety of image collections presenting challenges for metadata management. In some cases image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions concerning identification of works, depiction of entities, chronology, geography, provenance, genre, subjects (“of-ness” and “about-ness”) all present themselves; so do opportunities for crowdsourcing and interdisciplinary research.
Many describe their digital image resources at the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.
Among the challenges discussed:
Variety of systems and schemas: Image collections created in different parts of the university, such as art or anthropology departments, serve different purposes and use different systems and schemas from those used by the library. The metadata often comes in spreadsheets or as unstructured accompanying data. Often the metadata created by other departments requires a lot of editing. The situation is simpler when all digitization is handled through one center and the library does all of the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata and others are using MODS (Metadata Object Description Schema). It was suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).
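As a rough illustration of turning spreadsheet metadata into Dublin Core, here is a minimal Python sketch. The column names (`Name`, `Maker`, `Year`, `Keywords`), the mapping, the `record` wrapper element, and the sample row are all hypothetical; only the Dublin Core element namespace is standard. A real workflow would need a mapping agreed with the department that created the data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from one department's column names to Dublin Core elements.
COLUMN_TO_DC = {
    "Name": "title",
    "Maker": "creator",
    "Year": "date",
    "Keywords": "subject",
}


def row_to_dc(row):
    """Build a simple Dublin Core record from one spreadsheet row."""
    record = ET.Element("record")  # hypothetical wrapper element
    for column, element in COLUMN_TO_DC.items():
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip empty cells rather than emit empty elements
        # Assumption: only the keyword column holds several values, separated by ";".
        values = value.split(";") if element == "subject" else [value]
        for item in values:
            item = item.strip()
            if item:
                ET.SubElement(record, f"{{{DC_NS}}}{element}").text = item
    return record


# Made-up sample data standing in for a departmental spreadsheet export.
sample = (
    "Name,Maker,Year,Keywords\n"
    "Village market,Example Photographer,1968,markets; textiles\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
record = row_to_dc(rows[0])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the mapping in one dictionary makes it easy to review with the department that created the spreadsheet before any batch conversion is run.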
Duplicate metadata for different objects: There are cases where the metadata is identical for a set of drawings, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty could add more details. Brown University authorized photographers to add to the metadata accompanying their photographs and has not encountered any problems.
Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, a researcher took OCR’ed text retrieved from HathiTrust, ending up with millions of images. However, the researcher did not record metadata about where the images came from. The challenge is to support a specific purpose and group of people as well as large-scale discovery.
Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?
Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for forensics. For example, Stanford relied on dated photographs to determine when its “Gates of Hell” sculpture had been damaged. Brown University decided to describe a “blob” of various images of the same thing in different formats and then have descriptions of the specific versions hanging off it. The systems used within the OCLC Research Library Partnership do not yet have a good way to structure and represent relationships among images, such as components of a piece.
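The idea of one shared description with version-specific descriptions hanging off it can be sketched as a simple parent-child data model. This is only an illustration in Python, not Brown's actual implementation; the class names, fields, identifiers, and sample files are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ImageVersion:
    """One specific image of the object (a view, format, or dated capture)."""
    file_name: str
    media_type: str                  # e.g. "image/tiff" or "image/jpeg"
    captured: Optional[date] = None  # unknown capture dates stay None
    note: str = ""


@dataclass
class ObjectRecord:
    """Parent description shared by every image of the same thing."""
    identifier: str
    title: str
    versions: List[ImageVersion] = field(default_factory=list)

    def add_version(self, version: ImageVersion) -> None:
        self.versions.append(version)

    def dated_versions(self) -> List[ImageVersion]:
        """Return the versions that have a capture date, oldest first."""
        dated = [v for v in self.versions if v.captured is not None]
        return sorted(dated, key=lambda v: v.captured)


# Made-up example: three images of one object, one of them undated.
sculpture = ObjectRecord(identifier="obj-001", title="Example sculpture")
sculpture.add_version(ImageVersion("view_2010.tif", "image/tiff", date(2010, 5, 1)))
sculpture.add_version(ImageVersion("view_2005.jpg", "image/jpeg", date(2005, 3, 15)))
sculpture.add_version(ImageVersion("undated.jpg", "image/jpeg"))

timeline = [v.file_name for v in sculpture.dated_versions()]
print(timeline)
```

Sorting the dated versions gives the kind of timeline that makes it possible to narrow down when an object changed, while undated images remain attached to the parent record.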
Integrating collections from different sources: Stanford is considering ingesting images from a local art museum, many of which are images for a single object, so that scholars can study the object over time. They are wondering how to represent them in their discovery layer. MIT is trying to integrate metadata coming from different departments so that they can contribute to different aggregators, such as the DPLA. All involved get together to make sure that there is a shared understanding. Contributing images to aggregators and having them live in an aggregated environment present new challenges.
Yale’s largest image collection is the Kissinger papers, with about 2 million scanned images. For much of the collection, metadata is very scanty. Meetings among the collection owner, metadata specialist and systems staff try to resolve insufficient or questionable data and to come to a shared understanding. They store two copies of each image: TIFF (for preservation and on request) and JPEG (for everything else).
Managing relationships with faculty and curators: It’s important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists and developers as all come from different perspectives.
Challenges of bringing together different images or versions of the same object in a large aggregation were explored by OCLC Research’s Europeana Innovation Pilots. The pilots came up with a method for hierarchically structuring cultural objects at different similarity levels to find semantic clusters.
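To give a feel for what grouping objects at different similarity levels can look like, here is a deliberately simple Python sketch. It is not the pilots' method, which is not detailed here. It assumes just two levels (same creator, then same creator and title) and uses made-up records with hypothetical field names.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def cluster_by_levels(records):
    """Nest record ids: a broad level by creator, a tighter level by title."""
    tree = defaultdict(lambda: defaultdict(list))
    for rec in records:
        broad = normalize(rec["creator"])  # looser similarity: same creator
        tight = normalize(rec["title"])    # tighter similarity: same title too
        tree[broad][tight].append(rec["id"])
    # Convert to plain dicts so missing keys raise instead of being created.
    return {creator: dict(titles) for creator, titles in tree.items()}


# Made-up records; "a1" and "a2" differ only in case and spacing.
records = [
    {"id": "a1", "creator": "Example Artist", "title": "Harbor View"},
    {"id": "a2", "creator": "example artist", "title": "Harbor  view"},
    {"id": "a3", "creator": "Example Artist", "title": "Mountain Study"},
    {"id": "b1", "creator": "Another Maker", "title": "Harbor View"},
]
clusters = cluster_by_levels(records)
print(clusters)
```

Exact matching on normalized strings is the crudest possible similarity measure; a production approach would likely need fuzzier comparison and more levels.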