Synchronizing metadata among different databases

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Naun Chew of Cornell and Joan Swanekamp of Yale. As libraries have increased their collecting of commercial electronic resources, instituted local or shared digitization programs, and moved to cloud-based services, more bibliographic and inventory information is being managed outside the traditional catalog, for example in separate repositories or through commercial services. The need to manage and integrate data from many different sources presents a different set of challenges from working primarily within a single central system.

The discussions revolved around these themes:

Consequences of not keeping databases synchronized: A common disconnect is between the local catalog and the union catalog or WorldCat. Different versions of a resource (print, digital) can easily get out of sync, so a user may find one but not the other. One example given was where libraries cannot correct a URL for a resource because the database matching-merging algorithm cannot distinguish between a replacement and a new URL. Maintaining the same metadata across multiple databases takes significant effort. Out-of-sync databases can frustrate users who cannot find items even though the library holds or licenses them. Inconsistency and inaccuracy across databases can also confuse users.

“Artificial” digital library: Digital libraries also represent the resources of the library, but usually apply different descriptive metadata approaches from those used in the local catalog. Digital libraries represent an “artificial split” from the local catalog, and absent mechanisms for maintaining, updating, and syncing between the two, we’re left with two separate libraries going down two separate paths.

What is the “database of record”? Some databases have more functionality than can be provided by the local catalog. For example, Geographic Information Systems (GIS) include more data than can be accommodated in a MARC record. Perhaps for maps, the map database is the “database of record”. Managers need to decide whether “buckets” of data are more useful than trying to pour everything into one system. There are good reasons for multiple platforms and different ways of describing things. Perhaps we don’t need one “database of record”. Do researchers really expect to find everything they need in one place?

Focus on access rather than metadata? Instead of integrating metadata, let’s unify access. Some discovery layers provide a “bento box” display showing results from the different platforms in separate panes. There’s a tension between simplicity and complexity. Libraries cannot replicate the “apparent relevance” ranking Google and other search engines provide because we lack the transactional data needed to factor into the ranking. Relying on a discovery layer to retrieve information from multiple databases still requires managers to decide what metadata to put where, and why. It can also highlight discrepancies in metadata approaches.

Maybe identifiers will help: Access points are valued by researchers. Possibly using the same identifiers in the metadata in different databases would resolve some of the synchronizing issues. Cornell has been experimenting with FAST (Faceted Application of Subject Terminology) headings in addition to or instead of LC subject headings in its 7-million-record catalog. George Washington University has been adding VIAF or id.loc.gov identifiers for names to its records. The hope is that identifiers will help, but much work would remain to address problems of duplicated effort and to reduce the labor devoted to maintenance.
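To make the identifier idea concrete, here is a minimal sketch of how a shared name identifier could surface discrepancies between two databases. It is not drawn from any participant's workflow: the record layout, the field names `name_id` and `name_label`, the function names, and the `example:` identifiers are all hypothetical, standing in for whatever VIAF or id.loc.gov identifiers a library actually stores.

```python
from collections import defaultdict


def index_by_name_id(records_by_source):
    """Group the name labels used in each source under their shared identifier."""
    index = defaultdict(lambda: defaultdict(set))
    for source, records in records_by_source.items():
        for record in records:
            index[record["name_id"]][source].add(record["name_label"])
    return index


def label_mismatches(index):
    """Return identifiers whose name labels differ anywhere across the sources."""
    mismatches = {}
    for name_id, labels_by_source in index.items():
        all_labels = set().union(*labels_by_source.values())
        if len(all_labels) > 1:
            mismatches[name_id] = {
                source: sorted(labels)
                for source, labels in labels_by_source.items()
            }
    return mismatches


# Hypothetical sample records; identifiers and names are invented for illustration.
records_by_source = {
    "catalog": [
        {"title": "Letters", "name_id": "example:n1",
         "name_label": "Doe, Jane, 1900-1980"},
        {"title": "Maps of the county", "name_id": "example:n2",
         "name_label": "Roe, Richard"},
    ],
    "digital_library": [
        {"title": "Letters (scanned)", "name_id": "example:n1",
         "name_label": "Jane Doe"},
        {"title": "County maps (images)", "name_id": "example:n2",
         "name_label": "Roe, Richard"},
    ],
}

mismatches = label_mismatches(index_by_name_id(records_by_source))
print(mismatches)
```

With this sample data, only `example:n1` is reported, because the catalog and the digital library label the same person differently. The identifier links the two records even though the text strings disagree, which is the kind of help the discussion hoped for. Deciding which label is correct, and updating it in each system, would still be manual work.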

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata, with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.
