That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Steven Folsom (then at Harvard), Stephen Hearn of the University of Minnesota, and Melanie Wacker of Columbia. Traditional authority control models have relied on left-anchored browsing of alphabetically ordered lists of terms, a model that interposes the controlled terms (preferred, variant, and related) between the searcher and the search results. The new world of authority sources in which libraries operate includes ORCID and other international registries. Vocabularies designed for left-anchored browsing are a poor fit for current discovery environments oriented toward keyword search and facet term sets pulled directly from displayed search results.
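To make the contrast concrete, here is a minimal sketch of the two access models. The headings are made-up sample data, and the function names `browse` and `keyword_search` are hypothetical, not taken from any particular library system.

```python
from bisect import bisect_left

# Hypothetical sample headings, kept in alphabetical (case-insensitive) order.
HEADINGS = sorted(
    [
        "Businesswomen",
        "Women",
        "Women--Employment",
        "Women in science",
        "Working women",
    ],
    key=str.casefold,
)
KEYS = [h.casefold() for h in HEADINGS]


def browse(query, size=3):
    """Left-anchored browse: jump to where the query would file, then list onward."""
    start = bisect_left(KEYS, query.casefold())
    return HEADINGS[start:start + size]


def keyword_search(query):
    """Keyword search: match the query word anywhere in a heading."""
    q = query.casefold()
    return [
        h for h in HEADINGS
        if q in h.casefold().replace("--", " ").split()
    ]


print(browse("women"))
print(keyword_search("women"))
```

In this sketch the browse only shows headings that file at or after "Women", so "Working women" is not in the three-entry window it returns, while the keyword search finds it regardless of where the word sits in the heading.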
Some new discovery layers are starting to take advantage of variant and related terms. Although traditional web OPACs based on the underlying local library system take advantage of variant terms, related terms, and scope notes, most institutions now also have a discovery layer that aggregates descriptive information from different sources in different formats. Initially, none of these discovery layers took advantage of any information in authority files. A few have now added non-preferred terms to bibliographic data and indexed them, or show related terms and references as well as links to Wikipedia entries for persons and corporate bodies. Cornell takes advantage of the contextual information recorded in RDA authority records. Ideally, we would want to draw on different sources to provide context (multiple authority files, thesauri, VIAF, Wikipedia, etc.) and pull in the information that is most relevant.
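As a rough illustration of what "adding non-preferred terms to bibliographic data and indexing them" can mean, the sketch below folds variant forms from an authority table into a keyword index. The data structures, record IDs, and function names are assumptions made for the example and do not describe any specific discovery layer.

```python
from collections import defaultdict

# Illustrative authority data: preferred heading -> non-preferred (variant) forms.
AUTHORITY = {
    "Twain, Mark, 1835-1910": ["Clemens, Samuel Langhorne, 1835-1910"],
}

# Illustrative bibliographic records that carry only the preferred heading.
RECORDS = [
    {"id": "b1", "names": ["Twain, Mark, 1835-1910"]},
    {"id": "b2", "names": ["Thoreau, Henry David, 1817-1862"]},
]


def tokens(text):
    """Split a heading or query into lowercase words, dropping commas and periods."""
    return {w.strip(",.").casefold() for w in text.split() if w.strip(",.")}


def build_index(records, authority):
    """Index each record under its preferred heading and every variant form."""
    index = defaultdict(set)
    for rec in records:
        for name in rec["names"]:
            for form in [name, *authority.get(name, [])]:
                for tok in tokens(form):
                    index[tok].add(rec["id"])
    return index


def search(index, query):
    """Return IDs of records matching every word in the query."""
    hits = [index.get(tok, set()) for tok in tokens(query)]
    return set.intersection(*hits) if hits else set()


INDEX = build_index(RECORDS, AUTHORITY)
print(search(INDEX, "Samuel Clemens"))
```

A search on the variant form reaches the record even though the record itself only carries the preferred heading.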
Metadata managers reported that browsing generally has low usage; those who measured usage of their browse data reported very low figures (under 2%). Search interfaces that still support a browse search tend to hide it away in the advanced search. The primary users are expert users: librarians and faculty. In several instances, browse searching had been removed but was then brought back. It appears that, while usage is low, there are several use cases that cannot be met without browse searching. The “power users” who use browse are “really vocal,” so it’s important to learn why they use browsing and to think creatively about other ways to meet their needs. Database managers also find browsing useful for identifying anomalies in the data that need to be fixed. What tools could be used instead to uncover these anomalies?
“Browsing” simply means looking through the contents of a list of some kind; it is not limited to the left-anchored browsing of headings used in our traditional systems. In discovery environments that bring together disparate sources conforming to different standards and using different vocabularies, are there ways we can enable browsing of entities that also provide contextual information? For example, the linked data prototype catalogs of the national libraries of France (data.bnf.fr) and Spain (datos.bne.es) provide different views of objects, persons, and subjects that focus on entity relationships rather than text strings.
Lessons from non-library systems include upfront disambiguation, showing relationships between entities, and providing contextual information. Metadata managers would like their systems to be smarter about using the data they have to pull concepts together rather than just words. Examples of providing such context included the University of Pennsylvania’s Online Books, such as this search result on Women, and Cornell’s experiments with knowledge card-type displays, in which related information is displayed in a separate window, such as this display for Women and this one for Malcolm X. Others are learning which vocabularies to use and bringing in data from different sources with linked data. Such experimentation offers opportunities for more collaboration.
Few are addressing multiple overlapping and sometimes conflicting vocabularies. Even MARC records may have different but overlapping vocabularies, for example, records that include both FAST and LC subject headings. To users, they look redundant. In New Zealand, Maori subject headings are added to the same records as LC subject headings; Australia adds terms authorized for indigenous peoples. But a growing percentage of data in institutions’ discovery layers comes from non-MARC, non-library sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated completely differently, and separately. How can we provide normalization and access to the entities described so that users don’t experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica solutions is working on a tool kit on crosswalks among different vocabularies and languages that sounds promising.
As different vocabularies were designed for different contexts, we need middleware that can help normalize or at least identify the differences among them. Perhaps we could use linked data to concatenate designated bits of data and then display the appropriate labels depending on context? We need to envision new ways and models of managing vocabulary data beyond left-anchored browsing for our discovery environments and providing users the context they need. The experimentation OCLC Research Library Partners are undertaking to help us all shift from strings to an entity-based environment is very encouraging.
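One way to picture "display the appropriate labels depending on context" is an entity identified by a URI that carries labels from several vocabularies, with the display layer choosing among them. The sketch below is only an illustration: the `Entity` class, the `label_for` function, the example.org URI, and the sample labels are all assumptions, not data drawn from the vocabularies named.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity identified by a URI rather than by any one text string."""
    uri: str
    # Maps (vocabulary, language) -> display label. Sample values only.
    labels: dict = field(default_factory=dict)


def label_for(entity, preferences):
    """Return the first available label in the caller's order of preference.

    Falls back to the URI when none of the preferred vocabularies has a label.
    """
    for key in preferences:
        if key in entity.labels:
            return entity.labels[key]
    return entity.uri


women = Entity(
    uri="https://example.org/entity/women",  # hypothetical identifier
    labels={
        ("lcsh", "en"): "Women",
        ("rameau", "fr"): "Femmes",
    },
)

# A French-language interface prefers the French label, then English.
print(label_for(women, [("rameau", "fr"), ("lcsh", "en")]))
# An English-language interface asks only for the English label.
print(label_for(women, [("lcsh", "en")]))
```

The point of the design is that the entity, not the string, is what gets managed; the label becomes a display decision made at the last moment.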
Graphic: data.press.net under CC BY-ND 3.0
Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.