That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Daniel Lovins of Yale and Stephen Hearn of the University of Minnesota. As controlled vocabularies and thesauri are converted into linked open data and shared publicly, they often separate from their traditional role of facilitating collection browsing and find a renewed purpose as Web-based knowledge organization systems (KOS). As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review, “a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.”
Good examples of such repurposing are the Getty Vocabularies, which not only allow browsing of Getty’s representation of knowledge, but also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC), which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric, or topic-centric), rather than (only) collection-centric. BIBFRAME, RDA, and IFLA-LRM vocabularies may prove similarly valuable as knowledge organization systems on the Web.
We identified three kinds of Knowledge Organization System functionality:
- Clustering, based on hierarchical relationships, which would support clusters that include both broad and narrower terms, rather than the typical current clusters based on a single term
- Presenting, showing information about an entity such as in knowledge cards or panels
- Navigating, allowing users to follow related terms to explore topical and other relationships
Much of the promise of Knowledge Organization Systems depends on system functionality, which is usually outside the control of the library. We posited that the clustering functionality might be more useful than presentation and navigation.
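To make the clustering idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any existing discovery system: the toy vocabulary `NARROWER`, the sample `RECORDS`, and the function names `expand_term` and `cluster` are all hypothetical. It shows how a search on one term could be widened to include records indexed under its narrower terms.

```python
from collections import deque

# Hypothetical toy vocabulary: each term maps to its directly narrower terms.
NARROWER = {
    "Music": ["Jazz", "Opera"],
    "Jazz": ["Bebop"],
    "Opera": [],
    "Bebop": [],
}

# Hypothetical records, each indexed with a single subject term.
RECORDS = [
    ("Record A", "Music"),
    ("Record B", "Jazz"),
    ("Record C", "Bebop"),
    ("Record D", "Painting"),
]


def expand_term(term, narrower):
    """Return the term together with all transitively narrower terms."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:  # guards against cycles in the vocabulary
                seen.add(child)
                queue.append(child)
    return seen


def cluster(term, records, narrower):
    """Return titles indexed under the term or any of its narrower terms."""
    terms = expand_term(term, narrower)
    return [title for title, subject in records if subject in terms]


print(cluster("Jazz", RECORDS, NARROWER))  # ['Record B', 'Record C']
```

A single-term cluster for "Jazz" would return only Record B; the hierarchical cluster also picks up Record C, which is indexed under the narrower term "Bebop".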
We noted that Knowledge Organization Systems providing “semantic road maps” would require a major shift from local “collection-centric” systems to “knowledge organizations.” Not only is it unlikely we will ever have one “universal” knowledge organization system, we postulated that it may not even be desirable. Instead, we might spend our resources more effectively on incorporating or reconciling existing thesauri or ontologies. For example, Google provides access to far more data than individual library catalogs, but when you click on a link in a Google result you then can continue searching in the new environment using its conventions. Rather than one “global domain,” perhaps the library community could provide added value by adding bridges from the metadata in library domain databases to other domains. We cited Wikidata as an example of aggregating entities from different sources and linking to more details in various language Wikipedias.
Most institutions do not have discovery systems that present controlled vocabularies to users as knowledge organization systems. One of the barriers to doing so is the different ontologies and vocabularies represented in our metadata. Some have overlaps, and others have different hierarchies. Some use very broad terms, others are more granular. It is difficult for systems to establish relationships between vocabularies at the item level (by using semantics like owl:sameAs), much less at the vocabulary level. UCLA’s catalog provides a list of sources for different controlled vocabularies retrieved from a search that users can click on to continue to search within a specific vocabulary. Showing the provenance of a specific subject heading could inform users which vocabulary might be more relevant to them. The Program for Cooperative Cataloging’s Task Group on URIs in MARC submitted a proposal to the MARC Advisory Committee to encode the source vocabulary in main entry, uniform title, and added entry fields (MARC Proposal 2019-02), which was approved in January 2019. This change will better support the multiplicity of vocabularies in library metadata.
Use of controlled vocabularies from other countries is particularly challenging. Subject headings from Japan’s National Diet Library, China, and Korea are all in non-Latin scripts. They may be useful for those who can read the scripts, but not for those who cannot. Wikidata addresses this by allowing users to set their default language so that they see information in their preferred language. However, there are cases where there are no satisfactory equivalences across languages; different concepts in other national library vocabularies cannot always be mapped unequivocally to English concepts. The multi-year MACS (Multilingual Access to Subjects) project has built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented. There are other factors to consider, such as the geographic region where a term is used, which may differ from other regions using the same language (e.g., American vs. British English; French-Canadian vs. French-French vs. Swiss-French). Although imperfect, with embedded biases, numeric classification systems might be an approach to overcome differences in language labels.
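As a rough illustration of why recording the source vocabulary matters, the Python sketch below groups the headings returned by a search according to their source, so that an interface could offer each vocabulary as a starting point for further searching. The sample data in `HEADINGS`, the source codes attached to it, and the function name `group_by_source` are invented for this example and do not reflect how any particular catalog is implemented.

```python
from collections import defaultdict

# Hypothetical search results: (heading, source vocabulary code) pairs.
# The headings and their source assignments are invented for illustration.
HEADINGS = [
    ("Jazz", "lcsh"),
    ("Jazz", "fast"),
    ("Jazz musicians", "lcsh"),
    ("Jazz (music)", "aat"),
]


def group_by_source(headings):
    """Group heading labels under the vocabulary they came from."""
    grouped = defaultdict(list)
    for label, source in headings:
        grouped[source].append(label)
    return dict(grouped)


facets = group_by_source(HEADINGS)
for source, labels in facets.items():
    # Each line could become a clickable "search within this vocabulary" link.
    print(f"{source}: {len(labels)} heading(s)")
```

Without a recorded source, the two "Jazz" headings above would be indistinguishable, and users could not tell which vocabulary a heading belongs to.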
The discussion highlighted some of our common aspirations for future systems both for discovery and for metadata management. We realize that Web-savvy users are accustomed to using different search techniques in different environments, so bridging across domains may be more feasible than trying to attain one universal Knowledge Organization System.
Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.