Experimentations with Wikidata/Wikibase

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Daniel Lovins of Yale, John Riemer of the University of California, Los Angeles, Melanie Wacker of Columbia, and Stephen Hearn of the University of Minnesota. Many libraries are looking toward Wikidata and Wikibase to solve some of the long-standing issues faced by technical services departments, archival units, and others. There is no shortage of interesting ideas for how these tools could be employed: linked open data, bridging silos, multilingual support, an alternative to traditional authority control, and highlighting special collections are just some examples. Wikibase, the software platform underlying Wikidata, provides a robust infrastructure for knowledge graphs, triple stores, and SPARQL queries that would be very difficult for libraries to build on their own. One can contribute to or draw from Wikidata itself as a knowledge graph, or use the Wikibase software to develop library-specific data models and entity relationships.

On the network level, examples of exploratory work include: Project Passage, a linked data Wikibase sandbox put together by OCLC Research which allowed 16 institutions to experiment in 2017-2018; OCLC’s CONTENTdm Linked Data Pilot (2019-2020); and the Mellon-funded Shared Entity Management Infrastructure (2020-2021). All three use a separate instance of Wikibase. The PCC Task Group on Identity Management in NACO is investigating how Wikidata could fit into libraries’ regular workflows and allow them to take advantage of pre-existing work. In Europe, the Bibliothèque nationale de France is collaborating with ABES (Agence bibliographique de l’enseignement supérieur) to create a French entities file, and the Deutsche Nationalbibliothek is working with Wikimedia Deutschland to create a local Wikibase instance to support its authority file, the Gemeinsame Normdatei. Two of the Wikidata projects mentioned focus specifically on archival use cases: the WikiProject Archives Linked Data Interest Group and Repository Data (RepoData) for United States Archives.

A surprising number of individual libraries are experimenting with Wikidata, among them: Stanford University Library creating descriptions of persons affiliated with Stanford University; Harvard’s Guido Adler Collection project; the proof of concept combining Library of Congress Prints & Photographs collection records with Wikidata; Lori Robare’s project exploring Wikidata use for identity management, thereby raising the profile of people and organizations important to Oregon (see her presentation at ALA Midwinter 2020, Exploring Wikidata and Its Potential Use for Library Data); York University’s project to create metadata on Indigenous communities and collections using Wikidata (Surfacing Knowledge, Building Relationships: Indigenous Communities, ARL, and Canadian Libraries); the Bibcard project at the University of Wisconsin-Madison Library; and Yale’s Black Bibliography Project. The Koninklijke Bibliotheek in the Netherlands has a demonstration, Using Wikidata for entity search in historical newspapers, which illustrates applying Wikidata to enrich its archive of digitized newspaper articles by linking persons, corporate bodies, and geographic names.

Individual institutions’ experiments with Wikidata often focus on using Wikidata identifiers for one of two use cases: personal names, corporate bodies, and geographic names in digital collections; or researchers and documents in institutional repositories. In both cases, the goal is often to raise the profile of persons who are important locally or who belong to under-represented groups. Several institutions highlight their special collections by adding the “archives at” property to the Wikidata entries of persons or organizations. The University of Nevada, Las Vegas has found Wikidata helpful for expressing parts of its archival collections that are not covered by subject headings or other controlled vocabularies, such as unique roles for people in the entertainment industry. The University of Toronto is using Wikidata as a tool to describe the Law Library’s Indigenous Perspectives Collection with alternative subject schemas. Several institutions have recruited a Wikimedian-in-residence to support these efforts.
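Once “archives at” statements are in Wikidata, anyone can retrieve them with a SPARQL query against the public Wikidata Query Service. The sketch below only builds such a query and its request URL; it does not send the request. It assumes that “archives at” is property P485, that the query service is reachable at https://query.wikidata.org/sparql and honors a `format=json` parameter, and that you already know the Wikidata item ID of the holding institution. The function names and the item ID `Q12345` are hypothetical placeholders, not part of any project mentioned above.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumption: unchanged at time of use).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def archives_at_query(institution_qid: str, langs: str = "en") -> str:
    """Build a SPARQL query listing items whose "archives at" (P485)
    statement points to the given holding institution.

    `langs` is a comma-separated fallback list for the label service,
    e.g. "fr,en".
    """
    # Minimal sanity check: Wikidata item IDs look like "Q" followed by digits.
    if not (institution_qid.startswith("Q") and institution_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {institution_qid!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P485 wd:{institution_qid} .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{langs}" . }}\n'
        "}"
    )


def query_url(sparql: str) -> str:
    """Return a GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Q12345 is a hypothetical placeholder, not a real institution's ID.
    print(query_url(archives_at_query("Q12345", langs="fr,en")))
```

Keeping query construction separate from the HTTP call makes it easy to paste the generated SPARQL into the query service's web interface first and check the results by hand before automating anything.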

Some institutions are experimenting with their own local instances of Wikibase, exploring use cases such as: creating authorities for local names to bridge internal organizational silos; pushing local data out to Wikidata to reach new audiences; and making use of multilingual discovery capabilities. The Smithsonian Institution, for example, has 19 museums and nine research centers with affiliates around the world, each with its own system and content standards. It hopes that a local Wikibase instance can improve discovery of all the resources held by the Smithsonian for both its internal and external audiences. Yale has received Mellon funding for a project looking to use Wikidata or a local Wikibase instance to reconcile data (linking named entities and concepts that are present, albeit with different labels) across the catalogs of its library and three museums. The British Library is collaborating with Wikimedia UK on a Wikibase project for its Turkish manuscripts and Kurdish printed collections. The goal is to create a database of objects found within the metadata (authors, titles, dates of birth and death, publishing houses, scriptoria, place names, etc.), and then correlate them across languages so that titles of works from various institutions can all be linked together, making the collection more discoverable in different languages. The ability to display labels in different languages and scripts fits in with institutions’ commitment to equity, diversity, and inclusion.
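In Wikidata and Wikibase, each entity carries a separate label per language code, which is what makes display in different languages and scripts possible. The sketch below shows one way a discovery interface might pick a label according to a user’s language preferences. It assumes the entity has the shape of one entry under `"entities"` in Wikidata’s `Special:EntityData/<ID>.json` output, where labels are keyed by language code and each has a `"value"`. The sample entity and the `pick_label` helper are hypothetical illustrations, not data or code from the projects described above.

```python
from typing import Optional

# Illustrative entity fragment in the shape of Wikidata's entity JSON:
# labels are keyed by language code, each with "language" and "value".
sample_entity = {
    "labels": {
        "en": {"language": "en", "value": "Istanbul"},
        "tr": {"language": "tr", "value": "İstanbul"},
        "ar": {"language": "ar", "value": "إسطنبول"},
    }
}


def pick_label(entity: dict, preferred: list[str]) -> Optional[str]:
    """Return the label in the first preferred language that the entity has,
    or None if it has a label in none of them."""
    labels = entity.get("labels", {})
    for lang in preferred:
        if lang in labels:
            return labels[lang]["value"]
    return None


if __name__ == "__main__":
    # A reader preferring Kurdish, then Turkish, then English gets the Turkish
    # label here, because this sample has no Kurdish ("ku") label.
    print(pick_label(sample_entity, ["ku", "tr", "en"]))
```

A fallback list rather than a single language means a missing translation degrades gracefully instead of leaving a blank, and adding a label in one more language immediately benefits every interface that reads from the same entity.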

Given the growing number of local Wikibase instances in development, the work now underway to create a federated ecosystem of local Wikibase instances is critical; without it, each Wikibase instance may end up being “marooned”.

Metadata managers noted that institutional buy-in is needed to support ongoing Wikidata/Wikibase work. Among the reasons for using Wikidata or Wikibase in the library environment:

Expose institutions’ resources to the larger web community
Support institutional outreach to local communities
Create an entity description with a stable, persistent identifier immediately that others can re-use
Create labels in multiple languages and scripts, including labels that are more respectful to marginalized communities
Infrastructure supports collaboration across communities and countries
Relatively low-barrier way to contribute to linked data and gain experience with “entifying”
Tools such as Reasonator, which displays Wikidata entries along with related data and generates timelines, offer views that current library systems cannot

Among the barriers to using Wikidata or Wikibase in the library environment:

Steep learning curve
Uncontrolled metadata could result in inconsistent data quality
Modeling and entities differ from library standards and practices
The data you enter could be over-written by someone else
Duplicates or overlaps with existing authority work
Concern about scalability and long-term sustainability
Installing a local Wikibase instance requires IT effort

Where to start learning about Wikidata? People referred to the ARL White Paper on Wikidata: Opportunities and Recommendations (2019). The recordings from the LD4P Wikidata Affinity Group calls and resources cited there have been helpful to many. Some have taken the Wikidata Professional Development Training Modules. Learn by doing!

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC November 2020.