Summarizing Project Passage experiences in creating library linked data in Wikibase (1 of 2)

Last year we concluded a ten-month pilot, called “Project Passage”, using a Wikibase instance to create and edit library metadata in collaboration with 16 U.S. libraries. Our upcoming report documenting our experiences and the lessons learned will be published later this month. We were delighted that ten of the Project Passage participants* contributed to this report.

The project’s objective was to evaluate a framework for reconciling, evaluating, and managing bibliographic and authority data as linked data entities and relationships. The Project Passage participants represented a community of metadata specialists who could use the Wikibase instance as a “sandbox” in which to experiment freely with describing library and archival resources in a linked data environment. OCLC staff chose to install a local instance of Wikibase, the platform for storing and managing structured data that underlies Wikidata (the structured dataset used by Wikipedias and other Wikimedia sister projects). Wikibase offers built-in features such as auto-suggest, embedded multilingual support, a SPARQL Query Service, and application programming interfaces (APIs) that allowed third-party additions to be incorporated. It also generates descriptions of entities in the Resource Description Framework (RDF) without requiring users to have technical knowledge of linked data. As a result, the OCLC project team could focus on what the Project Passage participants needed most beyond this initial set of capabilities.

The OCLC team also developed two utilities that interoperated with the Wikibase software platform. Coming from a MARC environment, practitioners were accustomed to saying something in a single place: a bibliographic or authority record designed primarily for a human reader. In the Wikibase editing environment, by contrast, input could be linked behind the scenes and repurposed, and Project Passage participants needed to see those connections. To demonstrate this potential in a discovery application, the OCLC project team developed an interface called Explorer, which displayed related entities (such as the translations of a book) assembled by a SPARQL query across the entire Wikibase RDF dataset. Explorer also built displays that combined structured statements and narrative text with images from DBpedia and Wikimedia Commons. Pilot participants also needed to pull in data that existed elsewhere, such as in their local files, so the OCLC project team developed the Passage Retriever tool to bring data into the Wikibase instance as the basis for a new resource description. Together, these two tools eased the task of description and made it possible to see the effects of work in progress.
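Explorer’s actual queries are not reproduced here. As a rough illustration only, the sketch below shows the kind of SPARQL pattern that gathers the translations of a work, using Wikidata’s property IDs (P629 “edition or translation of”, P407 “language of work or name”); a local Wikibase such as Passage’s assigns its own IDs, and the work item `Q_WORK` and the sample triples are hypothetical. So that it runs without a query service, the same two-pattern query is evaluated in plain Python over a handful of made-up triples.

```python
# Illustrative SPARQL using Wikidata's property IDs; Q_WORK is a hypothetical
# placeholder, and a local Wikibase would use its own property IDs.
RELATED_EDITIONS_QUERY = """
SELECT ?edition ?language WHERE {
  ?edition wdt:P629 wd:Q_WORK ;
           wdt:P407 ?language .
}
"""

# Hypothetical (subject, predicate, object) triples standing in for the
# Wikibase RDF dataset; all item IDs are made up for illustration.
TRIPLES = [
    ("Q_ED_EN", "P629", "Q_WORK"),
    ("Q_ED_EN", "P407", "English"),
    ("Q_ED_ZH", "P629", "Q_WORK"),
    ("Q_ED_ZH", "P407", "Chinese"),
    ("Q_OTHER", "P629", "Q_SOMETHING_ELSE"),
]


def related_editions(triples, work):
    """Evaluate the two triple patterns above, joining them on ?edition."""
    # First pattern: every subject linked to the work by P629.
    editions = {s for s, p, o in triples if p == "P629" and o == work}
    # Second pattern: the language statement(s) of each matching edition.
    results = [(s, o) for s, p, o in triples if p == "P407" and s in editions]
    return sorted(results)


print(related_editions(TRIPLES, "Q_WORK"))
# [('Q_ED_EN', 'English'), ('Q_ED_ZH', 'Chinese')]
```

Because every edition points to the work rather than the work listing its editions, a new translation shows up in such a display as soon as someone adds its statements, with no change to the work’s own description.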

The Project Passage practitioners demonstrated a variety of use cases during weekly “office hours”, where we walked through issues as they arose and discussed community norms and practices that needed to be established. The report describes these use cases in detail:

Two use cases of non-English descriptions
Four use cases of image resources
Two use cases of describing archival and special collections
A 15th-century musical work associated with an ecclesiastical event

Xiaoli Li of UC Davis and I described our experiences with Chinese descriptions and with representing works and their associated translations in the 18 June 2019 Works in Progress Webinar, Case Studies from Project Passage Focusing on Wikidata’s Multilingual Support. Our experience showed that multilingual descriptions need not be constrained by a “preferred form”, nor need they include transliterations, since others can add descriptions in different languages and writing systems. The concept of a “language of cataloging” disappears in this environment.

Use cases described in the upcoming Project Passage report

The image resources described were a map, a poster, a postcard, and a photograph, all resources whose content is more image than text; three of them were related to a specific event. A historical map of Concord, Massachusetts, described by Marc McGee of Harvard, included natural features, man-made structures, names of landowners, roads, and district boundaries. He demonstrated the complex connections between the map and its publisher, location, and date, as well as the need for new “roles”, such as Henry David Thoreau as a “surveyor” in this instance. The Passage editing workflow enabled the open-ended addition of many details and relationships, such as features visually represented on the map, that would be difficult or impossible to express in MARC except as free text.

Kalan Knudson Davis and her colleagues at the University of Minnesota described a poster for an Everly Brothers concert; Karen Detling and her colleagues at the National Library of Medicine described a postcard featuring Princess Maria Josepha of Saxony as a nurse during World War I; Holly Tomren and her colleagues at Temple University Libraries described a photograph of Dr. Martin Luther King Jr. and Cecil B. Moore at a protest rally at Girard College. All three use cases were associated with an event, and the metadata specialists had to grapple with how much effort is worth investing in creating related entities to provide machine-understandable context for interpretation, and when it makes sense to stop.

The Temple photograph also provided a starting place for describing archives in the Passage editing workflow, since the photograph is an item in a collection held by a research institution. The Passage descriptions of the archive were a by-product of the need to describe the photograph. Given the tradition of narrative and other text-based description, archives would seem to benefit most from workflows that facilitate creating structured data, because so little structured data is otherwise available. But the Passage experience revealed that much more community discussion is needed to produce models that redraw the line between structured and textual data, and to establish best practices for both.

Craig Thomas at Harvard described a sacred musical composition commissioned for and performed at the consecration of Florence Cathedral in 1436. The items and properties describing the musical score, the consecration event, and the interconnections between the two produced a network of relationships that exceeds the detail currently represented in corresponding MARC-based library authority files.

In sum, Project Passage allowed participants to gain valuable insight into how to build relationships in structured, machine-readable semantic data and to obtain instant feedback about their work in a discovery interface. The pilot was unique in that it created an environment in which practitioners had their first deep encounter with linked data concepts in the context of their current jobs, allowing them to make head-to-head comparisons between current and new standards of practice while preserving the most important values of librarianship. The Wikibase platform supports establishing provenance, authority, and trust at the level of individual statements in a way that is far more sophisticated than corresponding library-community practices. Practitioners can declare the existence of an item as a linked data real-world object about which facts can be assembled and associated with a globally unique URI. Additional items and properties can also be declared in the moment for a wide range of resource types, in multiple languages and writing systems. These descriptions can link to library-community datasets as well as to vocabularies and ontologies maintained elsewhere.
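To make statement-level provenance a little more concrete, here is a simplified sketch modeled loosely on the Wikibase data model, in which each statement can carry its own qualifiers, references, and rank, and each item can have labels in many languages. The class names, the property IDs (`P_CREATOR`, `P_ROLE`, `P_STATED_IN`), the item IDs, and the labels are all hypothetical; this is not Wikibase’s actual API or JSON format.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical model loosely following the Wikibase data model:
# an item has per-language labels and a list of statements, and each
# statement carries its own qualifiers, references, and rank.


@dataclass
class Reference:
    # Property/value pairs recording where a statement came from.
    snaks: dict[str, str]


@dataclass
class Statement:
    property_id: str
    value: str
    rank: str = "normal"  # Wikibase ranks: "preferred", "normal", "deprecated"
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[Reference] = field(default_factory=list)


@dataclass
class Item:
    item_id: str
    labels: dict[str, str] = field(default_factory=dict)  # language code -> label
    statements: list[Statement] = field(default_factory=list)

    def sourced_values(self, property_id: str) -> list[str]:
        """Return values for property_id that cite a source and are not deprecated."""
        return [
            s.value
            for s in self.statements
            if s.property_id == property_id
            and s.references
            and s.rank != "deprecated"
        ]


# Hypothetical IDs and labels throughout; a real Wikibase assigns its own IDs.
concord_map = Item(
    item_id="Q_MAP",
    labels={"en": "Map of Concord, Massachusetts", "fr": "Carte de Concord (Massachusetts)"},
    statements=[
        Statement(
            property_id="P_CREATOR",
            value="Henry David Thoreau",
            qualifiers={"P_ROLE": "surveyor"},  # role recorded on the statement itself
            references=[Reference(snaks={"P_STATED_IN": "Q_SOURCE"})],
        ),
        # An unsourced claim: kept in the data, but filtered out by sourced_values().
        Statement(property_id="P_CREATOR", value="Q_UNVERIFIED_PERSON"),
    ],
)

print(concord_map.sourced_values("P_CREATOR"))  # ['Henry David Thoreau']
```

The point of the design is that trust attaches to each individual claim rather than to a whole record: a consumer can choose to rely only on referenced, non-deprecated statements, while other statements about the same item remain visible for review.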

The co-authors agreed that the Wikibase sandbox made it “very easy to connect theory and practice” and that the interfaces provided were better than those of any other linked data project they had been involved in.

Coming next: Lessons learned and reflections

* Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, Holly Tomren

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC November 2020.