Summarizing Project Passage experiences in creating library linked data in Wikibase (2 of 2)

Last week’s blog post summarized the goals and content of the OCLC Research linked data Wikibase prototype called “Project Passage” and the use cases described by the project participants. After the pilot concluded in September 2018, my colleague Jean Godby and I held discussions with the ten co-authors* of our upcoming report about the lessons learned and our reflections on the experience, some of which are summarized here.

Some of the lessons we learned from our Project Passage experiences:

The building blocks of Wikibase can create structured data with a precision exceeding current library standards.
The Explorer discovery layer mentioned in the previous post was critical to let librarians see the results of their effort without leaving the metadata-creation workflow.
Populating knowledge graphs with library metadata requires tools for importing and enhancing data created elsewhere, such as the Retriever tool the OCLC team developed.
The pilot underscored the need for interoperability between data sources, both for ingest and export.
The traditional distinction between authority and bibliographic data disappears in a Wikibase description.

Some reflections

Paradigm shift: Transitioning from human-readable records to knowledge graphs represents a paradigm shift. The intellectual work undertaken by catalogers to describe resources in the current and new workflows has many similarities, even if the outputs look different. Although some current tasks and practices will still be necessary, others will become obsolete, and some new tasks will be needed. The most important new task is shifting the focus from the “item in hand” to “what entities matter to this object?”

Tasks that become obsolete cluster around creating and normalizing natural language text strings in bibliographic and authority records. Some current practices now appear trivial and pointless, such as the time spent on ISBD punctuation in MARC records. The MARC concept of language of cataloging and the requirement to provide transliterations become obsolete in an environment that is inherently multilingual.
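To make the multilingual point concrete, here is a minimal sketch in Python. It is not the Wikibase data model or API itself: the `Item` class, the `label_for` method, the placeholder identifier, and the sample labels are all assumptions made for illustration. The idea it shows is that one entity carries a label per language, so no single "language of cataloging" has to be chosen for the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Wikibase item (illustration only)."""
    item_id: str
    # Maps a language code to the label in that language and script.
    labels: Dict[str, str] = field(default_factory=dict)

    def label_for(self, lang: str, fallback: str = "en") -> str:
        # Prefer the requested language, then the fallback, then the bare identifier.
        return self.labels.get(lang) or self.labels.get(fallback) or self.item_id


# The identifier and labels below are made up for this example.
author = Item(
    item_id="Q_EXAMPLE",
    labels={"en": "Leo Tolstoy", "ru": "Лев Толстой", "ja": "レフ・トルストイ"},
)

print(author.label_for("ru"))  # label in the original script, no transliteration needed
print(author.label_for("fr"))  # no French label yet, so the English one is used
```

Each user sees the same entity through the label in their own language, and a missing language is a gap that anyone can fill rather than a property of the record as a whole.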

Interpretive context, structured and narrative data, best practices, and upholding the values of authoritativeness and quality will still be needed in creating metadata in a linked data environment. The emphasis on “entification” evolves naturally from current library practice. As the community incorporates Wiki* content into its workflows, we will need to determine the appropriate context, structured and narrative data, and best practices that uphold our values.

Reinventing crowd-sourcing: Participants saw the potential of crowd-sourcing for enriching the knowledge graphs created in the Wikibase editing interface. This effort could be supported by the revision history and discussion pages that track every edit for a given Wikibase entity, each associated with a registered username and time stamp. In contrast, quality management is hampered in current resource description workflows by the fact that a MARC record can be marked only as “touched,” with no written trace of what was changed by whom. And discussion takes place outside the editing environment, typically on professional listservs, where the connection to the affected content is lost.
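The contrast between a per-edit revision history and a record that is merely "touched" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Wikibase stores revisions: the `Revision` and `EntityHistory` classes, their fields, the usernames, and the edit summaries are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """One edit: who made it, when, and what changed (hypothetical structure)."""
    user: str
    timestamp: datetime
    summary: str


@dataclass
class EntityHistory:
    """Hypothetical per-entity edit log, in the spirit of a revision history page."""
    entity_id: str
    revisions: List[Revision] = field(default_factory=list)

    def record(self, user: str, summary: str, when: Optional[datetime] = None) -> None:
        # Every edit is appended with its username and a UTC time stamp.
        stamp = when or datetime.now(timezone.utc)
        self.revisions.append(Revision(user, stamp, summary))

    def edits_by(self, user: str) -> List[Revision]:
        # Reviewers can ask exactly what a given contributor changed.
        return [r for r in self.revisions if r.user == user]


# Made-up identifier, usernames, and summaries for illustration.
history = EntityHistory("Q_EXAMPLE")
history.record("cataloger_a", "Added date of birth")
history.record("scholar_b", "Added label in original script")
history.record("cataloger_a", "Corrected place of publication")

for rev in history.edits_by("cataloger_a"):
    print(rev.timestamp.isoformat(), rev.user, rev.summary)
```

A single "touched" flag would reduce all three of these edits to one bit of information. Keeping the log is what makes it possible to review, attribute, and discuss individual contributions.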

Pilot participants raised concerns that crowd-sourcing in the Wikibase environment could still add unvetted information from unknown sources that would dilute the integrity of curated library data. A self-selected crowd may have a range of skills and expertise that are not all suited to a given description task or use case. But some members of the crowd undoubtedly do have knowledge that complements or supplements that of library and archival staff. For example, scholars who are familiar with non-English and non-Latin script materials could enrich the metadata created by librarians and archivists who lack this expertise.

In conclusion: The Passage pilot represented an opportunity for all participants to gain hands-on experience creating structured data that could be exported as linked data. Among the outcomes were hundreds of new Wikibase item entities and new tools for creating and viewing results. The experience also produced knowledge and wisdom that could be shared, as well as a multitude of concrete ideas that are already giving shape to follow-up investigations. The results of this effort will help materialize the paradigm shift that is evoked by the name of the pilot. The shared goal is a “passage” from standards introduced in the 1960s to a 21st-century solution featuring structured semantic data that promises better connections between libraries and the world beyond.

Coming next: The publication of our report!

* Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, Holly Tomren

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.