Next Generation Metadata... it's getting real!

This spring, OCLC Research are running a discussion series on Next Generation Metadata, in which library leaders, metadata experts, and practitioners from the EMEA time zones (Europe, Middle East, and Africa) can share their experiences, deepen their understanding of the topic, and gain confidence in planning ahead.

The theme of the series is inspired by Karen Smith-Yoshimura’s OCLC Research report, “Transitioning to the Next Generation of Metadata”, which depicts this transition as an evolving process, intertwined with changing metadata standards, infrastructures, and tools. The series also offers an opportunity to showcase OCLC’s pioneering research and experimentation in this area, as well as its current work on building a Shared Entity Management Infrastructure, which will support linked data initiatives throughout the library community.

As part of the series, seven round table discussions are held in different European languages – English, Spanish, French, German, Italian, and Dutch – to address the question:

“How do we make the transition to the Next Generation of Metadata happen at the right scale and in a sustainable manner, building an interconnected ecosystem, not a garden of silos?”

Making it happen

This blog post reports back from the Opening Plenary webinar, held on 23 February 2021, where OCLC speakers presented to kick off the series.

Rachel Frick, Executive Director, Research Library Partnership, introduced the theme by describing how the library community is in the midst of transformative change: the metadata is changing, the creation process and the supply chains are changing, and new architectures are emerging. At the same time, metadata departments in libraries are receiving less attention, the number of professional staff is decreasing, and the work is being de-professionalized. Libraries used to be knowledge organizations, and library professionals were trained in bibliographic description and authority control. Now, authorities are called entities, and the new description logic is about creating a “Knowledge Graph of Entities”. Is this transition simply about putting old wine in new bottles? For a long time, linked data experts and enthusiasts were unable to convince either their peers or their leadership. After some years of intensified experimentation, this has changed. OCLC, leading libraries, and many other stakeholders are now making it happen, and confidence is growing that linking data across isolated systems and services will enhance the user experience and allow for efficiencies across the value chain. Rachel concluded:

“OCLC plays an important role in cultivating understanding and helping the community to embrace this transition – and this discussion series in the EMEA region is one of the many avenues it offers for doing so.”

Annette Dortmund, Senior Product Manager and Research Consultant, went on to present key themes from Karen Smith-Yoshimura’s report, which compiles six years of discussions with the OCLC Research Library Partners Metadata Managers Focus Group on the evolution toward the next generation of metadata. In these discussions, the need for change became crystal clear: “curated text-strings in bibliographic records” are nearing obsolescence, both conceptually and technically. The report describes the changes taking place in a number of areas, including the transition to linked data and identifiers, the description of inside-out and facilitated collections, the evolution of metadata as a service, and the resulting staffing requirements. The transition is key to achieving important library goals, such as multilingualism, which, in turn, is closely connected to EDI (Equity, Diversity and Inclusion) goals and principles. It also opens opportunities for libraries to engage in new areas where metadata is becoming key, such as Research Data Management (RDM). Karen’s rich and informative report is on the reading list for participants in the discussion series. As Annette put it:

“In a way, it frees us from discussing the WHY and WHERE TO of this transition. Why do we need a change, and where do we need to go? It is all in there, based on so much expertise and so many hours of robust discussion. And so, it frees us to move on to the HOW and WHO. How do we make the change happen, and who can help us with it – and who is already working on it?”

From experimenting to the heavy lifting

I gave a short overview of the findings from the OCLC Research report, “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”, published early in 2021. CONTENTdm is OCLC’s digital library system: it allows libraries to build, manage, and showcase digitized cultural heritage collections and make them discoverable to people and search engines. Together, the Product Development team, OCLC Research colleagues, and library professionals from the CONTENTdm user community investigated methods, workflows, and tools to produce linked data from the Dublin Core-based descriptions. They also explored how to link up the people, places, concepts, and events that populate these descriptions across CONTENTdm systems. The pilot convincingly showed the innovative potential of linking data and using persistent identifiers, from both a data management and a discovery perspective. It also brought home the realization that a paradigm shift of this scale will necessarily take time to carry out and calls for long-term planning and collaboration strategies:

“An overarching question driving the linked data project was, for a paradigm shift of this magnitude, how can the foundational changes be made more scalable, affordable, and sustainable? (…) It will require substantial and shared resource commitments from a decentralized community of practitioners who will need to be supplied with easily accessible tools and workflows for carrying out the transition.”

In the follow-up presentation, John Chapman, Senior Product Manager, explained how OCLC’s Shared Entity Management Infrastructure addresses this general concern and the needs identified by library partners. These are, first, entity URIs/persistent identifiers relevant to library workflows (for works and persons) at the point of need (during the descriptive process). Second, partners need facilities to link library data to non-library data and shared data to local data. The Andrew W. Mellon Foundation identified OCLC as an organization that can operate at the large scale required, and do so sustainably. It awarded OCLC a $2.436 million grant to develop such an infrastructure over two years. The Knowledge Graph being built is seeded from the knowledge contained in authority files, WorldCat creative works, and controlled vocabularies. This requires a great deal of “semantic lifting”, i.e., turning the knowledge hidden “in the spaghetti pile of strings in MARC” into structured data or facts. Due attention is given to the provenance and context of knowledge claims and to multilingual approaches. John noted that curation support for the library community will be important, as will thinking of APIs and machines as core users. The infrastructure currently still uses Wikibase as its primary technical component, but an architectural decision is in the making to move beyond it and scale up much further. In John’s words:

“While Wikibase and Wikidata grew organically over time, this project is going forward very quickly and adding a lot of data very quickly, so we’ve needed to think about engineering some different loading and ingest technologies that keep up with the bigger scale.”

The presentations elicited much interest and many questions. One attendee asked: “Do you think that the value propositions of next generation metadata have been effectively explained to and understood by library directors? And what would be the main things that you would want library directors to do next, if they are engaged?” This is indeed an excellent suggestion. Now is the time to approach library directors, at such a decisive moment, when infrastructures are emerging and strategic choices need to be made! Another question touched on an important aspect that we propose to address during the round table discussions: “You are now creating a knowledge graph based on library data. Are other projects making other knowledge graphs, on other topics, that are then in the end hopefully connected in some kind of super knowledge graph?”

What’s next?

Next up in the discussion series are the seven round table discussions. Unfortunately, seating is limited, and all of them have already reached capacity. However, we will synthesize these discussions and share the findings with the community in the Closing Plenary webinar on 13 April, which is open to everyone interested in the topic. We will also share relevant findings via blog posts here on the OCLC Research blog, Hanging Together, and through other channels, so please stay tuned and watch this space.

Titia van der Werf

Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC’s Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.
