First English round table on next generation metadata: towards a critical mass of interoperable library data

As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports on the first English-language round table discussion, held on March 2, 2021.

The session was scheduled to accommodate a broad audience in the EMEA time zones (Europe, Middle East, and Africa). Library representatives from the UK, Poland, Greece, Lebanon, and Egypt, with backgrounds in bibliographic control, special collections, collections management, metadata standards, and computer science, joined the session and formed a very diverse and engaged discussion group.

Mapping Exercise

Map of next-gen metadata projects (English session-1)

The group started by mapping the next generation metadata projects that participants were aware of onto a 2×2 matrix of application areas: bibliographic data, cultural heritage data, research information management (RIM) data, and “Other” for anything else. The resulting map gave a good overview of some of the building blocks of the emerging next generation metadata infrastructure: the various international identifier initiatives (including the lesser-known International Standard Manuscript Identifier, ISMI) and the schemas and interoperability standards being implemented in the three application areas. The map pointed to several relevant EU-funded projects, such as FREYA, TRIPLE, STIRdata, and Europeana. Also noteworthy were the mentions of UK initiatives to streamline metadata across the supply chain, such as Plan-M (in relation to the National Bibliographic Knowledgebase), the Book Industry Communication (BIC) Metadata Map, and the British Library’s collaboration with publishers to add ISNIs upstream in the chain. Finally, a few institution-specific projects were mentioned, e.g. Finding Archives and Manuscripts in Oxford University Special Collections (FAMOUS), as well as the UKRI-funded project Towards a National Collection in the cultural heritage area. Overall, this was an interesting range of activities taking place on very different scales (institutional, national, European, and international), providing ample opportunity for a conversation about linking up initiatives across these scales.

The limits of doing it alone

Attendees spoke about their individual efforts to move to the next level of metadata and to align with new standards and practices, and about the practical barriers they encountered. Some were trying hard to modify the structure of their metadata, include PIDs/URIs, and shift from MARC to RDA practices with existing tools and formats. Some experimented with Wikidata, with RDF-based data modelling, and with creating new vocabularies for historical names, periods, and other concepts in linked data form.

All were clearly enthused by the promise of next generation metadata, but also struck by the challenges and the steep learning curve that come with new tools. This quote from one of the participants shows how individuals often struggle to deal with the complexities of the many different web-based tools and their capabilities:

“I have a project with Google Arts & Culture and I am using Wikidata for adding URIs of the artists that I am adding to my exhibition online, but sometimes I can’t find the Wikidata URI for this artist and I am perplexed and don’t know how to deal with it.”

The importance of achieving a critical mass of interoperable library data

Participants looked forward to the time when the next generation metadata infrastructure would reach the scale and level of interoperability needed to achieve efficiencies. As one of them said: “We have metadata from all of the collections and all the research outputs for the university – that’s a lot of day-to-day work and we want to look at how to move those more into the next generation metadata processes instead.”

They welcomed the OCLC Shared Entity Management Infrastructure. For one of the attendees, working with medieval collections, this project is hugely exciting:

“because we have maintained our authority file just by hand, we’ve never been able to catch our sort of weird medieval stuff into the normal NACO files (…), and so this project is really going to transform the way we work. It will mean that we’re able to connect our metadata with our partner institutions, it will mean that medieval manuscripts will no longer be in this walled garden.”

In the general move towards more interoperable library data, the effort to increase the usability of authority data is central. Enriching the ISNI database with Library of Congress control numbers and VIAF identifiers and, vice versa, populating the LoC Authority File with VIAF identifiers and ISNIs is a major contribution to this effort. In the UK, this work is carried out by the British Library. Another effort that was mentioned is IFLA’s work on transforming the ISBD standard to align it better with the Library Reference Model (LRM) and RDA, along with the launch of IFLA Namespaces to allow the use of IFLA standards in the linked data ecosystem.
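The two-way enrichment of authority files described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the record layout, the function name `cross_enrich`, and all identifier values are hypothetical placeholders, and it does not represent how ISNI, VIAF, or the Library of Congress actually exchange data.

```python
def cross_enrich(record_a, record_b):
    """Copy identifiers that one record has and the other lacks, in both directions.

    Assumes both records are already known to describe the same entity.
    """
    for key in set(record_a) | set(record_b):
        a, b = record_a.get(key), record_b.get(key)
        if a and b and a != b:
            # Conflicting identifiers need human review rather than silent overwriting.
            raise ValueError(f"conflicting {key}: {a!r} vs {b!r}")
        value = a or b
        record_a[key] = value
        record_b[key] = value
    return record_a, record_b


# Placeholder identifiers, not real ones.
isni_entry = {"isni": "0000000000000001", "viaf": "000001", "lccn": None}
lc_entry = {"lccn": "n00000001", "viaf": "000001", "isni": None}

cross_enrich(isni_entry, lc_entry)
print(isni_entry == lc_entry)
```

After the call, each record carries all three identifiers, so either file can be used as an entry point to the other.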

Interlinking at the aggregation level

Participants noted that the metadata management systems and repositories in place in libraries and cultural heritage institutions do not support basic functionality for linking items to vocabularies or thesauri. It was observed that this gap is now being filled at the aggregation level by large players in the field, who have been enriching metadata at scale and publishing it as linked data. This development was not considered mutually beneficial for all parties, as one of the attendees put it:

“So, a lot of money is consumed through projects with bigger players, such as aggregators or at the aggregation level, adding links and URIs to metadata that are derived from institutions, which is not sustainable because the added value does not return back to the institutions that own the metadata.”

Nevertheless, aggregation has been, and still is, the cornerstone of interconnecting and linking up databases across different silos. The EU-funded TRIPLE project was discussed as an example: it is an aggregator of Social Sciences and Humanities content that maps the metadata harvested from repositories and publishers to a schema.org-based data model and supports multilingualism, the use of discipline-specific vocabularies, and PIDs such as ORCID. It also aims to connect with similar projects such as OpenAIRE and Europeana, which operate in the European research and cultural heritage areas, respectively. Notwithstanding the obvious benefits of such cross-sectoral interlinking, the group also noted that aggregators have a hard job deduplicating and disambiguating data from heterogeneous sources, which suggests that this might be avoided if the interlinking were done at the source rather than at the aggregation level.
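To make the aggregation problem concrete, here is a minimal sketch of mapping harvested records to a schema.org-style structure and deduplicating them by persistent identifier. It is an assumption-laden illustration, not TRIPLE's actual data model or pipeline: the input field names, the functions `to_schema_org` and `deduplicate`, and the sample DOI are all hypothetical, and the ORCID iD shown is used purely as a placeholder.

```python
from collections import defaultdict

# Hypothetical harvested records from two sources; field names are invented.
harvested = [
    {"source": "repo-a", "title": "Medieval Trade Routes", "creator": "A. Example",
     "orcid": "0000-0002-1825-0097", "doi": "10.1234/abc"},
    {"source": "publisher-b", "title": "Medieval trade routes", "creator": "Example, A.",
     "orcid": "0000-0002-1825-0097", "doi": "10.1234/ABC"},
    {"source": "repo-a", "title": "Untitled Dataset", "creator": "B. Sample",
     "orcid": None, "doi": None},
]


def to_schema_org(record):
    """Map one harvested record to a schema.org-style dictionary."""
    doc = {"@context": "https://schema.org", "@type": "CreativeWork",
           "name": record["title"]}
    author = {"@type": "Person", "name": record["creator"]}
    if record.get("orcid"):
        author["@id"] = "https://orcid.org/" + record["orcid"]
    doc["author"] = author
    if record.get("doi"):
        # DOIs are case-insensitive, so normalise before using them as keys.
        doc["@id"] = "https://doi.org/" + record["doi"].lower()
    return doc


def deduplicate(docs):
    """Group documents by PID; documents without a PID cannot be merged safely."""
    by_pid = defaultdict(list)
    unmatched = []
    for doc in docs:
        if "@id" in doc:
            by_pid[doc["@id"]].append(doc)
        else:
            unmatched.append(doc)
    # Keep the first document of each group as its representative.
    merged = [group[0] for group in by_pid.values()]
    return merged, unmatched


mapped = [to_schema_org(r) for r in harvested]
merged, unmatched = deduplicate(mapped)
print(len(merged), "merged;", len(unmatched), "left for manual review")
```

The two records that share a DOI collapse into one even though their titles and creator names are written differently, while the record without any PID falls through to manual review. This is the practical argument for adding identifiers at the source: the aggregator can only match reliably on what the contributing institutions supply.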

Share what you have and avoid duplication

During the last part of the session, a short discussion flared up around vocabularies. Should we continue creating new vocabularies? Why not reuse those we already have? At the end of the conversation, one of the participants underlined the need to share what we have, to avoid duplicating efforts, and above all to connect our data with more communities.

“It’s not only about data contextualization or enrichment, but it’s also about discovering the landscape, being connected (…) because we have very authoritative data, that are built in such a way that it will benefit other communities also”

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021 OCLC Research conducted a discussion series focused on two reports:

“Transitioning to the Next Generation of Metadata”

“Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round table discussions were held in different European languages, and participants were able to share their own experiences, get a better understanding of the topic area, and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round table discussions are published on the OCLC Research blog, Hanging Together. This post is the first one.

The Closing Plenary Session on April 13 will synthesize the different round table discussions. Registration is still open for this webinar: please join us!

Titia van der Werf

Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC’s Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.