German round table on next-generation metadata: Formats, contexts and deficits

As part of the discussion series on Next Generation Metadata, this blog post reports back from the German-language round table discussion held on the morning of March 10, 2021. (A German translation of this post is available here.)

Participants from Germany, Switzerland, and Hungary represented national libraries, state libraries, university libraries, and special libraries; combined, they had backgrounds in metadata and collection development, open access and automated subject indexing, metadata concepts and entity management – all the ingredients for a lively and varied discussion.

Mapping exercise

Map of next-gen metadata projects (German session)

As in all other round table discussions, taking stock of projects in the regions was a first step and resulted in a map mural of projects. It indicated strong activity in the quadrant of bibliographic data and some additional activity in the other quadrants: research information management (RIM), scholarly communications, and cultural heritage.

Formats and contexts

The “MARC21 –> BIBFRAME” note on the map immediately sparked a general discussion about the suitability of data “formats” in different contexts. While there was agreement that BIBFRAME was more suitable and flexible than MARC, it, too, has its limitations. To actually exchange data, agreements need to be in place (and adhered to!) on how the standard is used. And bridging different types of data is not one of BIBFRAME’s strengths.

As one participant noted:

The separation of title and authority data is no longer valid, as in the future all types will be part of one big graph.

Participants noted a necessity to create gangways between different data sources. New platforms need to be modular and scalable enough to accommodate the subtleties of the various participating institutions. Moving away from authority files to identity management allows libraries to link, for example, research data with classic library data. Other libraries create cross-references to other systems, as in the coli-conc project, or enrich their catalog with links to additional information from external sources. Customers want to find information, regardless of where it is and where it comes from. Meaningful links can be created without introducing new rules and building a complex new infrastructure.

The nascent Hungarian National Library Platform (also shown on the map) focuses on a graph model that stores triples rather than MARC data. That way, the data is not tied to a specific format and the platform can serve multiple sectors; at the same time, exchange formats can be created as needed to accommodate specific needs.
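To illustrate the idea of a format-neutral graph with exchange formats derived on demand, here is a minimal sketch in plain Python. It is not the Hungarian platform's implementation; the `ex:` identifiers, property names, and both output shapes are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers and property names, for illustration only.
GRAPH = {
    Triple("ex:book1", "ex:title", "Example Title"),
    Triple("ex:book1", "ex:creator", "ex:person1"),
    Triple("ex:person1", "ex:name", "Example Author"),
}


def values(graph, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    return sorted(
        t.obj for t in graph if t.subject == subject and t.predicate == predicate
    )


def to_flat_record(graph, subject):
    """Derive a simple flat exchange record from the graph on demand."""
    creators = [
        name
        for person in values(graph, subject, "ex:creator")
        for name in values(graph, person, "ex:name")
    ]
    return {"title": values(graph, subject, "ex:title"), "creator": creators}


def to_triple_lines(graph, subject):
    """Derive a line-based triple listing for the same subject."""
    return [
        f"{t.subject} {t.predicate} {t.obj} ."
        for t in sorted(graph)
        if t.subject == subject
    ]
```

Both `to_flat_record(GRAPH, "ex:book1")` and `to_triple_lines(GRAPH, "ex:book1")` read from the same stored triples, so adding a further exchange format means adding a function, not remodelling the data.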

Another relevant project in this area, listed in the research information management quadrant of the map, is Metagrid, a project that links data from the digital humanities with other data, including authority files such as the GND. However, authority files never have enough of the historical detail and fine-grained information that historians would need. This again emphasizes the need for creating gangways between data sources so that communities can benefit from one another’s work. We cannot all do everything, a participant warned.

Library-specific formats still have their role in specific contexts. National libraries publishing national bibliographies need to do so following a reliable set of rules, even though these very rules might become obsolete in other contexts.

At the same time, library data finds itself next to data of a very different kind. One example is the tax-funded Swiss E-government data initiative, the E-government Schweiz portal: all data that is not confidential has to be made available to all citizens. Library data is published next to weather data etc.; it is published as RDF data, and these triples can be used for any application. There is no way to foresee what users might one day do with this data, perhaps in combination with other data. Which is also very exciting!

How can we integrate the libraries’ unique assets and strengths into the linked data world? 

Auto-indexing needs language tagging

Another theme that emerged quite strongly was that of automated subject indexing and resulting data requirements.

Current metadata has serious deficits in machine-readability. For example, author keywords, abstracts etc. are needed in the metadata records to enable auto-indexing. This calls for a shift in how data is handled, which types of data are needed, how the data is stored, and how it is typed.

Multilingualism is another big challenge in this context. Current authority data is modelled to have one preferred language. Future authority data needs to be modelled more flexibly, as in Wikidata, where a term has labels in more than one language (as in the example of FIFA mentioned during the session).
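The difference between one preferred label and labels per language can be sketched as follows. This is a minimal illustration, not Wikidata's actual data model or API; the `Concept` class, the `ex:concept1` identifier, and the labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept with one label per language instead of one preferred label."""

    identifier: str
    labels: dict = field(default_factory=dict)  # language code -> label

    def label(self, preferred, fallback="en"):
        """Return the label in the preferred language, else the fallback, else None."""
        return self.labels.get(preferred) or self.labels.get(fallback)


# Hypothetical concept and labels, for illustration only.
concept = Concept(
    "ex:concept1",
    {"en": "library", "de": "Bibliothek", "it": "biblioteca"},
)
```

With this shape, `concept.label("de")` serves a German-language interface, while a request for a language without a label falls back to the English one.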

For auto-indexing, all metadata elements need to be language-coded so that it is obvious to machines, not just users, which language is used for a given element or string. Librarians sometimes think that indicating the language of the document should be sufficient but that is not the case. This is both a coordination and a staffing problem.
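What language coding at the element level could look like is sketched below, assuming a hypothetical record layout in which every text value carries its own `lang` key. The field names and the sample record are invented for illustration and do not follow any particular standard.

```python
def untagged_elements(record):
    """List element names with at least one text value missing its own language code."""
    missing = []
    for name, vals in record.items():
        if any(not v.get("lang") for v in vals):
            missing.append(name)
    return missing


# Hypothetical record: the document is German, but its elements are mixed.
RECORD = {
    "title": [{"text": "Metadaten der Zukunft", "lang": "de"}],
    "abstract": [{"text": "A study of next-generation metadata.", "lang": "en"}],
    "keywords": [{"text": "linked data"}, {"text": "Normdaten", "lang": "de"}],
}
```

Here a single document-level language code would mislabel the English abstract, and `untagged_elements(RECORD)` points at the keywords as the element that still needs tagging.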

Automatic language-detection scripts are part of the solution, but they come with a certain fuzziness, participants noted. Maybe, one participant suggested:

If we can get automatic subject tagging to work well, librarian staff could be freed up for language tagging.
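The fuzziness of automatic language detection, and why some strings still end up with staff, can be shown with a deliberately simple toy detector. It is an assumption-laden sketch, not a method discussed in the session: the stopword lists, the `min_share` threshold, and the function name are all hypothetical, and real detectors use far richer models.

```python
# Tiny, hypothetical stopword lists; real detectors use much richer models.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht"},
    "en": {"the", "and", "is", "of", "not", "to"},
}


def guess_language(text, min_share=0.2):
    """Guess a language code, or return None when the evidence is weak or tied."""
    words = text.lower().split()
    if not words:
        return None
    # Share of words that are known stopwords, per language.
    scores = {
        lang: sum(w in stop for w in words) / len(words)
        for lang, stop in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    ranked = sorted(scores.values(), reverse=True)
    # Too little evidence or a tie between languages: leave for human review.
    if scores[best] < min_share or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return best
```

A full sentence usually yields a clear guess, but a short string such as an acronym or a name contains no stopwords at all, so the function returns `None` and the element would have to be tagged by a person.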

Scaling the effort could also be beneficial. Currently, auto-indexing initiatives are often just local, and cooperation with library networks can be slow and tedious, participants observed. Cooperating internationally has its benefits, especially when cooperating with those much further ahead. The Finnish National Library, for example, develops solutions in this area and provides them for local deployment.

Linked data efforts, too, should not be limited to local or regional scales but should, if possible, take place at the national level with strong links to an international infrastructure. At least in Germany, many initiatives are traditionally linked to library networks and thus regional in scale, which can sometimes be a barrier to scaling up, one participant observed.

Librarians need to revisit their understanding of their role. 

Often, when discussing next-generation metadata topics, it comes down to priorities. Can we re-use more of the data generated upstream by publishers, producers, and universities, without spending much time on creating it again in our libraries, to free up staff for other work? This is a difficult topic to raise with cataloguers, though, participants felt.

And it is not just cataloguers … As a profession, we need to challenge and question positions of libraries which often do not have a broad perspective, one participant suggested. Administration is often slow and sluggish. The library world has not changed that much in the past ten years, unlike other sectors.

Finally, participants agreed, let us get rid of the “project” concept and instead acknowledge that this is an ongoing effort which needs appropriate staffing, permanent job positions, and sufficient financial resources! In the realm of next-generation metadata at least, we should no longer be working on a “project” basis.

About the OCLC Research Discussion Series on Next Generation Metadata  

In March 2021, OCLC Research conducted a discussion series focused on two reports: 

“Transitioning to the Next Generation of Metadata”

“Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round table discussions were held in different European languages, and participants were able to share their own experiences, get a better understanding of the topic area, and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round table discussions are published on the OCLC Research blog, Hanging Together. This post is preceded by the posts reporting on the first English session, the Italian session, the second English session, and the French session.

The Closing Plenary Session on April 13 will synthesize the different round table discussions. Registration is still open for this webinar: please join us! 

Annette Dortmund

Dr. Annette Dortmund led OCLC’s European product management and research concerned with next-generation metadata solutions for libraries and other cultural heritage institutions, with a particular focus on persistent identifiers in scholarly communication and library linked data. She also coordinated and supported European research and engagement programs for the OCLC Research Library Partnership.