Next generation of metadata in cultural heritage: continuing the conversation in Spain

This post is part of a series reporting back from conversations on transitioning to the next generation of metadata in Spain, this time focusing on cultural heritage. (A Spanish translation is available here). My colleague Francesc García Grimau from the OCLC office in Spain and I hosted the session on 19 January 2022.

Capture of the projects map from the Spanish round table session in March 2021

We invited the main stakeholders and projects that participants in the first Spanish round table session had identified. They included representatives from the National Library of Spain, the Ministry of Culture, the Cervantes Institute, El Prado museum, digital libraries covering different regions and collections of the country (e.g. the Digital Memory of Catalonia, the virtual library Miguel de Cervantes) and university libraries with large special collections (e.g. University of Barcelona, Complutense University).

To stimulate the discussion, two invited speakers kicked off the session:

Elena Sánchez, Head of the Innovation and Digital Reuse Service at La Biblioteca Nacional de España (BNE), who presented BNE’s most strategic next-generation metadata project.
Liz Krznarich, Adoption Manager at DataCite & ROR, who presented on persistent identifiers in cultural heritage. There is a growing need to make digital heritage collections and their objects more citable and discoverable, and their reuse more trackable. You can read the summary of her presentation in her guest post on HangingTogether.

BNElab: the strategic next generation metadata project of the national library of Spain

Elena Sánchez presented on BNElab. This project explores new ways of using the library’s digital resources and datasets. It responds to Spain’s open government data policies and strategies, in particular the call to share public sector data widely and proactively, so that everyone has the opportunity to appreciate it and derive value from it. Over the past decade, the library has undertaken the massive task of unlocking and reformatting all its bibliographic records, authority records and digital collections for reuse. The resulting datasets have been converted into reusable (non-library-specific) formats and made available in the public domain (CC0) through the data portals of the library (BNElab/datos), the Spanish government (Datos.gob.es) and the European Union (Data.Europa.eu).

From 2017 onwards, the library initiated multiple engagement projects in the context of BNElab to encourage different communities to use the datasets in new and creative ways. BNEscolar, for example, promotes the use of cultural heritage in teaching and school learning. Elena explained how the BNE community platform enrolls citizens in crowdsourcing projects to improve and enrich the library’s descriptions of cultural heritage items and authority records. Besides harnessing the knowledge of Spanish citizens, the library also connects to open knowledge graphs, such as Wikidata, to add new data to its catalogue at scale (e.g., place of birth and occupation of historical persons). The BNE is continuously seeking to make the most of its data, looking for new forms of enrichment and use. Elena gave the example of the MarIA project, which created an AI-based Spanish language model trained on files from the BNE’s web archive. It is the first large-scale model that can understand and write Spanish. It can be used in linguistic applications such as predictors, correctors, chatbots, smart searches and automatic subtitling.
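To make the Wikidata enrichment mentioned above more concrete, here is a minimal sketch in Python of how such a lookup might work. It is not BNE’s actual workflow, which the session did not describe in detail. It assumes that authority records carry a BNE identifier that Wikidata stores under property P950, and it uses P19 (place of birth) and P106 (occupation). The function names, the local record layout, the flattened result rows and the identifier `XX-EXAMPLE-1` are all hypothetical, and the sample row is hand-written rather than a real query result.

```python
from typing import Dict, List

# Wikidata property IDs used in the query.
BNE_ID = "P950"          # Biblioteca Nacional de España ID
PLACE_OF_BIRTH = "P19"   # place of birth
OCCUPATION = "P106"      # occupation


def build_query(bne_ids: List[str]) -> str:
    """Build a SPARQL query for the Wikidata Query Service.

    For each BNE identifier, the query asks for the labels of the
    person's place of birth and occupations, when Wikidata has them.
    """
    values = " ".join(f'"{i}"' for i in bne_ids)
    return f"""
SELECT ?bneId ?birthPlaceLabel ?occupationLabel WHERE {{
  VALUES ?bneId {{ {values} }}
  ?person wdt:{BNE_ID} ?bneId .
  OPTIONAL {{ ?person wdt:{PLACE_OF_BIRTH} ?birthPlace . }}
  OPTIONAL {{ ?person wdt:{OCCUPATION} ?occupation . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
""".strip()


def enrich(record: Dict, rows: List[Dict]) -> Dict:
    """Return a copy of a local authority record enriched from result rows.

    Assumption: `rows` are simplified, flat dictionaries. Real SPARQL JSON
    results nest each value under "results" -> "bindings" -> name -> "value".
    Existing local data is never overwritten, only supplemented.
    """
    enriched = {**record, "occupations": list(record.get("occupations", []))}
    for row in rows:
        if row.get("bneId") != record["bne_id"]:
            continue  # row belongs to a different authority record
        place = row.get("birthPlaceLabel")
        if place and not enriched.get("birth_place"):
            enriched["birth_place"] = place
        occupation = row.get("occupationLabel")
        if occupation and occupation not in enriched["occupations"]:
            enriched["occupations"].append(occupation)
    return enriched


if __name__ == "__main__":
    # Hypothetical identifier and hand-written sample row, for illustration only.
    print(build_query(["XX-EXAMPLE-1"]))
    sample = {"bne_id": "XX-EXAMPLE-1", "name": "Cervantes", "occupations": ["novelist"]}
    rows = [{"bneId": "XX-EXAMPLE-1", "birthPlaceLabel": "Alcalá de Henares", "occupationLabel": "poet"}]
    print(enrich(sample, rows))
```

Keeping the query builder separate from the merge step means the merge logic can be checked against hand-made rows before any request is sent to a live endpoint, and the "never overwrite" rule leaves the cataloguers’ own data in control.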

Thinking about the future of services based on these new data resources and formats, Elena added that it will be very important to provide more personalized services to researchers in order to help them understand and interrogate the data. For this, library professionals will need training, tools and strategies to scale such services, she said.

BNElab has many of the ingredients that characterize next-generation metadata projects: transforming descriptive text-based strings into machine-reusable data, embarking on the semantic enrichment and interconnection of data-encoded knowledge about Spanish works and their authors across the Web, building communities of knowledge around datasets, and re-skilling staff across library departments through learning-by-doing. Elena reflected on the many questions raised by the BNElab project, several of which resurfaced during the discussion.

Discussion

On engaging communities of use

How do you reach the right communities and keep them engaged? Elena explained that it is a matter of matching supply and demand: seeking out groups interested in the history of advertising, for example, and matching them with your historical advertisement collection. A general call to the public won’t always work. You have to contact people, guide them, understand their interests and connect them with others who have valuable knowledge, so that users can learn from each other.

On the quality of community contributions

And how much time is involved in reviewing and validating community contributions to the BNElab projects? Elena observed that, overall, the quality of the citizen contributions is very good and is considered equal to that of other sources, such as Wikidata. The contributors validate each other’s contributions. The time library staff spend on validation depends very much on the type of project. Collaboration with the technical staff from the cataloging department on these issues is good, and they are not opposed to incorporating the crowdsourced enrichments back into the library’s catalogue. Elena said that, due to the pandemic, projects have had a fast turnaround and that it would be challenging to maintain that level of engagement:

“I think people were very bored at home and well, we were all very bored. In 2020 the projects all finished and they closed within 24 hours. We really put out projects and said, hey, it’s already closed, I can’t believe it!”

On understanding user experiences

With such a variety of users on BNElab, offering better data querying interfaces and visualizations is high on BNE’s agenda. The transformation from catalogue records to datasets has made BNE staff reflect on the granularity, quality and usefulness of the data.

“It is time to investigate how these new delivery formats are used, (…) how did our data answer the questions asked by researchers who work exclusively with this type of resource (…) who want to extract information from a file (…). Is our data capable of responding to their queries?”

On interconnecting collections in Spain

The BNE wants to strengthen the links between its collections and other Spanish data sources, whether they hold heritage or scientific content. The group discussed the desirability of interconnecting the BNE collections, the virtual library Miguel de Cervantes, the Hispana digital library, the digitized newspaper library, and others. This could be done through the identifiers of authority records and of terms in the thesauri for cultural heritage maintained by the Ministry of Culture. The Ministry has transformed the thesauri into linked data and is currently exploring which interconnections would be of interest to pursue. Linking records from the collective catalogue of Spanish bibliographic heritage with those from the various Spanish digital libraries could, for example, greatly enrich the bibliographic records. Elena and her colleagues have experimented with enriching BNE’s metadata with terms from the Ministry’s thesauri, the controlled vocabulary of the Institute of Sciences and Audiovisual arts, and highly specialized terms from other Spanish language vocabularies. They see interesting opportunities to add finer granularity and specificity to their descriptions.

On moving at different paces

The group observed that memory institutions in Spain are not all transitioning at the same pace. Some face technological delays because their collection management applications are not yet ready to support the necessary functionality. Others do not have the means or skills to convert their data into linked data. It is currently a mixed-maturity environment and will remain so for a while. This limits the possibilities for interlinking collections in Spain.

To be continued …

Time flew and, once again, we had to end the discussion prematurely. This is without doubt a discussion to be continued. We will be working with volunteers from this group on a discussion paper consolidating findings from this and previous sessions. We are also preparing the next session, on the (non-scholarly) book supply chain, which will bring different stakeholders around the table for yet another completely different conversation!

Titia van der Werf

Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC’s Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.