In March 2025, in response to previous discussions among the OCLC RLP Metadata Managers Focus Group members about the use and re-use of WorldCat Entities linked data URIs in WorldCat MARC, the group met again for a series of “Product Insights” sessions on the same topic.

41 participants from 30 RLP institutions in 4 countries attended the three separate sessions:

Art Institute of Chicago	The New School	University of Leeds
Binghamton University	New York Public Library	University of Manchester
Clemson University	New York University	University of Maryland
Cleveland Museum of Art	Pennsylvania State University	University of Pennsylvania
Cornell University	Princeton University	University of Pittsburgh
Library of Congress	Tufts University	University of Sydney
London School of Economics and Political Science	University of California, Los Angeles	University of Tennessee, Knoxville
Monash University	University of Chicago	University of Southern California
National Gallery of Art	University of Hong Kong	Virginia Tech
National Library of Australia	University of Kansas	Yale University

“Product Insights” sessions give our RLP partners exclusive and early access to information about the work our product colleagues are doing, and in return allow our product colleagues to gain relevant insights from the field. The guest of honour on this occasion was Jeff Mixter, Senior Product Manager at OCLC for Metadata and Digital Services. A former member of the OCLC Research team experimenting with linked data, back in the day, Jeff now leads OCLC’s linked data products, applications and services development and is always eager to engage in conversations around real life use cases for library linked data.

Unlock new opportunities, one step at a time, at scale

Jeff started the sessions off by quickly reiterating OCLC’s linked data strategy.

Rather than treating linked data as the natural successor to MARC, a new technology that must by default be the better one (as is still often done), his team looks at challenges and inefficiencies in today’s metadata operations and investigates how linked data can help solve problems, create efficiencies or, most importantly, unlock new opportunities that simply aren’t there right now.

The team’s second focus is on gradual achievements: implementing metadata and workflow changes one step at a time, at scale, without disrupting systems and workflows that have relied on MARC data for decades and will continue to do so for quite some time to come.

Thirdly, Jeff strongly believes in supporting libraries no matter where they are on their journey to linked data and what internal or external forces may limit their ability to change. One way of doing this is creating what is sometimes called “linky MARC” by adding WorldCat Entities URIs to MARC 21 bibliographic records in WorldCat. WorldCat Entities is a set of authoritative linked data entities and URIs for people, organizations and other entity types that can be used when describing library resources. These URIs are agnostic to the descriptive framework and can be used in whatever the preferred data model happens to be in a particular context. By populating WorldCat with WorldCat Entities URIs at scale, the intention is to support libraries that are far ahead on their journey to linked data, empower libraries that are on their way, and give all others a starting point for their transition to a metadata future less reliant on MARC.

A couple $1s, a 4.0, and some more facts worth knowing

The first part of the Product Insights session was used to answer a mix of concrete questions concerning OCLC’s linked data work.

URIs in WorldCat bibliographic records

Jeff explained that OCLC is adding URIs for WorldCat Entities to WorldCat bibliographic records in two ways:

Bulk updates of records to add URIs in certain MARC fields (e.g., 100, 651, and 700)
As part of the regular offline batch controlling service that controls headings from certain vocabularies, URIs are added when a candidate heading is controlled.

Updates to OCLC’s cataloging products in 2024 are also contributing to the enrichment of records with URIs. When a cataloger controls headings for persons in the 100 and 700 fields, a URI is automatically inserted. An attendee noted how a URI may easily be added to a controlled unqualified personal name heading (e.g., “Eco, Umberto”) by simply uncontrolling and then recontrolling the heading. A single action by a cataloger in a record both controls a heading for current MARC needs and provides a bridge for future linked data functionality.

Responding to questions from previous sessions, Jeff explained that OCLC is putting WorldCat Entities URIs in subfield $1 in MARC records because this is where URIs for “Real World Objects” (RWOs) should go, as opposed to subfield $0, which usually points to authority records.
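To make the $0/$1 distinction concrete, here is a minimal Python sketch. It models a single MARC field as a plain list of (subfield code, value) pairs rather than using any real MARC library, and both URIs in it are made-up placeholders, not real identifiers.

```python
# A 700 field modelled as (subfield code, value) pairs.
# Both URIs are illustrative placeholders, not real identifiers.
field_700 = [
    ("a", "Eco, Umberto."),
    ("0", "https://example.org/authority/PLACEHOLDER"),
    ("1", "https://example.org/entity/PLACEHOLDER"),
]


def uris_by_role(subfields):
    """Split a field's URIs into authority links ($0) and RWO links ($1)."""
    roles = {"authority": [], "real_world_object": []}
    for code, value in subfields:
        if code == "0":
            roles["authority"].append(value)
        elif code == "1":
            roles["real_world_object"].append(value)
    return roles


roles = uris_by_role(field_700)
print(roles["real_world_object"])
```

The heading text in $a stays untouched; the two kinds of URI simply sit alongside it in their own subfields.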

These URIs in $1s are then by default included in all data exported from WorldCat no matter which route is chosen. Unfortunately, URIs get “lost” on their way from WorldCat to some library service platforms. Attendees were encouraged to approach their suppliers to check they are not inadvertently dropping URIs in $1 subfields upon ingest.
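A library could also run a rough check of its own by comparing the $1 values in a record as exported from WorldCat with the same record after ingest. The Python sketch below only illustrates the idea: records are modelled as lists of (tag, subfield code, value) tuples, the function names are hypothetical, the ingest step is simulated, and the URI is a placeholder.

```python
def rwo_uris(record):
    """Collect (tag, URI) pairs for every $1 subfield in a record."""
    return {(tag, value) for tag, code, value in record if code == "1"}


def dropped_rwo_uris(exported, ingested):
    """Return $1 URIs present in the export but missing after ingest."""
    return sorted(rwo_uris(exported) - rwo_uris(ingested))


exported = [
    ("100", "a", "Eco, Umberto."),
    ("100", "1", "https://example.org/entity/PLACEHOLDER"),
    ("245", "a", "Il nome della rosa"),
]
# Simulated result of an ingest that silently discards $1 subfields.
ingested = [(t, c, v) for t, c, v in exported if c != "1"]

missing = dropped_rwo_uris(exported, ingested)
print(missing)
```

An empty result would suggest the $1 subfields survived the round trip; anything else is worth raising with the supplier.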

Terms and Conditions

Access to WorldCat Entities data is tiered: authenticated users can retrieve a larger set of data for a URI than unauthenticated ones. Authenticated use requires only a free API key, which is available from the OCLC Developer Network. Use of the data, regardless of level of access, is governed by a CC BY-NC 4.0 license, as also detailed under “Terms and Conditions” on the WorldCat Entities website.

Data Provenance

When editing WorldCat Entities data through Meridian, data provenance information can currently be added at the individual claim level, such as for a person’s date of birth or area of expertise, by adding a URL for an information source. OCLC maintains a history of description changes, logging all changes made to the entity and the institution that made them.

Workflow integration

Participants asked how WorldCat Entities URIs integrate with current bibliographic workflows. In addition to the bulk updates and enrichments described above, one can also manually look up and add entity URIs in WorldShare Record Manager while cataloguing, or even create them on the fly, as a Meridian subscriber. This functionality will also come to the Connexion cataloguing application and ultimately to CONTENTdm. Using the WorldCat Entities APIs, this type of integration could also be realized in any other tool used for metadata work, in whichever context, provided the development work is done.

Person Entities – yes, please

With all those new facts on the table, participants were ready to get more deeply engaged. Our first topic was to explore the specific value and the specific challenges surrounding person entities and identifiers.

Person identifiers are needed almost everywhere. They can be a way to identify individuals unambiguously, without too much effort. With person identifiers, it is also easier to identify more than just the corresponding authors for a publication. A participant shared that for their electronic theses and dissertations they add ORCID identifiers for multiple authors as well as for related persons, such as professors or committee members.

Even where creating authority records is the norm, staffing issues and competing priorities can slow this work considerably, motivating libraries to investigate quicker ways to reliably disambiguate authors by using identifiers.

Regional diversity is a strong driver for person entities as well, prompting both the wish to identify and disambiguate regional personalities who, for whatever reason, cannot have an authority record, and the need to add labels in multiple regional languages. This need not stand in contrast to having an authorized heading, where that exists. Linked data URIs, at least in WorldCat, will never overwrite the authorized heading, leaving the user with the best of both worlds.

Yes, but …

So the need and the benefits are clear, but … unfortunately, barriers seem to be everywhere.

A very concrete barrier discussed was the difference in practice for books and for articles. While person metadata for books is governed by rigorous authority control, article-level data is not, and first names are often listed only by initials, following citation formats. This makes it almost impossible to reliably reconcile and match authors across these platforms and data sources.
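The matching problem can be illustrated with a small Python sketch. The names and the normalization rule are invented for illustration: reducing every name to surname plus first initial, as many citation formats do, makes distinct people collapse into the same key.

```python
from collections import defaultdict


def citation_key(name):
    """Reduce 'Surname, Forenames' to 'surname, f' as a citation style might."""
    surname, _, forenames = name.partition(",")
    initial = forenames.strip()[:1]
    return f"{surname.strip().lower()}, {initial.lower()}"


# Invented names: three distinct people in book-level authority data.
authority_names = ["Smith, Jane", "Smith, John", "Smith, Jennifer"]

candidates = defaultdict(list)
for name in authority_names:
    candidates[citation_key(name)].append(name)

# An article byline "Smith, J." matches all three equally well.
matches = candidates[citation_key("Smith, J.")]
print(matches)
```

With only the byline to go on, nothing distinguishes the three candidates, which is where a shared person identifier would help.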

One area where matching issues also become visible to the end user is experimental linked data features in library service platforms, such as person cards. These often rely on a single source, such as an authority file, and will not pick up information coming from other sources, such as institutional repository data where authors are identified by their ORCID identifier, resulting in incomplete person cards. As one participant pointed out, these features are useful to show what linked data can do, and where the gaps still are.

If incomplete information is bad, incorrect information is worse by far. Libraries feel responsible for the accuracy of the data they are presenting to their users, and libraries can be concerned that quality issues resulting from errors in external data sources will reflect badly on them. Correcting data at the source, such as in Wikidata, will not always immediately update representations in the target systems. In one instance, an attendee shared, it took a full month to get the corrected Wikidata information to display in the library service platform’s person card. These may be teething troubles in new product features, but it was a wake-up call for the library in question.

So many PIDs, so little time

A large part of the discussions centered on problems created by the multitude of different person identifiers. These identifiers serve very specific purposes, have specific limitations and function best when used in specific workflows. As a result, libraries end up using ORCID identifiers in some contexts and authority files in others. And then, there is Wikidata. Wikidata was mentioned more than once as a source of identifiers to complement the set when the default options are not available. “It’s still kind of a grab bag,” one participant noted. But the resulting mix of identifiers cannot easily be integrated. If this could be solved, many operational issues would disappear.

There are efforts underway to combine linked data identifiers with traditional metadata. One participant wondered if adding ORCID identifiers to MARC records could help. Other institutions add ORCID and ISNI identifiers to name authority records. But this kind of work, while seen as useful, does not easily scale.

Our discussions then explored the option of creating something that could sit in the middle and stitch it all together: a hub of identifiers with lots of “same as” relationships that would allow each identifier to function in its own context but could also be used or referred to for cross-identifier discovery. Alas, the hub idea, a participant warned, would have to be quickly and fully adopted at scale to be integrated into workflows, and then there is also the question of how to generate sufficient trust in such a solution. Even a platform as large as Wikidata is not often trusted to be the strategic long-term choice for library identifiers.
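As a thought experiment, such a hub could be as simple as clusters of identifiers joined by “same as” statements. The Python sketch below uses a union-find structure for this. The class name `IdentifierHub` and all identifier values are hypothetical placeholders, and nothing here describes how WorldCat Entities or any existing service actually works.

```python
class IdentifierHub:
    """Groups identifiers into clusters via 'same as' statements (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        """Return the representative identifier of ident's cluster."""
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps lookups fast as clusters grow.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def same_as(self, a, b):
        """Record that identifiers a and b refer to the same person."""
        self._parent[self._find(a)] = self._find(b)

    def equivalents(self, ident):
        """Return every identifier known to be in ident's cluster."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


hub = IdentifierHub()
hub.same_as("orcid:PLACEHOLDER-1", "authority:PLACEHOLDER-1")
hub.same_as("authority:PLACEHOLDER-1", "wikidata:PLACEHOLDER-1")
hub.same_as("orcid:PLACEHOLDER-2", "wikidata:PLACEHOLDER-2")

cluster = hub.equivalents("orcid:PLACEHOLDER-1")
print(cluster)
```

Each identifier keeps working in its own context, while a lookup on any one of them returns the whole cluster. The adoption and trust questions raised in the discussion are not something a data structure can answer.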

At this point Jeff joined the discussion to share his conviction that for him, too, the goal must be to make identifiers talk to each other, and that WorldCat Entities was designed as a sort of bridge to help connect silos in library land. His vision goes beyond WorldCat: he is also working with product colleagues to start integrating WorldCat Entities identifiers across CONTENTdm digital materials, as well as in the OCLC Central Index, which consists predominantly of newspaper articles, journal articles, and book chapters.

Searching across silos and entities

Assuming we could solve these problems, with or without a hub, what could we then do that we cannot do now?

Easily search across platforms for related materials.
- For example, we could improve discovery across the collection of rare print materials, managed in one system, and the materials in the cultural collections, managed in another system.
Extend our cross-platform searches beyond local environments and system landscapes and leverage the larger library linked data knowledge graph.
- For example, primary source materials (the example mentioned was painted murals) could be connected with secondary sources, such as the painter’s printed works, and works about the painter or her work. This is the type of richer discovery experience that should become possible when using and connecting linked data entities.
- In this context, the Linked Art project was mentioned (https://linked.art/), a model for linked data description and management of cultural heritage materials.
Make unexpected discoveries.
- Many collections are in a place where researchers do not expect them to be, but network level searching or browsing would allow them to be “found” and connected in new ways.

Unlocking these types of new opportunities is perhaps one of the biggest promises of library linked data and a truly global GLAM knowledge graph.

Many thanks to all those who supported me in writing this blog post, in particular my colleagues Jeff Mixter, Kate James, and Rebecca Bryant.

Annette Dortmund

Dr. Annette Dortmund led OCLC’s European product management and research concerned with next-generation metadata solutions for libraries and other cultural heritage institutions, with a particular focus on persistent identifiers in scholarly communication and library linked data. She also coordinated and supported European research and engagement programs for the OCLC Research Library Partnership.