The cream of the world’s culture and heritage is shared through translation: it’s how we learn about other cultures and how other cultures learn about us. I’ve long been interested in the metadata needed to describe and provide multilingual access to the resources managed by libraries, archives, museums and other cultural heritage organizations. I’ve latched onto the potential of linked data to display metadata in the user’s preferred language and script, and particularly to associate translations with their original works. (See, for example, my presentation at the DCMI-17 conference, Using the Semantic Web to Improve Knowledge of Translations.)
Now, with my OCLC Research colleague Jeff Young, I’ve been applying our translation model by experimenting with representing works and their associated translations in Wikibase, as part of the OCLC Research Linked Data Wikibase Prototype. The prototype uses MediaWiki, the free and open-source wiki software that supports the various language Wikipedias, other Wikimedia Foundation projects, and other wikis. Wikibase is a set of extensions to the MediaWiki platform for storing, managing, and discovering structured data. Among its many features, the ones that especially appeal to me are its embedded multilingual support, structured data editor, auto-suggest, and the freedom to create various semantic queries using SPARQL, an RDF query language.
We developed a kind of template to represent works and their translations in Wikibase. For works, we create or generate at minimum the following statements:
- The original title (in the original script)
- Language of the original
- The instance (e.g., book)
- Author(s) of the original
- Earliest known publication date
For translations, we create or generate at minimum:
- The translated title
- Language of the translation
- The instance (specified as “translation”)
- Earliest known publication date
- The work the title was translated from
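The two lists above can be sketched as a pair of record types. This is only an illustration of the minimum statements, not the prototype’s actual data model: the class names, field names, and the `missing_statements` helper are hypothetical, and the sample values are well-known facts about Sein und Zeit rather than data taken from our Wikibase instance.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Work:
    """Minimum statements for an original work (hypothetical field names)."""
    title: str                 # original title, in the original script
    language: str              # language of the original
    instance_of: str           # e.g. "book"
    authors: List[str]         # author(s) of the original
    earliest_publication: str  # earliest known publication date


@dataclass
class Translation:
    """Minimum statements for a translation (hypothetical field names)."""
    title: str                 # translated title
    language: str              # language of the translation
    earliest_publication: str  # earliest known publication date
    translated_from: Work      # the work the title was translated from
    instance_of: str = "translation"


def missing_statements(entity) -> List[str]:
    """Return the names of minimum statements that are empty or None."""
    return [f.name for f in fields(entity) if not getattr(entity, f.name)]


# Illustrative values only.
sein_und_zeit = Work(
    title="Sein und Zeit",
    language="German",
    instance_of="book",
    authors=["Martin Heidegger"],
    earliest_publication="1927",
)
being_and_time = Translation(
    title="Being and Time",
    language="English",
    earliest_publication="1962",
    translated_from=sein_und_zeit,
)

print(missing_statements(sein_und_zeit))
print(missing_statements(being_and_time))
```

A check like `missing_statements` is one way to flag records where a statement could not be derived from the source data and needs to be supplied from elsewhere.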
WorldCat contains many translations, but not all of the information above can be gleaned from its MARC records, and titles in non-Latin scripts may be represented in romanized form only. We started with a subset of works and their translations from WorldCat, augmented with information from the corresponding work title entities in Wikidata, a collaboratively edited structured database used by Wikimedia projects such as the various language Wikipedias and by other sites (such as Google’s Knowledge Graph).
The timeline above is a partial display of the translations represented in our Wikibase instance for Martin Heidegger’s Sein und Zeit. The circled entries are four different Japanese translations, each by a different translator. The titles in Greek and Persian script were taken from Wikidata, because the WorldCat records for these translations contained only romanized titles.
We can limit our query to just English translations:
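A query of this kind might be built along the following lines. This is a hedged sketch, not the query we actually ran: the property IDs (`P1`, `P2`, `P3`) and item IDs (`Q100`, `Q200`) are placeholders, since every Wikibase instance assigns its own identifiers, and the `wd:` and `wdt:` prefixes are assumed to be predefined by the query service, as they are on Wikidata’s.

```python
from typing import Optional


def build_translation_query(
    work_id: str,
    language_id: Optional[str] = None,
    translation_of: str = "P1",  # placeholder property: work translated from
    language_prop: str = "P2",   # placeholder property: language of the translation
    date_prop: str = "P3",       # placeholder property: earliest known publication date
) -> str:
    """Return a SPARQL query listing the translations of one work.

    When language_id is given, only translations into that language match.
    """
    patterns = [f"?translation wdt:{translation_of} wd:{work_id} ."]
    if language_id is not None:
        patterns.append(f"?translation wdt:{language_prop} wd:{language_id} .")
    # The date is optional so that translations lacking one still appear.
    patterns.append(f"OPTIONAL {{ ?translation wdt:{date_prop} ?date . }}")
    body = "\n  ".join(patterns)
    return (
        "SELECT ?translation ?date WHERE {\n"
        f"  {body}\n"
        "}\n"
        "ORDER BY ?date"
    )


# Placeholder item IDs: Q100 for the work, Q200 for English.
print(build_translation_query("Q100", language_id="Q200"))
```

Leaving out `language_id` returns every translation of the work, which corresponds to the broader timeline; adding it narrows the result to a single language.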
Each translator is represented by a Wikibase entity item. Translators are crucial for differentiating translations into the same language, as some translators are better than others. We’ve also included identifiers for each work, translation, and person.
My colleague Bruce Washburn created a discovery layer for the Wikibase entries that also illustrates the power of linked data, for example by pulling in information from DBpedia:
We’re now investigating approaches to scaling up the import of entities and metadata reconciliation. The visualization tools provided, such as maps (for geographic entities and cartographic materials) and the timelines shown here, help reviewers spot outliers that need correction.
Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.