Representing works and their translations in Wikibase

Partial view of timeline of some of the translations in our Wikibase prototype for Heidegger’s Sein und Zeit. The circled ones represent four different Japanese translations.

The cream of the world’s culture and heritage is shared by being translated: it’s how we learn about other cultures and how other cultures learn about us. I’ve long been interested in the metadata needed to describe and provide multilingual access to the resources managed by libraries, archives, museums and other cultural heritage organizations. I’ve latched onto the potential of linked data to display metadata in the user’s preferred language and script, and particularly to associate translations with their original work. (See for example my presentation at the DCMI-17 conference, Using the Semantic Web to Improve Knowledge of Translations.)

Now, with my OCLC Research colleague Jeff Young, I’ve been applying our translation model by representing works and their associated translations in Wikibase, as part of the OCLC Research Linked Data Wikibase Prototype. The prototype uses MediaWiki, the free and open-source wiki software that supports the various language Wikipedias, other Wikimedia Foundation projects, and other wikis. Wikibase, an extension to the MediaWiki platform, is a collection of applications to store, manage, and discover structured data. Among its many features, the ones that especially appeal to me are its embedded multilingual support, structured data editor, auto-suggest, and the freedom to create various semantic queries using SPARQL, an RDF query language.

We developed a kind of template to represent works and their translations in Wikibase. For works, we create or generate at minimum the following statements:

  • The original title (in the original script)
  • Language of the original
  • The instance (e.g., book)
  • Author(s) of the original
  • Earliest known publication date

For translations, we create or generate at minimum:

  • The translated title
  • Language of the translation
  • The instance (specified as “translation”)
  • Translator(s)
  • Earliest known publication date
  • The work the title was translated from
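
As a rough illustration only, the two templates above could be modeled as a pair of record types. This is a sketch in Python, not the prototype's actual data model: the class names, field names, and the missing_statements helper are assumptions made for the example. The sample values describe Sein und Zeit and the English translation Being and Time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Work:
    """Minimum statements for an original work (field names are illustrative)."""
    title: str                 # original title, in the original script
    language: str              # language of the original
    instance_of: str           # e.g. "book"
    authors: List[str]         # author(s) of the original
    earliest_publication: int  # year of the earliest known publication


@dataclass
class Translation:
    """Minimum statements for a translation (field names are illustrative)."""
    title: str                 # the translated title
    language: str              # language of the translation
    translators: List[str]
    earliest_publication: int
    translation_of: Work       # the work the title was translated from
    instance_of: str = "translation"


def missing_statements(item) -> List[str]:
    """Return the names of fields that are empty strings, None, or empty lists."""
    return [name for name, value in vars(item).items() if value in ("", None, [])]


sein_und_zeit = Work("Sein und Zeit", "German", "book", ["Martin Heidegger"], 1927)
being_and_time = Translation(
    "Being and Time", "English",
    ["John Macquarrie", "Edward Robinson"], 1962, sein_und_zeit,
)

# The original author is reachable through the link to the work.
print(being_and_time.translation_of.authors)
print(missing_statements(being_and_time))
```

Keeping the author on the work and linking each translation to it mirrors the template: a translation carries its translators directly and reaches the author through the work it was translated from.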

WorldCat has many translations, but not all of the information above can be gleaned from the MARC records, and non-Latin script titles may be represented in romanization only. We started with a subset of works and their translations from WorldCat, augmented by information for work title entities in Wikidata, a collectively edited structured database used by Wikimedia projects such as the different language Wikipedias and other sites (such as Google’s Knowledge Graph).

The timeline above is a partial display of the translations represented in our Wikibase instance for Martin Heidegger’s Sein und Zeit. The circled entries are four different Japanese translations, each by a different translator. The modern Greek and Farsi titles in their original scripts were taken from Wikidata, as the WorldCat records for these translations had only romanization.

We can limit our query to just English translations:

Wikibase query limited to English translations of Sein und Zeit
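
A query of this kind might look roughly like the one built below. This is a hedged sketch, not the query used in the prototype: the property and item identifiers (P_TRANSLATION_OF, P_LANGUAGE, P_TRANSLATOR, P_PUBLICATION_DATE, Q_SEIN_UND_ZEIT, Q_ENGLISH) are placeholders, since the prototype's real identifiers are not given here, and it assumes the wd: and wdt: prefixes and the label service are set up as they are on Wikidata's query service.

```python
def build_translation_query(
    work_id: str,
    language_id: str,
    p_translation_of: str = "P_TRANSLATION_OF",  # placeholder property ID
    p_language: str = "P_LANGUAGE",              # placeholder property ID
    p_translator: str = "P_TRANSLATOR",          # placeholder property ID
    p_pub_date: str = "P_PUBLICATION_DATE",      # placeholder property ID
) -> str:
    """Build a SPARQL query for translations of one work into one language."""
    return f"""
SELECT ?translation ?translationLabel ?translatorLabel ?date WHERE {{
  ?translation wdt:{p_translation_of} wd:{work_id} ;
               wdt:{p_language} wd:{language_id} ;
               wdt:{p_translator} ?translator ;
               wdt:{p_pub_date} ?date .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?date
""".strip()


# Placeholder item IDs standing in for the work and for the English language.
query = build_translation_query("Q_SEIN_UND_ZEIT", "Q_ENGLISH")
print(query)
```

Dropping the language line from the pattern would return translations in every language, which is the shape of data the timeline view needs.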

Each translator is represented by a Wikibase entity item. Translators are crucial for differentiating translations into the same language, as some translators are better than others. We’ve also included identifiers for each work, translation, and person.

My colleague Bruce Washburn created a discovery layer for the Wikibase entries which also illustrates the power of using linked data, for example, pulling in information from DBpedia:

 

Partial display of the discovery layer to Wikibase’s entry for Sein und Zeit

We’re now investigating approaches to scaling up the import of entities and metadata reconciliation. The visualization tools provided, such as maps (for geographic entities and cartographic materials) and the timelines shown here, help reviewers spot outliers that need correction.
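
One kind of outlier a timeline makes visible is a translation dated earlier than the work it translates. The sketch below shows how such a check could be automated; it is an assumption about what a useful check might be, not something the prototype is known to do, and the second sample entry is invented to stand in for a keying error.

```python
from typing import Dict, List, Tuple


def find_date_outliers(
    original_year: int, translations: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (title, year) pairs dated before the original's publication year."""
    return [
        (title, year)
        for title, year in translations.items()
        if year < original_year
    ]


# Sein und Zeit was first published in 1927.
sample = {
    "Being and Time": 1962,
    "Invented entry with a keying error": 1297,  # made-up record for illustration
}
print(find_date_outliers(1927, sample))
```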

Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.

3 Comments on “Representing works and their translations in Wikibase”

  1. Heidegger is the author associated with Sein und Zeit.

    Heidegger / Sein und Zeit is the work associated with the translation Being and Time.

    Is there any author (not translator, not work) associated with Being and Time? Is the expectation that any author associated with the original work would be automatically associated as an author in the translation’s metadata? Or would it be left to users to decide whether the original work’s author can properly be taken as the author of the translation?

    1. I left off the “Description” in the illustration to save space. Each of the translations has the description “translation of Heidegger’s Sein und Zeit.” The query looked only for translations, but one could adjust the query to include the original work. The data is there in each translation to show that it’s a translation of the original work, with a link to the identifier describing Heidegger’s original work.

  2. Dear Karen,

    Very nice approach. But what about using the additional information from the Wikipedia universe (and the Google Knowledge Graph) to enrich WorldCat? And how and where should the best strategy be discussed: building authority records for works, using the additional information only for better work clustering (represented by OWI-IDs), enriching the single records with information about translators and original, non-romanized work titles for better indexing, or a mix of all three lines of action?

    Please continue your work on this topic. It could also have an impact on the handling of different romanization practices.

    Rupert Schaab (Göttingen University, Germany)
