Multilingual WorldCat represented by translations

Great works are translated; the cream of the world’s cultural and knowledge heritage is shared through translation. Many of these works and their translations are represented by bibliographic records in WorldCat.

A group of us working on Multilingual WorldCat projects has been mining WorldCat for works and all of their associated translations, identifying the translator of each translation. We plan to generate “uniform title” records for the works and “expression” records for the translations, and contribute them to the Virtual International Authority File (VIAF).
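
The grouping step described above can be sketched in Python. This is a minimal illustration, not the project's actual pipeline: it assumes each record already carries a work identifier, a language code, and (for translations) a translator name. The `BibRecord` type and the `group_expressions` function are hypothetical. Each distinct (language, translator) pair under a work is treated as one candidate expression, so two editions of the same translation share an expression, while two different translations into the same language do not.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BibRecord:
    """Hypothetical, simplified view of a bibliographic record."""
    record_id: str
    work_id: str                      # assumed: shared by every record of the same work
    language: str                     # e.g. "eng", "fre", "ger"
    translator: Optional[str] = None  # None for the original-language text


def group_expressions(records):
    """Map each work to its candidate expressions.

    An expression is keyed by (language, translator); the value lists the
    record IDs that belong to it.
    """
    works = defaultdict(lambda: defaultdict(list))
    for rec in records:
        works[rec.work_id][(rec.language, rec.translator)].append(rec.record_id)
    return works


records = [
    BibRecord("r1", "w1", "eng"),
    BibRecord("r2", "w1", "fre", "Translator A"),
    BibRecord("r3", "w1", "fre", "Translator A"),  # another edition, same translation
    BibRecord("r4", "w1", "fre", "Translator B"),  # a second French translation
    BibRecord("r5", "w1", "ger", "Translator C"),
]

exprs = group_expressions(records)
for (lang, translator), ids in exprs["w1"].items():
    print(lang, translator or "(original)", ids)
```

Here work `w1` ends up with four candidate expressions, three of which are translations with a named translator.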

We currently have roughly 15 million personal name “clusters” in VIAF, grouping the 26 million personal name authority records contributed by 35 agencies; each cluster brings together the records that represent the same person. These people are not only creators of works but also subjects of works written about them, and sometimes translators.
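
One common way to turn pairwise "same person" decisions into clusters is union-find, which merges matches transitively: if record A matches B and B matches C, all three land in one cluster. That is the kind of collapse that turns many records into fewer clusters. The sketch below assumes the pairwise matches are already given; it is not a description of how VIAF actually decides matches, and the record IDs and the `cluster` function are made up for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(record_ids, matches):
    """Group record IDs into clusters, given pairs judged to be the same person."""
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)  # register every record, including unmatched ones
    for a, b in matches:
        uf.union(a, b)
    groups = defaultdict(list)
    for rid in record_ids:
        groups[uf.find(rid)].append(rid)
    return list(groups.values())


# Made-up record IDs in the form "AGENCY|local-id".
records = ["LC|n1", "DNB|123", "BNF|abc", "LC|n2"]
matches = [("LC|n1", "DNB|123"), ("DNB|123", "BNF|abc")]
clusters = cluster(records, matches)
print(len(records), "records ->", len(clusters), "clusters")  # 4 records -> 2 clusters
```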

My colleague Jenny Toves has identified about 1 million persons in WorldCat who are associated with bibliographic records in more than one language, or roughly 7% of the people represented in VIAF. The breakdown:

  • 624K names are associated with titles in only 2 languages
  • 283K names are associated with titles in 3 to 9 languages
  • 7K names are associated with titles in 10 or more languages

[Graphic: Persons with titles in multiple languages (VIAF breakdown by language)]

My colleague JD Shipengrover created the accompanying graphic.
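
Counting people by number of languages can be sketched as below, assuming a hypothetical input of (person, language) pairs, one per linked bibliographic record; the `bucket` and `breakdown` functions are illustrative, and the bucket labels follow the breakdown above. The last lines simply redo the arithmetic on the rounded figures: the three groups sum to about 914K people, roughly 6% of the 15 million VIAF clusters, and the 10-or-more group is under 1% of that total.

```python
from collections import Counter, defaultdict


def bucket(n_languages):
    """Bucket labels follow the breakdown in the post."""
    if n_languages >= 10:
        return "10+"
    if n_languages >= 3:
        return "3-9"
    if n_languages == 2:
        return "2"
    return "1"  # monolingual: not counted in the breakdown


def breakdown(links):
    """links: iterable of (person_id, language) pairs, one per linked record."""
    languages = defaultdict(set)
    for person, language in links:
        languages[person].add(language)
    counts = Counter(bucket(len(langs)) for langs in languages.values())
    counts.pop("1", None)  # keep only multilingual people
    return dict(counts)


sample = [("p1", "eng"), ("p1", "fre"), ("p2", "eng"),
          ("p3", "eng"), ("p3", "fre"), ("p3", "ger")]
print(breakdown(sample))  # {'2': 1, '3-9': 1}

# Rounded figures reported above.
reported = {"2": 624_000, "3-9": 283_000, "10+": 7_000}
viaf_person_clusters = 15_000_000

total = sum(reported.values())
print(f"{total:,} multilingual people")  # 914,000 multilingual people
print(f"{total / viaf_person_clusters:.1%} of VIAF person clusters")  # 6.1% of VIAF person clusters
for label, n in reported.items():
    print(f"{label:>4} languages: {n / total:.1%}")
```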

We expect to focus our analysis efforts on the “short head” of the names whose works have been translated the most, and rely on machine algorithms to handle the “long tail” of the names associated with titles in only two or three languages.
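
The split described above amounts to a simple partition. In this sketch the cutoff `tail_max=3` is taken from "only two or three languages"; the function name and its input (a mapping from a name to the number of languages its titles appear in) are assumptions for illustration. The head is sorted so the most-translated names come first for closer analysis, while the tail would go to automated handling.

```python
def split_head_tail(language_counts, tail_max=3):
    """Partition names into a 'short head' and a 'long tail'.

    language_counts maps a name to the number of languages its titles appear in.
    Names with 2..tail_max languages go to the tail (automated handling);
    names with more go to the head, most-translated first (closer analysis).
    Names with a single language are not multilingual and are skipped.
    """
    head, tail = [], []
    for name, n in language_counts.items():
        if n < 2:
            continue
        (head if n > tail_max else tail).append((name, n))
    head.sort(key=lambda item: (-item[1], item[0]))
    return head, tail


counts = {"Author A": 12, "Author B": 2, "Author C": 5, "Author D": 3, "Author E": 1}
head, tail = split_head_tail(counts)
print("head:", head)  # head: [('Author A', 12), ('Author C', 5)]
print("tail:", tail)  # tail: [('Author B', 2), ('Author D', 3)]
```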