Cyrillicizing WorldCat Russian records

Russian is the fifth most common language of the resources described in WorldCat, yet until now the great majority of the records have been accessible only by searching on the romanization of the Cyrillic text. Furthermore, the romanization in WorldCat predominantly follows the ALA/LC romanization rules for Russian rather than the ISO standard used by libraries outside the Anglo-American sphere.

This has been a severe handicap for users who look for Russian titles by Russian authors and naturally expect the titles to be available in the script in which the material was written: Cyrillic. I previously described these problems in the 2015 HangingTogether post “Transcription vs. Transliteration.” At the time, only 12% of all Russian titles in WorldCat included the Cyrillic script.

Fortunately, Russian is one of the languages with a close 1:1 relationship between the Cyrillic characters and the corresponding Latin characters. This is why we have so many romanized-only Russian records: library systems required that any non-Latin script also have a parallel romanized field, and entering both the Cyrillic text and its romanization meant double work. But because there is that 1:1 relationship, why not generate the Cyrillic script from the romanization?
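To make the idea concrete, here is a minimal Python sketch of generating Cyrillic from ALA/LC romanization. It is an illustration only, not the process the OCLC team used, which this post does not describe. The function name `cyrillicize` and the mapping table are hypothetical, and the sketch assumes the romanized text keeps its ALA/LC diacritics, with ligatures encoded as U+0361 after the first letter of the pair.

```python
import unicodedata

LIG = "\u0361"  # combining double inverted breve (assumed ligature encoding)

# Hypothetical lookup table: ALA/LC romanization (lowercase, NFD) -> Cyrillic.
ROMAN_TO_CYRILLIC = {
    "shch": "щ",
    "t" + LIG + "s": "ц", "i" + LIG + "u": "ю", "i" + LIG + "a": "я",
    "zh": "ж", "kh": "х", "ch": "ч", "sh": "ш",
    "e\u0308": "ё",  # e + combining diaeresis
    "i\u0306": "й",  # i + combining breve
    "e\u0307": "э",  # e + combining dot above
    "\u02b9": "ь",   # modifier letter prime (soft sign)
    "\u02ba": "ъ",   # modifier letter double prime (hard sign)
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф", "y": "ы",
}
MAX_KEY_LEN = max(len(key) for key in ROMAN_TO_CYRILLIC)


def cyrillicize(romanized: str) -> str:
    """Convert ALA/LC-romanized Russian to Cyrillic by greedy longest match."""
    # Decompose so that precomposed letters such as U+012D match the table keys.
    text = unicodedata.normalize("NFD", romanized)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "shch" wins over "sh" + "ch".
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + size]
            cyr = ROMAN_TO_CYRILLIC.get(chunk.lower())
            if cyr is not None:
                # Carry over capitalization from the first Latin letter.
                out.append(cyr.upper() if chunk[0].isupper() else cyr)
                i += len(chunk)
                break
        else:
            # Spaces, punctuation, digits, unmapped letters: copy unchanged.
            out.append(text[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(cyrillicize("Voĭna i mir"))
```

The sketch also shows where the relationship stops being 1:1. If the ligatures were dropped when the record was keyed, "ts" can no longer be told apart from т followed by с, and "ia" from и followed by а, so a real conversion would need extra checks for such records.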

And that’s just what my OCLC colleagues have done! In late February, a team led by my colleague Jenny Toves generated Cyrillic script for over 885,000 romanized-only Russian WorldCat records! Another 78,000 records encoded as Program for Cooperative Cataloging’s BIBCO and CONSER records will be processed separately. The number of Russian records that will have been “Cyrillicized” will be close to one million, increasing the percentage of Russian WorldCat records with Cyrillic from 12% to 26%, more than double the previous share. The team will re-examine the records that were skipped to see whether they can also be processed.

But meanwhile, users can find many more Russian records using the Cyrillic script!

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata, with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.

2 Comments on “Cyrillicizing WorldCat Russian records”

Stephen Arnold says:

March 4, 2020 at 6:07 am

Thanks. Might it be possible to make public the control nos. of a sample set of bib records which have been given this treatment?

Karen Smith-Yoshimura says:

March 4, 2020 at 10:13 am

A random ten OCNs of the Russian WorldCat records that have been Cyrillicized: 789183891; 1080815794; 26542437; 51243911; 13625863; 643835910; 32782416; 74413449; 221811636; 588825034
