This post is co-authored by Karen Coombs, OCLC Senior Product Analyst
Our virtual dialog began with Karen C.’s tweet:
But Karen S-Y couldn’t respond in just the 140 characters Twitter allows. Instead she sent an email:
“Transcribing transliteration” from a piece is almost an oxymoron; it rarely occurs. Transliteration, by definition, converts one writing system (e.g., Chinese characters) into another writing system (e.g., Latin-script characters, or romanization). Catalogers in Anglo-American countries transliterate non-Latin titles using the ALA/LC romanization table for the script on the piece; other countries may use other transliteration schemes.
You will generally find transliterated titles whenever there is a non-Latin title (in MARC, stored in the 880-245 field). But OCLC doesn’t support all scripts, and not everyone takes advantage of the scripts OCLC does support – e.g., we support Cyrillic but only 10% of all Russian-language titles in WorldCat have the Cyrillic that appears on the piece. The ALA/LC romanization for Cyrillic is distinctly different from the ISO standard used by almost everyone else, so where we rely only on the romanized strings, the same title in Cyrillic may be represented by different clusters using different transliteration schemes. (In the graphic that precedes this entry, two romanizations are shown for the Russian “War and Peace”.)
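The difference between the two schemes shows up clearly in a short sketch. The Python below romanizes modern Russian letter by letter, using a shared table plus per-scheme overrides for the letters where ALA/LC and ISO 9 disagree. The tables are abbreviated from the published ones as an assumption for illustration (no pre-1918 letters, no special rules), and the name `romanize` is hypothetical. The ALA/LC tie over ts, iu and ia is written here with U+0361, although records may encode it another way.

```python
# Simplified, illustrative tables for modern Russian (assumption: not the
# complete published ALA/LC or ISO 9 tables; no pre-1918 letters).
COMMON = dict(zip("абвгдезиклмнопрстуфы", "abvgdeziklmnoprstufy"))
COMMON.update({"ё": "\u00eb", "ъ": "\u02ba", "ь": "\u02b9"})  # ë, ʺ, ʹ

# Letters where the two schemes disagree.
ALA_LC = {"ж": "zh", "й": "\u012d", "х": "kh", "ц": "t\u0361s", "ч": "ch",
          "ш": "sh", "щ": "shch", "э": "\u0117", "ю": "i\u0361u", "я": "i\u0361a"}
ISO_9 = {"ж": "\u017e", "й": "j", "х": "h", "ц": "c", "ч": "\u010d",
         "ш": "\u0161", "щ": "\u015d", "э": "\u00e8", "ю": "\u00fb", "я": "\u00e2"}


def romanize(text: str, scheme: dict) -> str:
    """Romanize Cyrillic text letter by letter with COMMON plus one scheme."""
    table = {**COMMON, **scheme}
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, punctuation, Latin letters pass through
        elif ch.isupper():
            out.append(latin[0].upper() + latin[1:])  # e.g. "Щ" -> "Shch"
        else:
            out.append(latin)
    return "".join(out)
```

Run on “Война и мир”, the ALA/LC table yields “Voĭna i mir” and the ISO 9 table yields “Vojna i mir”: two strings for one title, which is how a single Cyrillic title can end up in separate clusters when only the romanized strings are compared.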
In general, it’s better to rely on the non-Latin script title, if we have it, than on any transliteration that may also be in the record. The non-Latin script title is transcribed from the piece, while any transliteration is supplied by a cataloger and may or may not match the transliteration supplied by another cataloger…
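That preference can be sketched in a few lines, assuming a hypothetical, simplified record representation (a list of tag/subfield pairs rather than any particular MARC library’s objects): take the title from the 880 field whose $6 links back to 245, and fall back to the romanized 245 only when no such field exists. The function name `preferred_title` is illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified record: (tag, {subfield code: value}) pairs.
Field = Tuple[str, Dict[str, str]]


def preferred_title(fields: List[Field]) -> Optional[str]:
    """Return the original-script title if present, else the romanized 245 $a."""
    original: Optional[str] = None
    romanized: Optional[str] = None
    for tag, subfields in fields:
        # An 880 whose $6 begins "245-" carries the non-Latin form of the title.
        if tag == "880" and subfields.get("6", "").startswith("245-"):
            original = original or subfields.get("a")
        elif tag == "245":
            romanized = romanized or subfields.get("a")
    return original or romanized
```

Because the function keeps scanning after it sees the 245, field order in the record does not matter.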
Karen C. wrote back:
I think you answered the question the user was asking when you said that “The ALA/LC romanization for Cyrillic is distinctly different from the ISO standard used by almost everyone else, so where we rely only on the romanized strings, the same title (with the same Cyrillic string) will be represented by different clusters using different transliteration schemes.”
The user asked, “Your API returns texts in Russian in a strange transliteration format. As I see, it’s not ISO-9. For example, this text: ‘Oni vernulisʹ na rodnui︠u︡ planetu, gde za vremi︠a︡ mezhzvëznogo polëta proshlo bolʹshe sta let i vsë tak izmenilosʹ, chto Zemli︠a︡ stala chuzhoĭ im’. Please, can you tell me, how to convert this format into correct Cyrillic?”
At least I understand the why now.
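As for the user’s actual question: ALA/LC romanization of modern Russian maps back to Cyrillic almost letter for letter, so a greedy longest-match table lookup gets most of the way. The sketch below is a hypothetical helper, not an OCLC API. It covers modern Russian only, ignores the rare letter pairs that ALA/LC disambiguates with extra marks, and accepts the tie over ts, iu and ia either as U+0361 or as the half marks U+FE20/U+FE21 (the form the user’s example appears to contain). It normalizes to Unicode NFC first so that ë, ĭ and ė match whether they arrive precomposed or as a base letter plus combining mark.

```python
import unicodedata

# Hypothetical, simplified ALA/LC-to-Cyrillic table for modern Russian.
_SINGLE = dict(zip("abvgdeziklmnoprstufy", "абвгдезиклмнопрстуфы"))
_MULTI = {"shch": "щ", "zh": "ж", "kh": "х", "ch": "ч", "sh": "ш",
          "\u012d": "й", "\u00eb": "ё", "\u0117": "э",  # ĭ, ë, ė
          "\u02b9": "ь", "\u02ba": "ъ"}                  # ʹ, ʺ
# The tie over ts/iu/ia may be one U+0361 or the half marks U+FE20 + U+FE21.
for left, right in (("\u0361", ""), ("\ufe20", "\ufe21")):
    _MULTI[f"t{left}s{right}"] = "ц"
    _MULTI[f"i{left}u{right}"] = "ю"
    _MULTI[f"i{left}a{right}"] = "я"

# Try longer keys first so "shch" beats "sh" and a tied "ia" beats plain "i".
_TABLE = sorted({**_SINGLE, **_MULTI}.items(), key=lambda kv: -len(kv[0]))


def ala_lc_to_cyrillic(text: str) -> str:
    """Greedy longest-match reverse romanization (sketch, not exhaustive)."""
    text = unicodedata.normalize("NFC", text)  # e.g. "e" + U+0308 -> "ë"
    out, i = [], 0
    while i < len(text):
        for latin, cyr in _TABLE:
            chunk = text[i:i + len(latin)]
            if chunk.lower() == latin:
                out.append(cyr.upper() if chunk[0].isupper() else cyr)
                i += len(latin)
                break
        else:
            out.append(text[i])  # spaces, punctuation, digits pass through
            i += 1
    return "".join(out)
```

For example, `ala_lc_to_cyrillic("Zemli\ufe20a\ufe21")` returns “Земля”, while an untied “ts” stays as the two letters “тс”, which is the distinction the ALA/LC tie exists to preserve.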
Karen S-Y commented:
It also happens to be a case where there is an almost one-to-one correspondence between romanized Russian and its Cyrillic counterpart. That is why most libraries didn’t bother adding the Cyrillic: since the system requires that anyone who enters non-Latin script also enter the romanization, adding it represents “double work.”
This prompted Karen C. to ask:
Does the MARC record have any way to tell you if a title was romanized?
Karen S-Y answered:
By inference, yes.
If the language code is for a language not written in Latin characters and there is no 880 field in the MARC record, then the non-English information in the record is, by definition, all romanized (assuming English is the language of cataloging).
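Expressed as code, the inference might look like the sketch below. The set of language codes is an illustrative subset of MARC language codes for languages normally written in non-Latin scripts, not a complete list (some languages are written in more than one script), and the sketch assumes English is the language of cataloging. The hypothetical `share_romanized_only` helper shows how percentages like those in the table that follows could be tallied from (language code, field tags) pairs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Illustrative subset of MARC language codes for languages normally written
# in non-Latin scripts (assumption; not a complete or authoritative list).
NON_LATIN_LANGS = {"rus", "ukr", "bel", "bul", "chi", "jpn", "kor",
                   "ara", "heb", "per", "hin", "tha", "gre"}


def script_status(lang_code: str, tags: Iterable[str]) -> str:
    """Classify a record, assuming English is the language of cataloging."""
    if lang_code not in NON_LATIN_LANGS:
        return "not applicable"
    # No 880 fields means no original-script data: everything is romanized.
    return "original script" if "880" in set(tags) else "romanized only"


def share_romanized_only(
        records: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, float]:
    """Percentage of records per non-Latin-script language with romanization only."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lang_code, tags in records:
        status = script_status(lang_code, tags)
        if status != "not applicable":
            counts[lang_code][status] += 1
    return {lang: 100 * c["romanized only"] / sum(c.values())
            for lang, c in counts.items()}
```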
The following table covers the top 15 languages in WorldCat that are written in non-Latin scripts WorldCat supports. For each, it shows the percentage of records that include the original script (transcribed from the piece) and the percentage that contain transliteration only (supplied by the cataloger). Most records for languages written in Cyrillic and Indic scripts contain transliterations only.
Top 15 languages in WorldCat written in non-Latin character sets
Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.