This is my 116th—and last—blog post. I’m retiring at the end of November, something I’ve deferred as I’ve had such a great time hanging with all of you—staff at our Partner institutions, professionals from all corners of the library, archival, and information technology worlds, and my OCLC colleagues. But it’s time. Of all the ways I know to say “good-bye”, the Japanese sayonara is the most wistful: it literally means “if it must be so” (shorter than “parting is such sweet sorrow.”)
You’ve inspired me and taught me so much! I hope I’ve contributed meaningfully to the evolving discussions around metadata, linked data, and multilingual support to improve access to the information our communities want.
I am proud to have been part of the foundation of the Unicode Consortium. My work with the East Asian Character Set (Z39.64) proved that “Han Unification” was feasible: just as we have one code for the character “a” whether it’s used in English, French, German, Tagalog, Indonesian, etc. with different pronunciations, we can have one code for each Chinese (“Han”, or 漢) character common to Chinese, Japanese, and Korean. I advocated strongly for “Han Unification” and wrote a position paper on it in 1991. The Unicode Chronology highlights the stages of incorporating Han Unification into Unicode 1987-1992; Unicode became an international standard (ISO 10646) in 1993.
Unicode represented an “infrastructure revolution” for all of us. Those of you of a certain age may recall the days when there was a separate character set used in library systems, which included a range of diacritics to be used with other characters, often for use in transliterations of non-Latin scripts. But the data could only be shared and used by other library systems—if you copied/pasted data into another application it came out as gibberish. Non-Latin scripts were each defined by separate national character sets, and unless you used the same national character set, you could not read the text. Unicode, the result of a consortium including major computer corporations, software companies, and research institutions, changed all that. The scope included a far wider range of scripts than any other character set (the latest version includes over 140,000 characters). Because Unicode significantly decreased the costs of developing products for a global market, it was very quickly implemented in software applications. And Unicode included all the “combining diacritics” that libraries had used for decades. We take our ability to read non-Latin scripts in different applications and on websites for granted now.
Library catalogs do not yet take advantage of the full range of scripts available in Unicode, however. Library users who read languages written in non-Latin scripts should be able to search and retrieve the metadata describing the resources written in those languages using the metadata in that script. Unfortunately, many of these non-Latin script resources are represented in catalogs only by transliteration, a barrier to access. (See my 2015 blog post, “Transcription vs. Transliteration.”) I’m pleased that OCLC has taken steps to remedy that situation, starting with the languages written in Cyrillic script. My colleagues Jenny Toves, Bryan Baldus, and Mary Haessig blogged about this work earlier this year in “кириллица в WorldCat”.
Soon after the adoption of Unicode and (to me) amazingly quick implementation, I started bringing together the managers of technical services to discuss common issues and identify work that was needed to guide future developments that would improve the metadata underpinning the discovery of all the resources curated and managed by libraries, archives, and other cultural heritage organizations. Over the last 27 years this group evolved into the OCLC Research Library Partners Metadata Managers Focus Group, which at one point included representatives from 63 Partner institutions in 12 countries spanning four continents. It spawned six working groups or task forces focused on particular issues and published reports of their investigations, such as Registering Researchers in Authority Files in 2014 and Addressing the Challenges with Organizational Identifiers and ISNI in 2016. My meta-synthesis of the Focus Group’s discussions over the last six years was recently published as an OCLC Research Report, Transitioning to the Next Generation of Metadata. The recordings of our November 2020 discussions about the report are available as “past webinars” on the Works in Progress Webinars web page.
The Focus Group’s intense interest in who was implementing linked data and for what purposes led to a series of “International Linked Data Surveys for Implementers” I conducted between 2014 and 2018. A total of 143 institutions in 23 countries reported one or more linked data projects or services. The results of these surveys are shared on the OCLC Research Linked Data Survey web page for the benefit of others wanting to undertake similar efforts.
One of the most rewarding highlights of my career was collaborating with my OCLC colleagues and OCLC members on “Project Passage,” a linked data Wikibase prototype which served as a sandbox in which librarians from 16 institutions could experiment with creating linked data to describe resources. The project was stimulating, educational, and fun! I enjoyed writing up what we learned with some of the participants in the 2019 report, Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. This work generated another working group, Archives and Special Collections Linked Data Review Group, drawn from the OCLC Research Library Partnership’s rare book, archives, and special collections communities, which explored key issues of concern and opportunities for archives and special collections in transitioning to a linked data environment, summarized in the 2020 OCLC Research Report, Archives and Special Collections Linked Data: Navigating between Notes and Nodes.
I leave behind a set of publications and presentations. The relationships I’ve enjoyed with so many talented, inspiring staff within the OCLC Research Library Partnership I’ll treasure. I look forward to seeing what you all do in the coming years to leverage metadata and embed multilingualism into everything you do!
Sayōnara!
Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.
A privilege to work with you on the Pinyin Project years ago! What fun that was, right down to the stuffed boneless duck feet at the banquet.
Karen, Congratulations on a truly impactful career whose influence extends well beyond the library sphere – as your Unicode work demonstrates. The world is a better (and more connected) place for your contributions!
Karen, congratulations on an amazing career, and thank you for being such a wonderful co-worker during the years we spent together at RLG and then OCLC! All the best to you. I imagine your feline friends must be thrilled to have ALL of your attention soon!
Hi Karen,
Of everyone I have met, you are the one who has done the best work for the CJK community. When I came to the States 30 years ago, you hosted the RLG international conference discussion to form the Cataloging Guidelines for the Chinese Rare Book Project. You wrote the preface for its 2000 edition. The 2018 edition is in the Cataloger’s Desktop. The Guidelines have been used, revised, and of benefit internationally. I am lucky to be one of those who have benefited.
You are the one I met at the CEAL conference who perfectly represented RLG in communicating with us for decades. You always energetically provide sharp, easy-to-understand vision, suggestions, and answers.
It is very encouraging to see you at OCLC webinars. You showed me how far you could reach. I was especially happy to have a chance to talk to you in person at the Philadelphia ALA this March. In a few minutes, you solved my puzzle on original language for NACO.
Karen, you are a truly rare, amazing person. I’m lucky to have met you.
祝好 (this word I learned from you)
曹淑文 CAO Shuwen