Archive for January, 2008

This day in history

Thursday, January 31st, 2008 by Merrilee

From the Writer’s Almanac and via Judith Bush…

On this day in 1815, President James Madison approved an act of Congress appropriating $23,950 to purchase Thomas Jefferson’s library of 6,487 volumes. In 1814, after capturing Washington, D.C., the British had burned the U.S. Capitol, destroying the Library of Congress and its 3,000-volume collection.

Okay, it was yesterday.

The Importance of Place – an anecdote

Tuesday, January 22nd, 2008 by Karen

John Hagel recently blogged about the increasing value of place in shaping talent development and how the density of similar types of companies “increases opportunities for serendipitous encounters and sustained and rich collaboration.” He refers to Steve Lohr’s article in the December 20, 2007 issue of the New York Times, Silicon Valley Shaped by Technology and Traffic, showing how Silicon Valley consists of multiple “microclusters” grouped by type of tech company.

I particularly enjoyed the article’s interactive map. (You can see our previous location: we were in the same building as LinkedIn, the black internet dot to the right of the Internet leader circle representing Google below the Palo Alto label.)

We have a good example of the importance of location and the opportunity it offers for “serendipitous encounters”. I recall spending most of my time during the International Conference on Chinese Computing in San Francisco in 1985 arguing with Joe Becker and Lee Collins (at the time both at Xerox, before Lee moved to Apple) that we needed just one unified Han character set to cover Chinese, Japanese, and Korean scripts. RLG had created just such a unified set for our implementation of CJK scripts in RLIN in 1983. RLG gave Lee/Apple a copy of the RLIN CJK Thesaurus with the character set mappings we had used, which Lee incorporated into his analysis to demonstrate the value of Han unification and to create the first Unicode encodings.

Our contribution is acknowledged in the Unicode History by Mark Davis, the first Unicode Consortium president:

Concerning unification, when we looked at the unification of CJK ideographs, we had the successful example of the Research Libraries Group’s East Asian Character (EACC) bibliographic code to show the way.

RLG hosted some of the first meetings of what became the Unicode Consortium, of which RLG was one of the founding members. The first meeting was in the RLG annex in Palo Alto. We represented the library community, and eventually ISO TC46/SC4 (the International Standards Organization Subcommittee for technical standards to facilitate interoperability of “information services” including libraries) agreed that it no longer had to maintain a separate set of character set encodings for bibliographic data processing.

Unicode is now widely employed in library local systems as well as almost everywhere else. I can’t imagine that the merging of library requirements within Unicode would have happened if RLG had been located anywhere else…

A new work flow economy in resource sharing

Friday, January 18th, 2008 by Dennis

In November I surveyed the participants in SHARES, a resource sharing consortium for about 90 RLG Programs partner institutions, to find out how they are coping with the pressures of being expected to provide more and better services with the same amount of staff, or in some cases with staff reductions.

My survey was the epitome of quick-and-dirty, with four simple questions:
Name three tasks your ILL staff performs now that you didn’t perform two years ago;
Name three tasks your ILL staff performed two years ago that you have stopped doing;
Name three tasks that you hope to stop doing but haven’t figured out how to do without;
Add any comments about pressures to increase efficiency so you can deliver more service with the same amount of staff.

I received 19 responses, a pretty small sampling by any measure, but enough to be quite interesting. Seven responses came from art museums, six from academic libraries, three from law, two from medical, and one from a national library.

These responses listed 58 new tasks, 38 tasks stopped, 32 tasks that respondents would like to stop, and 24 comments.

The most often mentioned “new” tasks:
Purchase instead of ILL (7)
Fill requests by sending PDFs (6)
E-delivery to own patrons (4)
Doc delivery to own faculty (4)
Reminders to patrons (3)

The most often mentioned “stopped” tasks:
Keeping paper files (3)
Looking for free lenders (2)
Monitoring/enforcing patron limits (2)
Using ISO ILL (2)
Using ILL Manager (2)

Most often mentioned “want to stop but can’t” tasks:
Keeping paper files (4)
Using Ariel (3)
Packaging/shipping (2)
Searching for requests that fall through cracks (2)

Can you guess what appeared on all three lists – new task, stopped task, and want-to-stop task?

If you guessed using Ariel, the document supply software created by RLG in 1993 and sold to Infotrieve in 2002, you’re correct.

There were a number of requests for specific enhancements to certain resource sharing systems and products. These will be passed along to the appropriate product managers.

Licensed e-resources were a common theme. Some reported difficulty in getting patrons to realize a library owns a title electronically. A number reported having to print out electronic resources and scan them in order to lend them. Others insisted that requests for copies are going down because patrons are indeed finding and using e-resources.

Technology was mentioned much more as a possible solution than was organization. Only two respondents listed a change in organization as having an effect on performance: one reported combining circulation, reserves, and ILL and assigning ILL processing tasks across the department throughout the day and night; one reported decentralizing processing so branch units could scan, send, and update themselves.

Other recurring issues and themes:
Decided trend toward removing barriers for patrons
Decided trend toward offering more services for own patrons, and for free
Museums alone reported being understaffed, often with a one-person ILL unit with additional duties besides interlending

UK library faced the most red tape:
Copyright law requiring signatures on paper; administrators requiring pre-payment from patrons; such long processing time for invoices that staff waste much of their time explaining and apologizing to creditors and partners.

Good sense, great service:
One library Googles article titles their own patrons are ordering to see if they’re available online for free.
One library automatically turns a failed patron-initiated stack search into an ILL request.
One library reports daily to collection development staff what can’t be borrowed via ILL.
One institution allows patrons to return ILL’s to any one of 19 libraries.

Most melancholy response:
One library has had to give up sending Christmas cards.

Most ironic response:
One library wished for an alternative to invoicing libraries whose requests were deflected but who persisted in sending them via mail, phone, or fax.

Most amusing response:
One person reported as her favorite “would like to stop but can’t do without yet” choice: “Working outside the home.”

Ah, even in the new work flow economy, the human element reveals itself and remains quite recognizable from older economies: our generous impulses, our contradictions, our fondest wishes.

Google, up and down

Friday, January 18th, 2008 by Merrilee

A somewhat dated report from TechCrunch (November) shows the use of various Google services — some up from the previous year, some down. Interesting (to me) that Book Search was up, Scholar was down. Would be interesting to know how searchers got there — were they using the main Google page, or were they using the Scholar and Books pages? Who did the searching? Did they find what they were looking for? These and other questions I’ll never know the answers to…

Flickr inaugurates “The Commons” with Library of Congress collections

Thursday, January 17th, 2008 by GĂĽnter

Flickr and the Library of Congress just announced a prototype which will bring 1,500 or 3,000 photographs (depending on whether you believe the Flickr or the LC blog) from two of the most popular LC photo collections to the immensely popular photo sharing website owned by Yahoo. This project inaugurates “The Commons” on Flickr, which has the tagline: “Your opportunity to contribute to describing the world’s public photo collections.”

The benefits to LC (and any other cultural heritage institution choosing to participate) seem so obvious that it feels surprising we haven’t seen this announcement earlier: foremost amongst the benefits, to my mind, the LC collections will enjoy unprecedented exposure on a website which receives a staggering amount of traffic. (The screenshot above shows the percentage of web traffic flowing to Flickr [red] and LC [blue] tracked over a 3-year period by Alexa. No further commentary needed.) Flickr displays what looks like a rather comprehensive LC record for each photograph, which also includes links back to the collections and the image itself on the LC website. I’d be rather curious about how these records got into Flickr – was a batch-upload mechanism created for this project? As time passes, I hope we’ll also hear from LC about how the referrals from Flickr have impacted the overall traffic on their website!

And it goes without saying that LC will also harness the collective tagging power of the Flickr community to help describe its collection, a feature of the project much touted on both blog announcements. However, I was interested to learn in the project FAQ that LC is actually hedging its bets on incorporating any of the captured tags in their own system.

The announcement reminded me of a number of other creative projects which have found ways to disclose cultural heritage materials into social networking sites or large online hubs. What comes to mind spontaneously:

  • Last summer, the University of Washington published a fascinating article in D-Lib about their experience in adding links to UW special collections to Wikipedia, which includes statistics on how this strategy increased web traffic to those collections.
  • Just in November, the Brooklyn Museum launched a Facebook application called “ArtShare” which allows users to pick their favorite images from the museum’s collection, and have them shown in rotation on their Facebook profile page. The app itself is social (meaning shareable) as well – the Victoria & Albert and the PowerHouse Museum offer up images to prettify your Facebook page as well. Along with the recently released WorldCat Facebook app (created by my RLG Programs colleague Bruce Washburn), ArtShare is about the only thing happening on my Facebook profile (you notice how I’m casually omitting a link here).

If you can think of other innovative ways in which cultural heritage organizations do or should disclose their collections on social networking sites, please drop me a line!

Happy birthday TCP/IP

Wednesday, January 16th, 2008 by Merrilee

TCP/IP is frequently held up as a model for the type of lightweight standard that libraries and affiliated communities should aim to develop. But what goes into developing a “lightweight” standard? And how long does it take?

I heard a brief interview on NPR earlier this month that marked the 25th anniversary of TCP/IP. The interview was with Vinton “Vint” Cerf, one of the two creators of TCP/IP (the other was Robert Kahn). I was amazed to hear that this lightweight standard took 10 years to develop — the first five years were spent defining requirements, and the second five years were spent implementing on a variety of systems.

This interview got me thinking about what goes into lightweight standards development. Part of it might be “less than perfect, good enough.” IP makes only a “best effort” at delivery — it does not guarantee that packets arrive; reliability is layered on top by TCP, and in practice failures are very rare. Another lightweight component is scope — the standard is scoped to do a small thing, not everything. Yet another aspect is design. Something small must be well designed, which takes time.
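The layering is worth seeing in miniature. A loopback socket sketch (a hypothetical illustration, not part of the interview) shows what TCP promises the application: the bytes arrive intact and in order, even though IP underneath makes no such guarantee.

```python
import socket
import threading

# A TCP echo of sorts over loopback: the server collects everything the
# client sends. TCP's acknowledgement/retransmission machinery (invisible
# here) is what turns IP's best-effort datagrams into a reliable stream.

def collect(server_sock, results):
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # client closed the connection
                break
            data += chunk
    results.append(data)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

received = []
t = threading.Thread(target=collect, args=(server, received))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"packets arrive intact and in order")
client.close()

t.join()
server.close()
print(received[0])
```

The application code never mentions acknowledgements or retransmission; that narrow, well-designed scope is exactly the “lightweight” quality the post describes.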

Hopefully, as new standards are designed (or existing standards are retooled and redesigned), the work will not take as long as 10 years. After all, we now have TCP/IP to speed our work along! But good standards specification and development do take time.

Vint Cerf is currently Vice President and “Chief Internet Evangelist” at Google.

RLG Programs: updated

Wednesday, January 16th, 2008 by Merrilee

On Saturday, RLG Program Officers were joined by close to 80 people for the RLG Programs Update at ALA. I counted several library directors and a bevy (flock? gaggle? what is the proper term?) of associate directors attending. We gave updates on 13 projects that we have either recently completed or on which we have made significant progress. It was a lot of seat time for attendees, but we had a lot to share. We’ll try to improve our format going forward. To those of you who attended, thanks for coming. For those of you who couldn’t make it, we hope to see you in June — either in Anaheim (ALA) or in Philadelphia for our Annual Meeting. Hopefully we’ll see you in both places!

If you attended the meeting and did not get a handout (which has links to more information for each project), let me know. If you didn’t attend and want a copy of the handout, I will also be happy to send it along.

2008: The Year of Non-Latin References in LC/NACO Authority Records

Tuesday, January 15th, 2008 by Karen

We’ve seen lots of forecasts for 2008, the Year of the Rat. Here’s mine, which I have lots of confidence will indeed happen: We will see Arabic-, Chinese-, Cyrillic-, Greek-, Hebrew-, Japanese-, and Korean-script references in the LC/NACO Authority File!

I already blogged about the agreement among the Library of Congress, the British Library, the National Library of Medicine, and OCLC — the major authority record exchange partners — in consultation with Library and Archives Canada to add references with non-Latin characters to the name authority records that make up the LC/NACO Authority File. What will really move this to fruition is the Programs and Research project to upgrade the LC/NACO Authority File with non-Latin references derived from the non-Latin bibliographic heading fields in WorldCat, using the same data-mining techniques developed for WorldCat Identities, which already includes non-Latin script “alternate names”.

This will allow our users, for the first time, to look up Arabic-, Chinese-, Cyrillic-, Greek-, Hebrew-, Japanese-, and Korean-script names in those scripts without knowing their romanizations and to correctly identify authors who write in those scripts.

This also addresses one of the recommendations in the Report of the Library of Congress Future of Bibliographic Control Working Group released on January 9, 2008: 1.3.3 to “internationalize authority files”. Adding non-Latin scripts to existing headings is a first step to link names that differ according to language and geography but represent the same entity.

To get an idea of what these added references will look like, take a look at the “alternate names” listed in the WorldCat Identities pages for Sun Yat-sen or Menachem Begin. By harvesting non-Latin heading forms that correspond to entities in the authority file, we are reaping the benefits from the significant intellectual work of the many libraries that have provided non-Latin headings on bibliographic records for over two decades. We expect to add more than 500,000 non-Latin references to name authority records, a significant re-use of existing metadata in new contexts.

All NACO contributors will have the opportunity to review and verify the non-Latin script forms as part of their normal workflow. Catalogers will be better able to reflect on past practices related to non-Latin headings, and be in a better position to recommend future best practices for the LC/NACO Authority file. The Library of Congress has issued a White Paper: Issues Related to Non-Latin Characters in Name Authority Records for comment on the issues to be addressed during a six month period following this automatic pre-population of the authority file.

RLG partners expressed considerable excitement about this project during our discussions at ALA Midwinter in Philadelphia. For a number of us who have been waiting to see non-Latin script references in authority files for a quarter-century (or more!), 2008 will also be the Year Our Waiting Is Over.

May I speak Openly about mass digitization?

Friday, January 4th, 2008 by Ricky

We all agree that Open is good (even if we may not agree about what Open means). At the same time, we can all see reasons why corporations have a hard time fitting Openness into their business plans. I think it’s the responsibility of non-profit, cultural heritage institutions to find ways to bridge that gap and work with the corporate world toward a public good.

Since the RLG project regarding public/private mass digitization partnerships that resulted in the publication of the Good Terms report, there have been many encouraging developments on the mass digitization front.

There was a panel discussion at the November 2007 DLF Forum (for notes from the session, scroll down to the Session 2 “presentation”) that talked about public/private digitization partnerships. There were representatives of partnerships with Google, with Microsoft and OCA, with iArchives, and with Kirtas and Amazon. Now that many of the agreements have been made public, lifting to some extent the shroud of secrecy, we seem to be back in the mode of sharing and working as a community toward common goals. The panel members openly assessed what was good and not-so-good about the agreements and procedures involved in their arrangements. Those who follow will hopefully be able to craft better arrangements.

More and more libraries are getting involved in an increasing array of arrangements to digitize books and other materials in their collections. Many of them are getting involved in multiple arrangements, e.g., Google partners also becoming OCA contributors. European libraries are also finding a variety of ways to ensure their collections are in the flow. There have been a few noble declarations of self-funding (e.g., Emory’s Kirtas initiative and the Boston Library Consortium’s OCA activity), thereby avoiding restrictions issues altogether. Follow the progress through our resource page.

A group of moving image archivists and experts met to prepare themselves in anticipation of interest on the part of potential mass digitization partners. They took the important first step of being clear about their own objective (broader access), before they considered how their interests might intertwine with those of a private partner. They agreed that it was better to have a community voice than to act independently, which often leads to competitive, rather than collaborative, initiatives. They’ve since secured funding to complete an inventory of footage in cultural heritage institutions and they formed a group to flesh out a set of agreed principles.

Exemplary in approach is the (US) National Archives. NARA has released their Plan for Digitizing Archival Materials for Public Access. And, best of all, they sought public comment. Granted, they entered into a number of partnerships prior to creating this document, but now they’ve got it to guide future initiatives – and the public can feel confident that they are acting responsibly. Their principles for digitization (in Appendix A) are especially worth a look.

Keepers of other types of special collections should be preparing similarly. There is already interest in scanning rare books. The Sloan Foundation has funded rare book scanning at LC, Boston Public Library, the Bancroft Library, and others. Since the release of our report, Shifting Gears [pdf], which addresses moving toward mass digitization for non-book collections, several funders have contacted us to talk about ways to prevent special collections from becoming jetsam in the sea of digital books. So, while I think that eventually private companies will get interested in our rare artifacts and visual and audio treasures, the funders may come first. Let’s be ready to impress them with plans for how we can increase the scale at which we can make these valuable materials more accessible.

Paul Courant’s thoughtful November blog postings regarding the University of Michigan’s Google partnership are helping to focus the debate. Google-bashing gets us nowhere and we have to acknowledge that there are some things that used to belong to libraries that now belong to Google (say, for instance, search). Is it fear that the books will also belong to Google (and that they will be evil) that makes us panic? I think it’s better to ask, How can we best work with private partners, while protecting the rights and desires of the people we serve? Let’s acknowledge what they do well and what we do well; we both have a lot to contribute (I hear Google has recently cottoned on to the role of metadata in powerful searching). I hope Paul’s blog will encourage us to focus on topics where we can make a difference, like the quality of the digitized books, preservation of the files – and of the original books, rights issues regarding orphaned works, and useful metadata – as well as how to help people find needles in the haystack.

But regardless of what happens along the way, what matters is the end result – and for that reason, I find myself repeating this mantra: No matter what compromises we may make in finding ways to work with private partners, we must ensure that at some (hopefully not too distant) point in time, all restrictions will be lifted and the content will be openly accessible (limited only by rights inherent in the content itself).

That’s a long mantra, here’s an abbreviated one: Ensure that the content will be open.

Mobilizing collections: from storehouse to scanning factory

Wednesday, January 2nd, 2008 by Roy

Before the holidays Ricky Erway, Constance Malpas, Dennis Massie, and Roy Tennant visited the Northern Regional Library Facility (NRLF), which serves as the storage facility for five northern University of California campuses. Besides learning about the storage facility operations, which are interesting in their own right, we also saw the scanning operation of the Open Content Alliance.

First some numbers. Over 5 million items are stored here, in a few warehouse-style buildings next to the 580 freeway. Materials are sorted by size and shelved in order of receipt, two-deep to maximize space. The nimble hands at the NRLF process a quarter million volumes each year, most of them titles transferred from the UC Berkeley campus. Approximately 2,000 requests for material housed at the NRLF are filled per week, including both on-site use and remote lending. Most remote usage requests come from patrons at one of the UC libraries, with materials delivered to the requesting library within a day or two.

Interlibrary loan requests that originate from outside the UC system are funneled through the owning UC library, which then requests the materials from the NRLF, as there is nothing in the WorldCat record for a stored item to indicate that it is in the NRLF and not at a northern UC campus. Requested materials are then sent from the storage facility to the lending UC library for processing and forwarding on to the non-UC borrowing library. NRLF staff mentioned that if an OCLC lender could refer incoming requests to another lending symbol, it would allow them to ship ILL materials directly from the NRLF (which has its own OCLC symbol) to the non-UC borrowing library.

Approximately 25% of the stored materials are special collections, mostly manuscripts and archives, placed on special shelving units that can accommodate archival boxes. Usage of the entire stored collection has leveled off in recent years, though some topical collections (history of science titles, for example) continue to see a relatively high retrieval rate.

NRLF staff say it is too early to tell whether mass digitization projects such as the Open Content Alliance will lead to increased use of what has long been considered low-use materials. (At recent conferences, staff from the University of Michigan and from Duke University have cited anecdotal evidence of increased print circulation of digitized titles.) Selecting, pulling and delivering the materials to be scanned is a monstrously laborious task that requires stamina, perseverance, and precise record keeping. We were all impressed to see how well the NRLF staff have adapted to the new logistical requirements of mass digitization efforts, which demand that large volumes of materials be moved (and tracked) efficiently from point to point — mobilizing collections and staff in ways that are new to traditional library storage operations. As Lizanne Payne noted in her recent OCLC white paper, most such facilities are optimized for efficient storage, rather than efficient retrieval. The changing demands on off-site library collections produced by mass digitization efforts and direct-to-storage acquisitions are creating a host of interesting new challenges for facility managers.

Each week, more than a thousand volumes are ‘shipped’ from the NRLF shelves to an adjacent OCA conversion center. The OCA scanning operation has been set up in a small room in one of the buildings that is just large enough to hold ten “Scribe” scanning stations and a check-in/check-out area. Each station can do 400-500 pages per hour; given two shifts of about 7 scanning hours each, that is about 7,000 pages per day per Scribe, or 70,000 pages per day for the entire facility. If the average book is 350 pages, this equates to roughly 1,000 books per week, or just over 50,000 books a year at this facility. Another way to look at it: if OCA set out to scan all of the materials at this facility (it hasn’t), it would take about 100 years at the present rate of scanning.
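The arithmetic above can be checked with a quick back-of-the-envelope calculation (assuming the upper end of the 400-500 pages-per-hour range and a five-day scanning week, which the post implies but does not state):

```python
# Throughput of the OCA scanning room at NRLF, per the figures in the post.
pages_per_hour = 500                # upper end of the 400-500 range
hours_per_shift = 7
shifts_per_day = 2
scribe_stations = 10

pages_per_day_per_scribe = pages_per_hour * hours_per_shift * shifts_per_day  # 7,000
pages_per_day_facility = pages_per_day_per_scribe * scribe_stations           # 70,000

pages_per_book = 350
books_per_week = pages_per_day_facility // pages_per_book * 5                 # ~1,000
books_per_year = books_per_week * 52                                          # ~52,000

stored_items = 5_000_000
years_to_scan_everything = stored_items / books_per_year                      # ~96, i.e. "about 100 years"

print(books_per_week, books_per_year, round(years_to_scan_everything))
```

At roughly 96 years to digitize the whole facility, the numbers bear out the post's "about 100 years" estimate.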

However, each book scanned is another that becomes available on the web for anyone to see and use, and that is no small thing. While we were interested in the Google operation, staff were not at liberty to show or tell. – Dennis, Ricky, Roy and Constance