Archive for the 'Renovating Descriptive Practice' Category

The Cult of Brewster Finds Its Church

Tuesday, October 20th, 2009 by Roy

Last night Brewster Kahle of the Internet Archive unveiled his latest project in a venue suitable for any high priest or cult leader: a former Christian Science Church in San Francisco. As it turns out, the Internet Archive recently purchased the building, and as Brewster remarked during the grand unveiling of the Bookserver project, it even matches their long-time logo, which was selected on purpose to imply a physical library.

Although the mood in the great room of the church, which Brewster eventually hopes to turn into a modern-day library reading room, was more hallelujah-inspiring than anything, the preceding day had been more down-and-dirty technical. The two-day meeting (still going on as I write this) is more about AtomPub and identifiers than holy water and consecrated wafers, but all of it does take a certain amount of faith.

Crowdsourcing Lessons

Monday, September 14th, 2009 by Roy

The Library of Congress, the Smithsonian Institution, more RLG Partners and others have participated in the Flickr Commons, all to try to leverage what’s become known as “crowdsourcing”: “the act of taking tasks traditionally performed by an employee or contractor, and outsourcing it to an undefined, generally large group of people or community in the form of an open call,” as Wikipedia describes it. By posting content on the web in places that many people frequent, the Library of Congress and others are hoping to attract descriptions, subject labels, and other useful content to enrich their finding tools. And this has undeniably led to enriched descriptions.

But tossing something out on the “interwebs” and creating an effective crowdsourcing environment are two very different things. This article, from the Nieman Journalism Lab, describes lessons from the Guardian newspaper in the UK, which recently used crowdsourcing in its amazing unveiling of the British Parliament expenses scandal. The “four lessons” they point out include:

  1. “Your workers are unpaid, so make it fun.” Make it feel like a game, even if it seems like work to you.
  2. “Public attention is fickle, so launch immediately.” If it is newsworthy, in other words, strike while the iron is hot.
  3. “Speed is mandatory, so use a framework.” Again, this applies if something is newsworthy and has a limited span of time to attract attention. Luckily, there are fast ways you can get going with a site.
  4. “Participation will come in one big burst, so have servers ready.” Also important for when you have a short but intense focus of attention. The Guardian used Amazon’s EC2 infrastructure, for which during the brief span of their project they figure they spent somewhere under 60 pounds. Right, chump change.

Although these tips are definitely skewed toward a crowdsourcing opportunity tied to a newsworthy situation (and therefore of a short-lived attention span), libraries, museums, and archives are not immune from such events. Therefore, it would be good for us to be ready to exploit such opportunities when they arise. For example, what about the 100th anniversary of an author’s birth? That’s a newsworthy event, were an archive chock-full of that author’s content and papers able to exploit the crowd in some useful way. Just sayin’.

Note: Thanks to Rose Holley, of the Australian Newspapers Project and a member of our RLG Partnership Social Metadata Working Group, for pointing this out.

Context for Metasearch

Friday, August 28th, 2009 by Jennifer

Last Friday the Encoded Archival Context (EAC) standard for archival authorities was released to the international community for review. Warning: an EAC record is not your grandmother’s MARC authority record. EAC is a companion standard to Encoded Archival Description (EAD), yet now seems to be useful well beyond the world of archives.

Managing collections archivally requires archivists to create comprehensive descriptions of corporate bodies, persons and families. Who would know better the context of records and creators than the archivists with the stuff in their hands? And who knew that this contextual information would be exactly what folks want to share when Networking Names [pdf]? With EAC we can link the creators, the context and the stuff. EAC goes one step further, facilitating the exchange of authoritative contextual information across many domains.

It turns out EAC is useful infrastructure for metasearch. At our RLG Annual Meeting, Warwick Cathro demonstrated the National Library of Australia’s prototype “one-search” service. Here one can discover everything - pictures, books, archives, newspaper articles, music, etc. - by and about a creator. The Australians have used EAC to collate dispersed, siloed information. (Just search the Christian name “Nellie” and watch it go! Hats off to Basil Dewhurst and his team.)

VIAF stats and improved matching

Thursday, August 13th, 2009 by Karen

The Virtual International Authority File continues to both grow and improve. In July the sixteen source files together had 10,759,910 usable name records, and 70.31% had related bibliographic records for matching. 30.33% of the name records matched at least one other source. Compare that to April, when nine source files had a 28.36% match rate.
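To put those percentages into rough absolute terms, here is a small sketch. It assumes both July percentages are taken over the usable name records, which the figures above imply but do not state outright; the variable names are only illustrative.

```python
# Figures quoted in the post. Assumption: both July percentages use the
# 10,759,910 usable name records as their base.
usable_records = 10_759_910
share_with_bib = 0.7031        # had related bibliographic records
share_matched_july = 0.3033    # matched at least one other source (July)
share_matched_april = 0.2836   # same measure in April, nine source files

records_with_bib = round(usable_records * share_with_bib)
records_matched = round(usable_records * share_matched_july)
gain_points = round((share_matched_july - share_matched_april) * 100, 2)

print(f"Records with bibliographic data: about {records_with_bib:,}")
print(f"Records matched elsewhere:       about {records_matched:,}")
print(f"Match-rate gain since April:     {gain_points} percentage points")
```

Note that the April and July rates are measured over different sets of source files, so the gain is only a loose indicator.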

It’s human review that shows where the matching algorithms need tweaking. I had spotted that source records for Laozi were not matching up.  My colleagues Thom Hickey and Jenny Toves identified the problem and fixed it. The number of headings retrieved for Laozi was reduced from 108 to 3. Jenny provided these before and after screen shots, an indication of improved matching for others like it.

Before (screen shot taken 2009-07-01)

After (screen shot taken 2009-08-10)

Viva la VIAF! Encore

Thursday, June 25th, 2009 by Karen

The Virtual International Authority File at viaf.org now contains personal names from sixteen different authority files! When I last blogged about the file in April, there were only four: the Library of Congress, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, and the National Library of Sweden. Names depend on context, and VIAF is providing a great view of what each form is within a given national context.

The additions (some are test files):

Bibliotheca Alexandrina (Egypt)
Biblioteca Nacional de Portugal
Biblioteca Nacional de España
National Library of Australia (an RLG Partner)
National Library of the Czech Republic
National Library of Israel (four files, one each of Arabic, Cyrillic, Hebrew, and Latin characters)
Istituto Centrale per il Catalogo Unico (Italy)
Swiss National Library (an RLG Partner)
Vatican Library

We’re getting more crystalline structures that show the matching among the files. The image below for Spinoza shows the mapping among the preferred forms of name from ten different files. Try it out!


Trucking

Thursday, June 25th, 2009 by Jennifer

The recording of “Treasures on Trucks and Other Taboos: Rethinking the Sharing of Special Collections” is now online in .wmv format (147MB/131min.), in .mp4 format (178MB/131min.), and in the iTunes Store. This web seminar is the first conversation in the new project about Sharing Special Collections. You can expect to hear more on this project from Dennis. Keep on trucking (and scanning) your distinctive materials, and/or keep on talking about it.

Information architecture and music

Tuesday, May 12th, 2009 by Jim

Two former RLG staff members (and two of my favorite, really interesting people) recently met up in their current professional roles. Dylan Tweney, former RLG writer, now senior editor at Wired.com, and keynote speaker at our 2007 RLG Partners meeting, interviewed Zoe Keating about her music and creative process.

Zoe is a fantastic cello player producing innovative music (and getting to play with other equally terrific musical talents). While at RLG she was the information architect for our RedLightGreen service. Wired.com quotes her on the connection:

“My music is the fusion of information architecture and classical music,” Keating says in this Wired.com video. “The way that you problem-solve in the world of technology … really lends itself to problem-solving with the kind of music that I do.”

Watch the interview, check out the performance video, and put both Dylan and Zoe into your feeds.

P.S. Some day I’m going to find those screenshots of RedLightGreen ;)

Networking names

Friday, May 1st, 2009 by Karen

Our Networking Names report has just been published! I was pleased to see this morning a number of tweets announcing it – or echoing other tweets.

I blogged last November about names touching everything soon after the Networking Names Advisory Group met at the Met. The fifteen members of the advisory group have spent the time since refining fourteen use case scenarios, those that they were most knowledgeable about - academic libraries and scholars, archivists and archival users, and institutional repositories. These use case scenarios envisioned how different communities could benefit from aggregating information about persons and organizations, corporate and government bodies, and families, and making it available on a network level. From the use case scenarios we derived the functions and attributes of what would be needed for a “cooperative Identities Hub”.

Some of the components of a cooperative Identities Hub exist or are being developed. We wanted to articulate the characteristics of a gateway to all forms of names authorized or used in other contexts without preferring one form of name over another and that would use social networking to tap expertise in all communities. We envisioned a switch for users or their machine applications to extract relevant information for re-use in their own contexts and enable contributions from different sources.  These are objectives we can all strive towards.

We’re looking at ways to amplify this work. Feel free to post your comments or reactions here in the meantime.

I am deeply grateful to all the RLG Partner staff who contributed to the report – a very talented group to work with: Grace Agnew (Rutgers), Laura Akerman (Emory), Genevieve Clavel (Swiss National Library), Joan Cobb (Getty Research Institute), Michele Crump (U. Florida), Amanda Hill (U. Manchester/UK Names Project), Deborah Kempe (Frick), Amy Lucker (New York University), Dennis Meissner (Minnesota Historical Society), Suzanne Pilsk (Smithsonian), Michael Rush (Yale), Jon Shaw (U. Pennsylvania), Laura Smart (CalTech), Daniel Starr (Metropolitan Museum of Art), Bob Wolven (Columbia).

Expert Community Experiment Update

Tuesday, April 21st, 2009 by Roy

Recently, OCLC launched an experiment in making it easier for members to update and correct WorldCat records. Dubbed the Expert Community Experiment, the goal is to engage the community in improving overall database quality. Specifically, members with full-level cataloging authorizations have the ability to improve and upgrade WorldCat master records during the experiment. It began in February and will last six months.

In March, there were 18,910 Expert Community Experiment replaces. There were 1,001 institutions that did at least one replace. Individual institution numbers ranged from 3 institutions doing more than 500 replaces to 242 institutions doing 1 replace each. Other figures:

Database Enrichment: 18,235
Minimal-Level Upgrade: 14,791
Enhance Regular: 15,052
Enhance National: 3,583
CONSER Authentication: 1,929
CONSER Maintenance: 6,183

To put this into perspective, during the same period OCLC staff replaced 1,086,715 records. This isn’t to say that we couldn’t see substantial improvements in database quality under a less strict editing regime, only that you likely didn’t know just how hard we work to improve the WorldCat database. I sure didn’t, and I work here.
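For a rough sense of scale, the sketch below compares the two March totals. It assumes the 18,910 Expert Community Experiment replaces are the figure being set against the OCLC staff total; the post does not say whether the other categories listed above should be counted as well.

```python
# March figures quoted in the post.
expert_community_replaces = 18_910
oclc_staff_replaces = 1_086_715

# Assumption: only the Expert Community Experiment replaces are compared
# with the staff total; the other listed categories are left out.
ratio = oclc_staff_replaces / expert_community_replaces
share = expert_community_replaces / oclc_staff_replaces

print(f"Staff replaces per Expert Community replace: {ratio:.1f}")
print(f"Expert Community replaces as a share of staff replaces: {share:.2%}")
```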

Viva la VIAF!

Monday, April 6th, 2009 by Karen

Try out the enhanced and expanded Virtual International Authority File at viaf.org. It now contains 7.8 million records built from 9.2 million source authority records from the Library of Congress, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, and the National Library of Sweden. More files will be added. Thom discusses the recent changes to VIAF in his Outgoing blog:

The VIAF site has recently had a major overhaul.  What you now search are records created from a merge of matching source authority records.  Within this record you can see what source records were used to create it, along with cross references and other information gleaned both from the authority records and from associated bibliographic records.

We all have our favorite “authority control poster children”, as Lorcan calls them. The example he blogged about is Flann O’Brien. One of my favorites is Chiang Kai-shek: that is the preferred form in the LC and National Library of Sweden authority files, but it’s listed on top with Jiang, Jieshi, the preferred form in the Bibliothèque nationale de France and the Deutsche Nationalbibliothek authority files. It illustrates a difference in perspective: Jiang Jieshi is the Pinyin romanization of the Mandarin pronunciation of the characters in Chiang Kai-shek’s name. One of the beauties of VIAF is that it aggregates the preferred forms used in different sources without itself preferring one form over another.
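A toy model may make that idea concrete. The sketch below is not VIAF's record structure or matching code; the source labels and the `group_by_form` helper are invented for illustration, and the forms are the ones quoted above.

```python
from collections import defaultdict

# Preferred forms as described in the post. The source labels are informal
# names chosen for this sketch, not VIAF's own codes.
preferred_forms = {
    "Library of Congress": "Chiang Kai-shek",
    "National Library of Sweden": "Chiang Kai-shek",
    "Bibliothèque nationale de France": "Jiang, Jieshi",
    "Deutsche Nationalbibliothek": "Jiang, Jieshi",
}


def group_by_form(forms_by_source):
    """Group sources by the form they prefer, without ranking the forms."""
    grouped = defaultdict(list)
    for source, form in forms_by_source.items():
        grouped[form].append(source)
    return dict(grouped)


cluster = group_by_form(preferred_forms)
for form, sources in cluster.items():
    print(f"{form}: {', '.join(sources)}")
```

Nothing in the structure marks one form as canonical; each source keeps its own preference.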


And it lists all the alternate forms each of those sources includes, a very long list that also includes several forms in Chinese characters: