Archive for the 'Renovating Descriptive Practice' Category

Analyzing MARC tags and projecting MARC’s future

Saturday, March 20th, 2010 by Karen

The RLG Partners working group that has spent the past two years gathering and analyzing evidence about MARC tag usage to inform library metadata practices has completed its work. The 72-page Implications of MARC Tag Usage on Library Metadata Practices report was published on March 12, with links to thirteen detailed data tables for those who love to immerse themselves in statistics. They’re spreadsheets, so you can also filter and sort the data as you like.

The working group’s studies focused on machine applications, an important user category that has generally been ignored in user studies. MARC data is used for machine matching and manipulation, linking, harvesting, collection analysis, ranking, and providing systematic views of publications. If we envision a future of linked data in which all the work information professionals have invested in creating and maintaining legacy MARC data is available to the rest of the information universe, machine applications will become increasingly important. Future encoding schemas will need a robust MARC crosswalk to ingest our millions of legacy records.

We believe that MARC data cannot continue to exist in its own discrete environment. It will need to be leveraged and used in other domains to reach users in their own networked environments. With the increase of digitized full text from various mass digitization efforts, we advise MARC practitioners to focus on authorized names, classifications, identifiers, and controlled vocabularies that key-word searching of full-text will not provide, rather than on “descriptive metadata”.

The working group held a Webinar on March 18, 2010 to discuss its findings and projections for MARC’s future with those interested. I was grateful that Catherine Argus at the National Library of Australia was willing to get up extra early to present her work, at 7:00 am local time, so that RLG Partner staff on the east coast of the US could join the discussion at 4:00 pm EDT. A couple of Catherine’s colleagues at the NLA also listened in. Lisa Rowlison de Ortiz (University of California, Berkeley), who collaborated on the executive summary that pulled together all our work and presented the working group’s views on MARC’s future summarized above, also joined the discussion. The recording of that Webinar will be available on the OCLC Research Webinars page soon.

The working group members each selected a topic to research, and then wrote a report summarizing the findings, which we presented during the Webinar:


OCLC Research @ University of Calgary

Tuesday, February 16th, 2010 by Günter

As those of you who have listened to Tom Hickerson’s Distinguished Seminar Series lecture will know, the University of Calgary has embarked on an ambitious plan of integrating their libraries, archives and museums under a single administrative umbrella (Libraries & Cultural Resources or LCR). This convergence is catalyzed by a new building in the heart of the university’s campus, which will co-locate the units as well as many campus research, teaching & learning support functions. In latest news, last week a reorganization of LCR was announced to realign the staff with emerging priorities. The University of Calgary is our latest addition to the roster of institutions participating in the RLG Partnership, and to make proper mutual introductions, a team from OCLC Research visited Calgary last week.

In conversations preparing for our trip, we were asked to make a contribution in moving LAM integration at the university forward, and in particular, to focus on Calgary’s ambitions to create a single search across LCR resources. (Calgary currently experiments with Summon for single search - watch an introduction here). Our agenda (inspired by our LAM workshops) called for a broad discussion establishing key features for single search, followed by sessions focused on how archives, metadata services/libraries and museums can contribute to these features and the overarching goal of single search. You’ll find the presentation we used to set the scene for the single search discussions here - it also contains a number of examples from other institutions who have ventured down this path, including the Victoria & Albert, Yale & the Smithsonian.

The Cult of Brewster Finds Its Church

Tuesday, October 20th, 2009 by Roy

Last night Brewster Kahle of the Internet Archive unveiled his latest project in a venue suitable for any high priest or cult leader: a former Christian Science Church in San Francisco. As it turns out, the Internet Archive recently purchased the building, and as Brewster remarked during the grand unveiling of the Bookserver project, it even matches their long-time logo, which was selected on purpose to imply a physical library.

Although the mood in the great room of the church, which Brewster hopes eventually to turn into a modern-day library reading room, was more hallelujah-inspiring than anything, the preceding day had been more down-and-dirty technical. The two-day meeting (still going on as I write this) is more about AtomPub and identifiers than holy water and consecrated wafers, but all of it does take a certain amount of faith.

Crowdsourcing Lessons

Monday, September 14th, 2009 by Roy

The Library of Congress, the Smithsonian Institution, more RLG Partners and others have participated in the Flickr Commons, all to try to leverage what’s become known as “crowdsourcing”: “the act of taking tasks traditionally performed by an employee or contractor, and outsourcing it to an undefined, generally large group of people or community in the form of an open call,” as Wikipedia describes it. By posting content on the web in places that many people frequent, the Library of Congress and others are hoping to attract descriptions, subject labels, and other useful content to enrich their finding tools. And this has undeniably led to enriched descriptions.

But tossing something out on the “interwebs” and creating an effective crowdsourcing environment are two very different things. And this article, from the Nieman Journalism Lab, describes lessons from the Guardian newspaper in the UK that recently used crowdsourcing in their amazing unveiling of the British Parliament expenses scandal. The “four lessons” they point out include:

  1. “Your workers are unpaid, so make it fun.” Make it feel like a game, even if it seems like work to you.
  2. “Public attention is fickle, so launch immediately.” If it is newsworthy, in other words, strike while the iron is hot.
  3. “Speed is mandatory, so use a framework.” Again, applies if something is newsworthy and has a limited span of time to attract attention. Luckily, there are fast ways you can get going with a site.
  4. “Participation will come in one big burst, so have servers ready.” Also important for when you have a short but intense focus of attention. The Guardian used Amazon’s EC2 infrastructure, for which during the brief span of their project they figure they spent somewhere under 60 pounds. Right, chump change.

Although these tips are definitely skewed toward a crowdsourcing opportunity tied to a newsworthy situation (and therefore of a short-lived attention span), libraries, museums, and archives are not immune from such events. Therefore, it would be good for us to be ready to exploit such opportunities when they arise. For example, what about the 100th anniversary of an author’s birth? That’s a newsworthy event, were an archive chock-full of that author’s content and papers able to exploit the crowd in some useful way. Just sayin’.

Note: Thanks to Rose Holley, of the Australian Newspapers Project and a member of our RLG Partnership Social Metadata Working Group, for pointing this out.

Context for Metasearch

Friday, August 28th, 2009 by Jennifer

Last Friday the Encoded Archival Context (EAC) standard for archival authorities was released to the international community for review. Warning: an EAC record is not your grandmother’s MARC authority record. EAC is a companion standard to Encoded Archival Description (EAD), yet now seems to be useful well beyond the world of archives.

Managing collections archivally requires archivists to create comprehensive descriptions of corporate bodies, persons and families. Who would know better the context of records and creators than the archivists with the stuff in their hands? And who knew that this contextual information would be exactly what folks want to share when Networking Names [pdf]? With EAC we can link the creators, the context and the stuff. EAC goes one step further, facilitating the exchange of authoritative contextual information across many domains.

It turns out EAC is useful infrastructure for metasearch. At our RLG Annual Meeting, Warwick Cathrow demonstrated the National Library of Australia’s prototype “one-search” service. Here one can discover everything - pictures, books, archives, newspaper articles, music, etc. - by and about a creator. The Australians have used EAC to collate dispersed, silo-ed information. (Just search the Christian name “Nellie” and watch it go! Hats off to Basil Dewhurst and his team.)

VIAF stats and improved matching

Thursday, August 13th, 2009 by Karen

The Virtual International Authority File continues to both grow and improve. In July the sixteen source files together had 10,759,910 usable name records, 70.31% of which had related bibliographic records for matching. 30.33% of the name records matched at least one other source. Compare that with April, when nine source files had a 28.36% match rate.
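For readers who prefer counts to percentages, here is a minimal sketch that turns the July figures quoted above into approximate record counts. The variable and function names are purely illustrative, and because the published percentages are rounded to two decimal places, the results are estimates rather than official VIAF numbers.

```python
# Figures quoted in the post for July 2009.
usable_records = 10_759_910
pct_with_bib_records = 70.31  # percent with related bibliographic records
pct_matched = 30.33           # percent matching at least one other source


def approx_count(total: int, percent: float) -> int:
    """Convert a percentage of `total` into a rounded record count."""
    return round(total * percent / 100)


with_bib = approx_count(usable_records, pct_with_bib_records)
matched = approx_count(usable_records, pct_matched)

print(f"Records with related bibliographic records: about {with_bib:,}")
print(f"Records matching at least one other source: about {matched:,}")
```

That works out to roughly 7.6 million name records with bibliographic records to match against, and roughly 3.3 million that matched at least one other source.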

It’s human review that shows where the matching algorithms need tweaking. I had spotted that source records for Laozi were not matching up.  My colleagues Thom Hickey and Jenny Toves identified the problem and fixed it. The number of headings retrieved for Laozi was reduced from 108 to 3. Jenny provided these before and after screen shots, an indication of improved matching for others like it.

Before (screen shot taken 2009-07-01)

After (screen shot taken 2009-08-10)

Viva la VIAF! Encore

Thursday, June 25th, 2009 by Karen

The Virtual International Authority File at viaf.org now contains personal names from sixteen different authority files! When I last blogged about the file last April, there were only four: Library of Congress, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, and the National Library of Sweden. Names depend on context, and VIAF is providing a great view of what each form is within a given national context.

The additions (some are test files):

Bibliotheca Alexandrina (Egypt)
Biblioteca Nacional de Portugal
Biblioteca Nacional de España
National Library of Australia (an RLG Partner)
National Library of the Czech Republic
National Library of Israel (four files, one each of Arabic, Cyrillic, Hebrew, and Latin characters)
Istituto Centrale per il Catalogo Unico (Italy)
Swiss National Library (an RLG Partner)
Vatican Library

We’re getting more crystalline structures that show the matching among the files. The image below for Spinoza shows the mapping among the preferred forms of name from ten different files. Try it out!


Trucking

Thursday, June 25th, 2009 by Jennifer

The recording of “Treasures on Trucks and Other Taboos: Rethinking the Sharing of Special Collections” is now online in .wmv format (147MB/131min.), in .mp4 format (178MB/131min.), and in the iTunes Store. This web seminar is the first conversation in the new project about Sharing Special Collections. You can expect to hear more on this project from Dennis. Keep on trucking (and scanning) your distinctive materials, and/or keep on talking about it.

Information architecture and music

Tuesday, May 12th, 2009 by Jim

Two former RLG staff members (and two of my favorite, really interesting people) recently met up in their current professional roles. Dylan Tweney, former RLG writer, now senior editor at Wired.com, and keynote speaker at our 2007 RLG Partners meeting, interviewed Zoe Keating about her music and creative process.

Zoe is a fantastic cello player producing innovative music (and getting to play with other equally terrific musical talents). While at RLG she was the information architect for our RedLightGreen service. In this video interview she says:

“My music is the fusion of information architecture and classical music,” Keating says in this Wired.com video. “The way that you problem-solve in the world of technology … really lends itself to problem-solving with the kind of music that I do.”

Watch the interview, check out the performance video, and put both Dylan and Zoe into your feeds.

P.S. Some day I’m going to find those screenshots of RedLightGreen ;)

Networking names

Friday, May 1st, 2009 by Karen

Our Networking Names report has just been published! I was pleased to see this morning a number of tweets announcing it – or echoing other tweets.

I blogged last November about names touching everything soon after the Networking Names Advisory Group met at the Met. The fifteen members of the advisory group have spent the time since refining fourteen use case scenarios in the areas they were most knowledgeable about: academic libraries and scholars, archivists and archival users, and institutional repositories. These use case scenarios envisioned how different communities could benefit from aggregating information about persons and organizations, corporate and government bodies, and families, and making it available on a network level. From the use case scenarios we derived the functions and attributes of what would be needed for a “cooperative Identities Hub”.

Some of the components of a cooperative Identities Hub exist or are being developed. We wanted to articulate the characteristics of a gateway to all forms of names authorized or used in other contexts without preferring one form of name over another and that would use social networking to tap expertise in all communities. We envisioned a switch for users or their machine applications to extract relevant information for re-use in their own contexts and enable contributions from different sources.  These are objectives we can all strive towards.

We’re looking at ways to amplify this work. Feel free to post your comments or reactions here in the meantime.

I am deeply grateful to all the RLG Partner staff who contributed to the report – a very talented group to work with: Grace Agnew (Rutgers), Laura Akerman (Emory), Genevieve Clavel (Swiss National Library), Joan Cobb (Getty Research Institute), Michele Crump (U. Florida), Amanda Hill (U. Manchester/UK Names Project), Deborah Kempe (Frick), Amy Lucker (New York University), Dennis Meissner (Minnesota Historical Society), Suzanne Pilsk (Smithsonian), Michael Rush (Yale), Jon Shaw (U. Pennsylvania), Laura Smart (CalTech), Daniel Starr (Metropolitan Museum of Art), Bob Wolven (Columbia).