Archive for October, 2007

Communities of practice in web archiving

Tuesday, October 30th, 2007 by Merrilee

Long, long ago, on October 18th (before I moved offices or went on vacation!) I attended the first day of the Archive-It partners’ meeting at the Internet Archive. Molly Bragg and Kristine Hanna from the Internet Archive asked me to attend and help facilitate a discussion on “big picture issues” for web archiving.

Now that we have a sizable community of institutions who are doing web archiving, we can start to establish first practices, if not best practices, for communities of web archiving. I think it’s important to first identify which business cases or communities of web archiving exist, because I think that practices (descriptive practices, collection development, etc.) will differ between communities of practice.

Here’s my first stab at defining communities of web archiving practice. What would you add? How would you organize differently? I’m eager to hear.

  • Subject-based (area studies, etc.)
  • Mandated collection/retention (here I’d include capturing state and agency publications, government websites / publications, capturing the webspace of a university for university archives…)
  • Institutional repository/intellectual output of an institution
  • Event based (9/11, Katrina, Virginia Tech shootings)
  • Preserving virtual environments (Second Life)

This was a terrific meeting, and I’m glad I had the opportunity to attend.

Dewey crashes

Thursday, October 25th, 2007 by Jim

Imagine if the whole LC system had crashed in on her.

From the NYTimes Sunday magazine

This Is Not A Bob Dylan Movie
Published: October 7, 2007

…“Like Blanchett, Lachman, the cinematographer (who has worked with Robert Altman, Steven Soderbergh and Wim Wenders, among others), quizzed Haynes about his choice of film styles. “He said that the obvious thing would have been to use the style of D. A. Pennebaker’s ‘Don’t Look Back,’ but if you listen to what Dylan was saying at the time, it wasn’t about being in rooms with bandmembers; he was being Felliniesque with his prose,” Lachman says. “It’s all this imagery. So what better filmmaker than Fellini? What better film than ‘8 1/2,’ which is about a filmmaker being hounded?”

It’s probably not a bad analogy for how Blanchett felt on the set. For one thing, she was negotiating the fact that sometimes she was speaking composed dialogue, other times reciting actual interviews, especially a 1966 interview Dylan did with Nat Hentoff in Playboy. “That’s why it was so tricky to play that scene, because it is from an interview,” Blanchett says. “But Dylan’s obviously riffing, finding that stuff in the moment. And it’s the difference between doing that, and also knowing that this is a reference to something that has already been said. So it was very difficult to play because you were constantly aware that you were in the immediacy of the moment but yet referencing primary, tertiary and secondary sources — the whole Dewey system was crashing in on me.”…

Lots of Data on Metadata Practices!

Tuesday, October 23rd, 2007 by Karen

OK, so I was overly optimistic when I predicted that the report on the RLG Descriptive Metadata Practices Survey results would be out in October. I had anticipated that a lot of the data would need to be normalized, but underestimated the challenges in interpreting the results – what they mean, what questions they raise, and what we think we now understand of current descriptive metadata practices and dependencies.

Since I posted my preliminary analysis, we received an additional response, making the total 89 from the 18 RLG Programs partners selected because they had “multiple metadata creation centers” on campus that included libraries, archives, and museums and had some interaction among them. We still have a diverse set of perspectives represented. 40% of the respondents characterized their immediate work environments as digital library production, 37% as archival collections processing, 37% as library technical services, 19% as museum collections, and 16% as institutional repositories.
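A quick note on reading those percentages: they add up to more than 100%, which suggests that respondents could describe their work environment in more than one way. The short Python sketch below only redoes that arithmetic with the figures quoted above. The head counts it prints are rounded estimates derived from the percentages and the total of 89, not figures reported by the survey.

```python
# Percentages quoted in the post; the category labels are shortened here.
shares = {
    "digital library production": 40,
    "archival collections processing": 37,
    "library technical services": 37,
    "museum collections": 19,
    "institutional repositories": 16,
}
total_responses = 89  # total number of responses stated in the post

# If each respondent had picked exactly one category, this would be 100.
total_share = sum(shares.values())

# Rough head counts implied by the percentages (estimates only, rounded).
approx_counts = {
    label: round(total_responses * pct / 100) for label, pct in shares.items()
}

print(f"Sum of shares: {total_share}%")
for label, count in approx_counts.items():
    print(f"{label}: about {count} of {total_responses}")
```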

While we continue to analyze the results, here are a couple of charts to continue to whet your appetite.

What percentage of your collection do you estimate has not been adequately described – and is unlikely to be described without additional resources, funding, or both?


This is sort of a “is the glass half full or half empty?” type question. You could say that more than half of the respondents estimate 30% or less of their collections are not adequately described, and not likely to be, meaning that 70% or more of their collections are adequately described. Or you could say just under half of the respondents estimate that 30% or more of their collections are not adequately described.

It’s more revealing to break down the “more than 50% inadequately described” responses (22% of all responses in the aggregate) by the respondents’ self-described immediate work environment (archival collections, digital library production, library technical services, or museum collections):

More than 50% of collection inadequately described, and unlikely to be.


So it would appear that efficient metadata tools are most needed by museum collections, and archival collections are almost twice as likely to have less than half of their collection adequately described when compared to libraries. And that libraries as a group have more adequately described their collections than the other groups. Maybe.

Stay tuned for more…

The World’s Libraries. Connected.

Monday, October 22nd, 2007 by Roy

Today OCLC unveils our new unified and refreshed corporate identity. The three rings of the identity can be seen as representing the connections that OCLC provides in a number of different ways, for example:

  • Connecting libraries at the local, group and global levels
  • Connecting people through libraries to knowledge
  • Connecting past, present, and future through access to library collections

OCLC PICA, with offices in the Netherlands, Australia, France, Germany, Switzerland, the United Kingdom and the United States, will be known as OCLC. By bringing together all offices under one name and identity, libraries worldwide can benefit from OCLC membership, research and an expanded portfolio around a comprehensive set of products and services.

This new identity will be rolled out in various ways over the next few months, but already various products are being distributed with the new logo. At the OCLC CAPCON meeting I attended last week, attendees received paper holders emblazoned with the new logo, and all employees now sport the new logo on t-shirts and bags.

But it’s more than just a visual identity. To quote from a booklet that was distributed to employees to explain the new identity, “Brand is more than marketing hype. It is more than a name, more than a logo, more than an advertising campaign. Branding is about aligning what we say with what we do.” I think it’s a great time to launch a new brand identity, as we push to unite the world’s libraries into a global network that will provide an unprecedented opportunity for cooperation and collaboration.

Special Collections – the big agenda

Friday, October 19th, 2007 by Jim

In an earlier post I reported some impressions from attending the ARL Special Collections Working Group (SCWG) meeting. One of the reasons that I was there is the large role that special collections play in the research institutions served by RLG Programs and the consequent emphasis we are giving them in our work agenda.

At the SCWG I passed around early review copies of a thoughtful and intentionally provocative essay written by my Programs colleagues, Ricky Erway and Jen Schaffner. Their premise is that special collections need to be in the networkflow, that scaling up digitization of these materials requires a change in a whole variety of historical emphases and practices and that as a community we need to completely retool our attitudes and approaches to extending access to special collections in the library, archives and museum world. It’s titled Shifting Gears: Gearing Up to Get Into the Flow and was inspired by the interactions and presentations at our Digitization Matters forum at the recent Society of American Archivists meeting.

They post their theses in clear language that calls for action. Here’s one: “Take a page from archivists. Stop obsessing about items…” Their calls for action need to get turned into community activities that build a new infrastructure, new practices and new expectations. Much that needs to be done is appropriate for RLG Programs to undertake and much more needs to be undertaken by individual institutions and the organizations on which they rely for support. We’re interested in a discussion with our Partners to help us focus a program of work and anxious to engage the broad community in discourse that leads to change. The essay (pdf) is now on our website (the contemporary equivalent of those 16th century church doors) and available for comment, debate, copying, translation and ultimately action.

A good day for open access!

Friday, October 19th, 2007 by Günter

As you can imagine, with our offices looking like you can see in Roy’s pictures below, it’s an attractive option to spend time elsewhere. On Wednesday, Ricky and I fled to the Officers’ Club in the Presidio of San Francisco to attend the Open Content Alliance annual gathering. There we found a large roomful of people, all fired up about sharing their content according to the principles established by the OCA. A truly impressive gathering, with lots of enthusiasm for the work at hand. Ricky, who had also attended the meeting last year, remarked that the community now is taking ownership of this project in a way which she hadn’t quite seen before.

Among the news items which percolated during the day:

  • The OCA will experiment with scan-on-demand. If I understood correctly, the Internet Archive will offer a “Scan This” button which can be integrated into a local catalog. Once a user hits that button, it’ll take them to a website where they can sponsor the digitization of an item for a cost-recovery fee. The OCA partner library then sends the book in question to the closest OCA scanning facility. The requester presumably gets notified, and the digital text becomes part of the OCA.
  • Brewster also has two new deals for you on digitizing microfilm – you can either send your microfilm to the Internet Archive for digitization at a cost recovery rate, or you can get a free scanner, if you provide the labor (and microfilm) it takes to feed it.
  • It also sounded like Brewster wanted to encourage participants to take the next step in terms of copyright: a pilot project will start digitizing out-of-print, in-copyright works, a departure from the strictly public domain digitization in the OCA to date.

Before the evening reception, we had a little tour of the Internet Archive’s offices, just a short jaunt down the street, where Brewster has mocked up a future library-in-a-room as he sees it. At the heart of his vision: the Espresso Book Machine, a Sloan-funded prototype of a book printer, binder and dispenser which pops out an OCA digitized 300-page volume in about 6 minutes. Now if it could only make coffee as well…

Image – Top: The Internet Archive office in the Presidio
Image – Bottom: The Espresso Book Machine

And So It Begins…

Wednesday, October 17th, 2007 by Roy

The movers are quickly tearing our office apart today in preparation for moving everything about 17 miles north to San Mateo. Yes, we’ll soon be on the road to new digs significantly closer to the San Francisco International Airport, as well as to the city itself.

As you can imagine, there is an amazing amount of collected stuff that needs to be sifted through and thrown away, saved, or recycled (see photo).

We’ll post more about the new space later, but for now you should know that with the move and a heavy travel schedule for most RLG Programs staff, this is a crazy time for us. Not that you should stay away, just realize that if we don’t get back to you immediately it isn’t because we don’t like you.

And soon we’ll be happy to welcome you to our new office. We’re so close to the airport that if you’re passing through SFO and have a couple hours to kill, let us know. We might be able to come get you and give you a personal tour of what we’re up to. We won’t even make you help us unpack.

Some things are worth waiting for

Tuesday, October 16th, 2007 by Karen

The Library of Congress’ CPSO (Cataloging Policy and Support Office) announced today:
The major authority record exchange partners (British Library, Library of Congress, National Library of Medicine, and OCLC, Inc., in consultation with Library and Archives Canada) have agreed to a basic outline that will allow for the addition of references with non-Latin characters to name authority records that make up the LC/NACO Authority File. 

These non-Latin script “alternate names” will be added as unlinked references. The established form will still be, for example, Kurosawa, Akira but now users who know the famous movie director as either 黑澤明 or 黒沢明 will be able to use those forms to retrieve all works by or about Kurosawa.
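To make the idea concrete, here is a minimal Python sketch of how alternate names could lead a searcher to the established heading. It is an illustration only: the record layout, the `AUTHORITY_RECORDS` list, and the `build_index` and `established_form` helpers are all hypothetical, and none of this reflects the actual MARC encoding or how the LC/NACO Authority File will implement the references.

```python
import unicodedata

# Hypothetical, much-simplified authority records: one established heading
# plus a list of non-Latin "alternate name" references for each.
AUTHORITY_RECORDS = [
    {
        "established": "Kurosawa, Akira",
        "alternate_names": ["黑澤明", "黒沢明"],
    },
]


def normalize(name):
    """Normalize a name for matching (Unicode NFC, case-folded, trimmed)."""
    return unicodedata.normalize("NFC", name).casefold().strip()


def build_index(records):
    """Map every known form of a name to its established heading."""
    index = {}
    for record in records:
        forms = [record["established"], *record["alternate_names"]]
        for form in forms:
            index[normalize(form)] = record["established"]
    return index


def established_form(query, index):
    """Return the established heading for a query, or None if unknown."""
    return index.get(normalize(query))


INDEX = build_index(AUTHORITY_RECORDS)

for query in ["黑澤明", "黒沢明", "Kurosawa, Akira"]:
    print(query, "->", established_form(query, INDEX))
```

In this sketch each alternate form simply points at the established heading, so a search on any of the forms can be redirected to the one heading under which works are filed.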

Some of us have been waiting a long, long time for this. I recall that the original CJK white paper on adding CJK scripts to RLIN also posited adding CJK scripts to authority records at the same time. That paper was written back in 1980! RLG submitted a MARBI proposal to add “alternate graphic representation to USMARC Authority Records” back in 1991 (proposal 1991-01). It was approved, but no one was in a position to implement it.

OCLC Programs and Research is working with our metadata services colleagues and the CPSO-Unicode group to make this happen. “Lead an effort to upgrade the LC/NACO Authority file with non-Latin alternate names” is one of the projects in our work agenda under “Leveraging Vocabularies for Effective Discovery”.

Special Collections – the big deal

Sunday, October 14th, 2007 by Jim

I attended the Association of Research Libraries Membership Meeting last week. The Special Collections Working Group of ARL met for nearly a full day and I was pleased to be part of the discussions as an invited liaison.

The Working Group was formed following an earlier effort that resulted in the “Hidden Collections” agenda. At this point the working group is struggling to articulate principles, practices and behaviors that ought to define 21st century special collections. They won’t necessarily embrace a program of work so much as articulate a path forward for those institutions that have and will continue to have ‘special’ collections. I put quotation marks around ‘special’ because there is considerable debate and some confusion about what special collections might mean as the 21st century progresses in a predominantly digital information and publication form.

Dealing with the extant base of special collections – identifying, describing, digitizing and disclosing them – is an enormous challenge and opportunity. In discussion this challenge gets elided with the challenge of managing the flood of digital materials – electronic records, personal research collections, web sites, blogs, and all sorts of digital ephemera – that may represent the sources and materials that we called ‘special’ in the print format world.

One of the most interesting exchanges was precipitated by the observations that “special formats are not special collections” and that “collecting in the 21st century is different than a 21st century collection”. (I believe these came from Mark Dimunation of LC.) This sparked some debate about what we will be collecting and whether our thinking is perhaps shaped too much by trying to extend or extrapolate from our traditional and historic collecting practices. For instance our past practices, while striving to achieve community consistency and collaboration in descriptive efforts, were at the same time characterized by a fundamental dynamic of competition for the scarce, unique or reputedly valuable.

This led to some speculation about what collaborative collecting for special collections might mean in the 21st century. Are there new opportunities to collect as a community and focus institutional competition on the distinctive ways in which researchers are supported in using and making sense of a commonly-held collection of ‘special’ resources?

Cliff Lynch of CNI opined that if we were to behave that way the community ought to come together to collectively support the Internet Archive and, via its support, dictate requirements that ensured the web archives would meet community requirements – we’d collectively own a huge and critical historical resource that is maintained economically and accessible under terms and mechanisms shaped by the needs of future research.

I extended the thought to the blogosphere. There are a relatively small number of blog support providers. Couldn’t we make a comparable deal with them on behalf of the entire research community? This extends the logic and economics of the consortial ‘big deal’ to the special collections arena. It eliminates the enormous investments that we currently make in selection of materials and allows us to redirect those resources to the tools and technology that support researchers in making sense and use of the material.

Inside scoop on collections summit, part two

Saturday, October 13th, 2007 by Dennis

Previously I talked a little about the original impetus behind the RLG Programs print collections summit coming up next month in Philadelphia. Now I’d like to share a bit about why we invited the institutions we did, and what RLG Programs hopes to get out of the event.

Let’s face it: maybe five or six libraries among us will continue to buy nearly everything about everything in which their constituents might have an interest. Those five or six libraries will also keep pretty much every single volume they’ve ever bought. Forever.

Everyone else is going to have to adapt. Everyone else is going to at least think about finding someone to play with when it comes to managing print collections.

For the November meeting, we decided to focus not on the five or six libraries that will be able to keep on doing it all for themselves, but rather on those RLG Programs partner institutions that we know are actively engaged in collaborative collection management activities. That’s who we invited. We’re hoping to draw out the early adapters, as it were. One way to adapt is to start thinking about your own local collections, particularly print collections, as something that fits in with a larger whole. We feel that RLG Programs might be in a unique position to help frame an interesting conversation about that.

No doubt we missed some important work already going on right under our noses. Certainly, other partner institutions could have contributed as much as those that have been invited. Some partners with ongoing work in this area asked after the announcement that they be invited. So our group has grown larger than will fit comfortably around a conference table. Meanwhile, we at RLG Programs have grown smarter about the landscape. We hope that one effect of the conversations about the November summit will be that such projects of yours that we might have missed (and that interested others might also have missed) get talked about and documented much more widely and become an accessible part of our collective experience.

RLG Programs already has a set work agenda in the area of managing the collective collection. This agenda is based in part on our understanding of the opinions, wants and needs of the community. This November meeting is not for us some idealistic blue-sky exercise where we will gather experts in a room, listen to what you say, and then try to chart a programmatic or development path that will solve everyone’s problems within a couple of business cycles. It is also not a ploy to gather smart people into a closed space and then hope that magic happens, with some of the glory reflecting back on RLG Programs. Rather, it’s an opportunity for us to bring together a different sort of group than the ones that usually gather to discuss collaborative print collection management issues, to take a practical look at where progress in this area currently stands from both a collections and an access perspective, to catalog and categorize the kinds of obstacles that are keeping such efforts from going further, and to throw the full force of our collective experience and creativity at those obstacles in an effort to defeat them.

What do we at RLG Programs hope to get out of this meeting?

  1. Departing attendees who have identified at least one thing they will do upon returning home to further their collaborative print collection management efforts, with a promise to report back in three months.
  2. A grid showing various approaches to collaborative collection management, and the successes and obstacles associated with each approach, based on recent experience.
  3. A half-day’s worth of brainstorming about solutions to the identified obstacles and ruminations about how those solutions might change in a more digital future.
  4. Some high-level understanding of which types of solutions to managing collections collectively might be more effective at a local level, which at a regional level, and which at a national or global level.
  5. Some high-quality grist for refreshing the OCLC Programs and Research work agenda on the theme of “Managing the Collective Collection.”

A draft agenda for the meeting will go up on the RLG Programs Web site shortly. There will soon be an opportunity for anyone who’s interested to contribute to the conversation via a pre-meeting survey. Outcomes and next steps will be widely shared.

That’s the inside scoop on the November RLG Programs collections meeting.