The Haves and Have Nots

Friday, September 30th, 2005 by Günter

Did the headline for this NEDCC announcement make you break into a wry smile?


I think everybody engaged in digital preservation has been trying to get everybody who is currently disengaged to grasp the idea that where there’s no plan, there won’t be a digital asset. And I guess this observation means that indeed, what this headline spells out is still news to a lot of people – but to those who have already signed on, it’s stating the painfully obvious.

The announcement goes on to say:

THE MAJOR CONCLUSION was that small and medium-sized institutions will need help from specialists in surveying the preservation needs of their growing digital collections. The group took significant first steps toward developing practical planning tools for assessing the preservation needs of digital collections and envisioning a national survey program.

(Find the complete announcement at Jill Hurst-Wahl’s digitization blog.)

I’ve mentioned before that I think we’re moving towards a very interesting situation where the preservation “haves” will partner up with the preservation “have nots.” Mutual benefit is assured: many smaller institutions won’t be able to build a trusted digital repository, and the larger institutions will have a vested interest in leveraging their investment. I’ll be curious to see what kinds of business models emerge from this collaboration – will there be cost-recovery charges, or agreements on the re-use of desirable contributed content by the repository owner?

I hope some of these questions will be answered on a panel I’ve organized for the upcoming Museum Computer Network (MCN) conference in Boston. The panel is on Friday Nov 2nd. My colleague Robin Dale will speak about the new RLG-NARA certification audit checklist and the CRL certification of digital archives project she directs; Patricia Cruse from the California Digital Library and Stephen Abrams from Harvard will give us some insight into how they see collaboration from the perspective of the preservation “haves.” If you’re interested, please also check out the rest of the MCN conference – the theme is digital preservation, and NEDCC (the circle closes) will hold a pre-conference workshop!

Space – the final frontier

Thursday, September 29th, 2005 by Günter

If you thought the sky was the limit for Google, think again:

NASA and Google have signed a memorandum of understanding (MOU) that outlines plans for cooperation on a variety of areas, including large-scale data management, massively distributed computing, bio-info-nano convergence, and encouragement of the entrepreneurial space industry. The MOU also highlights plans for Google to develop up to 1 million square feet within the NASA Research Park at Moffett Field. (NASA Press release)

I may not have gotten the boundaries for Google and NASA-AMES quite right, but here’s how we fit into the picture:

Works for me…

Thursday, September 29th, 2005 by Günter

This blogging thing is really starting to work for me – no, I’m not talking so much about my own attempts at sharing my innermost thoughts and insights with the world at large (or shall we say the exactly 100 subscribers we have according to Bloglines as we come up on our three-month blogging anniversary on October 1st), I’m talking about being a consumer of blogs. Recently, I’ve especially enjoyed following some of the discussions around the concept of Web 2.0 in the blogosphere. For those of you who haven’t picked up on this new buzzword yet (and by the way, buzzwords are now being promoted to memes, or so it seems), you can find a succinct summary of Web 2.0 characteristics and many pointers beyond at Paul Miller’s insightful blog. Lorcan Dempsey has posted a graphic by Tim O’Reilly (who, by the way, also writes a blog I enjoy reading), which shows the technologies and ideas populating the Web 2.0 space.

If I had to summarize in one sentence what Web 2.0 means to me at this point (and what it means to me may be far from conclusive), I’d say that it’s a conversation about all the nifty tricks data can perform if it’s provided in a way which invites others to participate and collaborate in its exploitation. My excitement about my bank’s online portfolio management feature as chronicled here, for example, was Web 2.0 excitement. A lot of the Web 2.0 talk sounds familiar – hallmark features are virtues the cultural heritage community has espoused for years now, such as interoperability, modularity, collaboration, end-user participation, etc., but it’s refreshing to see those ideas dressed up in a new garb, and backed up by some new technologies as delivery mechanisms.

Too bad none of us will be able to go to the upcoming Web 2.0 conference in San Francisco – let’s just say that at $2795 a pop, it was not an item that would have gone unnoticed as an unbudgeted expense! Another way in which this blogging thing is really starting to work – at least some of us will have the good fortune of having drinks with Paul Miller & colleagues just after they’ve stepped off the plane – he offered to meet up in a response to one of my previous postings. Thanks, Paul!

The downside of collaborative filtering

Tuesday, September 27th, 2005 by Merrilee

Yesterday’s San Jose Mercury News carried an article on recommending systems and collaborative filtering. (You can get the article, actually from the Los Angeles Times, here although you may need to register.) The article pointed out some downsides to collaborative filtering. I wondered which of these downsides would transfer to the scholarly world, and how.

The first downside is dumbness. I’ve noticed this for years. I use Amazon almost exclusively to buy things for other people – I usually know what I want to buy for them, and value someone else shipping something for me (as opposed to getting to UPS or the post office myself – something I hardly ever do). What I do not value are the “recommendations.” The recommendations, in my case, are based on occasional purchases for a variety of people. Yes, I know I can improve my recommendations, but that’s never my task or my interest. I want to get in and get out.

The article in the Merc cites another kind of dumbness, outlined in the following scenario: you are going someplace (let’s say Italy) and you buy some travel guidebooks. You take your trip and come back, only to be pelted with more Italian guidebook recommendations.

I think the first kind of dumbness (recommendations, but not for me) wouldn’t necessarily transfer to the scholarly world since most people don’t do a lot of research on behalf of others. The second kind of dumbness (I’m so over that, thank you very much) could apply, particularly for students who have new research topics imposed upon them frequently. Wrong recommendations can be treated by working with the collaborative filtering software – what I never have time to do with Amazon, editing preferences and expressing lack of interest in Italy. In the case of “I’m so over that,” or the student with many topics, having multiple profiles within an account may be a way to go.

The other downside of collaborative filtering, narrowness, could have implications for scholarship that I think are worth exploring.

In the marketplace, collaborative filtering provides a service to consumers by helping to narrow millions of choices down to a few. A list of a few well-targeted recommendations helps to increase sales, and also helps consumers find items they would not otherwise have found. I’ve had this experience on Netflix, finding and enjoying movies I’d never heard of. But narrowing down comes at a price, described in the article as “society [balkanizing] into groups with obscure interests.” While groups with obscure interests sounds, on the face of it, like scholarship itself, there is a definite negative aspect to this. What scholar wants to only read one side of an argument, one interpretation of events, one school of thought? How many scholarly careers have been changed by serendipity? This could come from a conversation at a cocktail party, or could also stem from a confusing moment in the catalog, a wrong turn in the stacks. How do we interject chance and opportunity into recommending systems? Maybe recommending systems will make cocktail parties even more important.
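One way to picture "interjecting chance" is a recommender that reserves a slot or two for items the filter would never have surfaced. What follows is only a rough sketch under my own assumptions, not how Amazon or Netflix actually work: the function name `recommend`, the `serendipity` parameter, and the toy reading histories are all made up for illustration, and the scoring is the simplest overlap count I could think of.

```python
import random
from collections import Counter


def recommend(history, target_user, k=3, serendipity=1, rng=None):
    """Return (filtered, wildcards) for target_user.

    history: dict mapping user -> set of items they have used (hypothetical data).
    filtered: up to k items favored by users whose history overlaps the target's.
    wildcards: up to `serendipity` items drawn at random from everything else.
    """
    rng = rng or random.Random()
    seen = history[target_user]

    # Score unseen items by how much their owners overlap with the target user.
    scores = Counter()
    for user, items in history.items():
        if user == target_user:
            continue
        overlap = len(seen & items)
        if overlap == 0:
            continue
        for item in items - seen:
            scores[item] += overlap
    filtered = [item for item, _ in scores.most_common(k)]

    # The "wrong turn in the stacks": sample from items the filter ignored.
    catalog = set().union(*history.values())
    pool = sorted(catalog - seen - set(filtered))
    wildcards = rng.sample(pool, min(serendipity, len(pool)))
    return filtered, wildcards


if __name__ == "__main__":
    toy_history = {
        "traveler": {"italy guide", "rome map"},
        "foodie": {"italy guide", "tuscany cookbook"},
        "scholar": {"medieval law", "bird atlas"},
    }
    picks, surprises = recommend(toy_history, "traveler", rng=random.Random(0))
    print("filtered:", picks)
    print("wildcards:", surprises)
```

The filtered list stays narrow on purpose; the wildcard is the cocktail-party conversation. How many wildcards a scholar would tolerate is an open question the sketch does not try to answer.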

In any case, all this has me thinking. Time to start planning that trip to Italy.

It’s a sorry frog that don’t praise its own pond

Monday, September 26th, 2005 by Anne

Libraries, archives and museums today all have the common responsibility of managing digital information. Whether it’s born or reborn digital stuff, seeing it through its life cycle involves capturing, naming, describing, nurturing, keeping it alive, letting it speak, and ultimately letting it die or depositing it into cryonics.

Preserving digital data means ensuring the cryonic conditions are ideal for a long-term stay in the final resting place. Ten years ago, a group of really smart people realized that this particular piece of the digital problem would likely be the most difficult one. Preserving Digital Information, the landmark study published by the Task Force on Archiving of Digital Information, co-sponsored by RLG and the Commission on Preservation and Access (now Council on Library and Information Resources), issued a set of nine recommendations that have shaped our preservation agenda and informed numerous digital preservation projects.

In 1996, Recommendation number 7 proved to be one of the most controversial and difficult to tackle. It reads: “Institute a dialogue among the appropriate organizations and individuals on the standards, criteria and mechanisms needed to certify repositories of digital information as archives.” At the time there were only a few repositories that were actually storing digital content and the thought that there could possibly be a way to “certify” them seemed impossible.

RLG, never an organization to shy away from the hard problems, 5 years ago began discussing with OCLC ways the two organizations could cooperate to create infrastructures for digital archiving. RLG took the lead in work to define and gain consensus on the characteristics of a sustainable digital archives.

A group of jointly appointed international experts participated in the effort, which produced the May 2002 report, Trusted Digital Repositories: Attributes and Responsibilities. Building on that work, RLG found an appropriate partner in NARA to tackle the next step: creating the certification process.

Just last month, the RLG-NARA Task Force on Digital Repository Certification released a draft of An Audit Checklist for the Certification of Trusted Digital Repositories. This document is currently available for public comment and seeks community-wide review. Please pay attention and help make this work the best it can be.

The checklist is also being put through its paces in a new project, sponsored by the Center for Research Libraries and managed by RLG program officer Robin Dale, to test-audit three large international digital repositories. The results of these combined efforts should solve many of the issues identified in that pesky recommendation number 7.

I’m proud of our pond.

The hills are alive…

Monday, September 19th, 2005 by Merrilee

Indoor plumbing. Co-ed housing arrangements. Vending machines. Internet access. Music downloading and file swapping. What are considered the “basics” for American college undergrads are being redefined yet again. On Thursday, the University of California, California State University, and University of North Carolina systems announced deals with Napster. This brings over 40 schools into a club that already included the likes of Penn State, USC, and Cornell University. As cited in a recent article from the Chronicle of Higher Education, college administrators may be motivated by the recent Grokster decision, or may simply be trying to keep up with redefined norms for freshman Joneses.

Other more scholarly efforts are also going forward, such as the Database of Recorded American Music. DRAM sets out to establish a “core” of American music for teaching and study along the JSTOR model. DRAM is being shepherded along by New York University and New World Records, and tested by Columbia, Dartmouth, and Indiana Universities this year.

Since I don’t have access to DRAM or to Napster, I looked around for some free (and legal!) access to online music. As an OS X user, I first turned to iTunes. Apart from some podcasts, I didn’t find very much. Part of the problem is that the iTunes music store is, well, a store, and does not have an easy way to specify that you are looking for free material. Next, I tried Yahoo’s Creative Commons search. Although this led me to a number of interesting sites, there was not a good way to limit my results to pages that contained content with a Creative Commons license and a specific format (like mp3).

The best collection of free music I found was the Live Music Archive at the Internet Archive. This is a collection of “trade-friendly” recordings from live concerts. Along with a huge collection of the usual Grateful Dead material, I found quite a few happy surprises, such as Will Bernard (from Motherbug and other efforts – I listened to his November 2001 concert at the Boom Boom Room in San Francisco). Unlike many other interesting free download or streaming audio sites, the Internet Archive has enough content to keep me coming back for a long time.

I wonder how this near-universal access to music online might shape teaching and learning. Will there be attempts to “federate” services like DRAM with services like Napster, mixing the popular with the scholarly? Campus collections? What about the offline music offerings on college campuses, inventoried in the OPAC or union catalogs? Creative Commons material on the web? Amazon offerings?


Thursday, September 15th, 2005 by Günter

Over the last couple of years, I’ve had various experiences of interacting with the people from the other side of the fence – they’re usually called vendors, manufacturers, or “the industry” and the difference between them and us is that we’re non-profit, and they’re for-profit. Standard procedure of course being that non-profit feels morally superior to for-profit, and for-profit feels that if you’re non-profit, you’re probably too poor to be “in the market.” Of course all of this is silly, and it gets us precisely nowhere.

Take digital preservation, for example. Digital preservation stands to gain a lot from the availability of file formats which are engineered with a long-term perspective in mind – an example of a successful collaboration between the industry (Adobe) and a wider community of interested parties, including cultural heritage representatives, is the development of PDF/A into an ISO standard. In addition, digital preservation stands to gain a lot from mechanisms which allow files to self-document and provide us with lots of data about their specific technical properties. Since preservation really is a metadata guzzler, recent reports such as PREMIS highlight the importance of automating the capture of this type of information.

At RLG, we’ve pursued the idea of automating the capture of technical metadata in a project called Automatic Exposure, which brings me back to the theme of interacting with our friends from the other side. We’ve had conversations with the industry about how they could optimize their products (digital cameras, scanners, software packages) to facilitate gathering the metadata laid out by NISO Z39.87, a preservation standard currently in the process of being balloted. While everybody brings the best intentions to these discussions, somehow the end result is rarely more than what I’d call a “promising exchange.” I don’t mean to blame the industry for this – it always takes two to tango. What I’ve learned from all of this is how hard it is to make a case our prospective collaborators can truly run with.

Even when we meet with individuals who show great interest in our concerns and “get it,” we have to provide them with the right kind of ammunition to take the argument up their reporting chain. How do we communicate that the use of standards would actually give a product a competitive advantage in the marketplace? How do we communicate that our interest in long-term preservation isn’t so special – that we’re not the only ones who’d like to see digital images accessible for generations? How do we join with those from other communities who have like interests, and make our case together?

Speaking of communities who have an interest in digital preservation (although they may not yet be aware of it) – I’ve been to a number of weddings lately, and invariably, the official photographer (or the best pal of the groom, whoever happens to be taking the pictures) no longer captures these precious moments on film, but digitally. While everybody else was showering the bride and groom with advice on how to make their marriage last, I felt compelled to provide advice on how to make their images last – just to be on the safe side, why don’t you make sure to get some high-quality prints of these wedding pictures, would you?

Google Blog Search

Wednesday, September 14th, 2005 by Merrilee

You knew it was coming. Google Blog Search.

Here’s a prepopulated search, if you can’t think of your own.

The advanced search is pretty interesting, and takes advantage of some RSS tags.

better than a red wheel barrow

Tuesday, September 13th, 2005 by Jim

The receipt of the following haiku was sufficiently serendipitous, satisfying and apt that I want to share it.

a blonde girl, wearing
a pink skirt, on a blue bike,
willed the red light green

A former RLG staff member, Dylan Tweney, publishes a lovely daily haiku magazine, tinywords. (He’s also the executive editor of Mobile magazine.) You can sign up at the tinywords site and get a daily haiku sent to you via email or to your mobile device. I shared this one with the RedLightGreen team here at RLG. We decided it was worthy as a haiku even if you didn’t have our reference point.

P.S. RedLightGreen will give you some good selections on Haiku – History and Criticism.

What the neighbors are up to

Friday, September 9th, 2005 by Günter

This isn’t brand-new anymore (it got published on August 31st), but it’s still funny – the Onion headlines “Google Announces Plan to Destroy All Information It Can’t Index.”

“Our users want the world to be as simple, clean, and accessible as the Google home page itself,” said Google CEO Eric Schmidt at a press conference held in their corporate offices. “Soon, it will be.”

A little Friday levity.

P.S.: Check out where we are in relation to Google on a satellite image, courtesy of (you’ve guessed it) Google Maps. (All the earthy brown spots, by the way, are grassy green by now.)