I’ve been working through some comment spam issues (those of you who’ve looked at our comments may have noticed!). For a while I turned comments off completely, and then just trackbacks. I don’t like feeling that I’m under the control of spammers, and I disliked the idea of shutting down comments and trackbacks.

While poking around in WordPress land, I found a really great plugin for Spam Karma (Spam Karma is itself a WordPress plugin; who knew that plugins have plugins?). It’s called Akismet. And what does it do?

Automattic Kismet (Akismet for short) is a collaborative effort to make comment and trackback spam a non-issue and restore innocence to blogging, so you never have to worry about spam again.

In short, this is a network-level application that harvests the collective intelligence about comment spam so that bloggers can take advantage of it. Now when I mark a comment as spam, that information goes somewhere: I not only use the service, I also contribute to it. Although I use WordPress, there are also plugins for Movable Type and other blogging platforms; the service is not tied to a particular platform. It integrates into my environment seamlessly, and once it’s installed I don’t even notice it. I just enjoy the lack of comment spam.
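For the technically curious, here is a minimal sketch (in Python, using the requests library) of what talking to a service like this looks like, based on Akismet’s published REST API. The API key and blog URL are placeholders and error handling is omitted. One call asks the service for a verdict on an incoming comment; the other reports a missed spam comment back, which is how everyone’s marking contributes to the shared intelligence.

```python
# Minimal sketch of the Akismet REST API (version 1.1).
# The key and blog URL are placeholders.
import requests

API_KEY = "your-akismet-key"       # placeholder
BLOG = "https://example.org/blog"  # placeholder

def check_comment(user_ip, user_agent, author, content):
    """Ask the service whether a comment looks like spam."""
    resp = requests.post(
        f"https://{API_KEY}.rest.akismet.com/1.1/comment-check",
        data={
            "blog": BLOG,
            "user_ip": user_ip,
            "user_agent": user_agent,
            "comment_author": author,
            "comment_content": content,
        },
    )
    return resp.text == "true"  # "true" means the service thinks it is spam

def report_spam(user_ip, user_agent, author, content):
    """Feed a missed spam comment back into the collective intelligence."""
    requests.post(
        f"https://{API_KEY}.rest.akismet.com/1.1/submit-spam",
        data={
            "blog": BLOG,
            "user_ip": user_ip,
            "user_agent": user_agent,
            "comment_author": author,
            "comment_content": content,
        },
    )
```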

While that’s a lot of detail about the inside of our blog, I hope this post will a) help some people with their comment spam, and b) give an idea of what a network-level service looks like.

On February 15th, we had an open house at the OCLC offices in San Mateo. It was about time.

  • We moved in October
  • We spent a lot of November and December getting settled in, finishing all of the “little things” (like building conference rooms!)
  • January was a blur, with ALA and other trips taking up a lot of RLG Programs attention
  • In February, our Dublin-based Programs and Research colleagues were visiting, so it was time to invite the neighbors, raise a glass, and toast our accomplishments (and our new building).

Here are some photos of our shindig. The weather was great, although we can hardly take credit for that.

Our balcony

Jeff Ubois, Stu Weibel, Brewster Kahle, and Thom Hickey

Food (and good company) draws a crowd

We’ve decided (retrospectively) to dub this the first Bay Area ultra-un-conference. We challenge the California Digital Library, Berkeley, Stanford, or others to offer up the next opportunity. Any takers?

An article and a blog post have me thinking about how shifts in publishing will impact libraries, both on the collections side and on the metadata side.

The article from the Wall Street Journal details Random House’s experimentation with selling books by the chapter. In this case, the work being sold is “Made to Stick: Why Some Ideas Survive and Others Die,” a popular press work. The concept is just as applicable to academic works, and reminds me of discussions a few years ago leading up to our Discovery to Delivery symposium — what are the “consumable” bits of information that are wanted or most appropriate for a particular purpose? How do we get users to that bit of information, be it “liquid” (digital) text, or delivery of the physical materials? How do they discover that nugget when we give them information at the manifestation level? If publishers begin to disaggregate the work, can we get metadata at the chapter level from them in order to facilitate discovery?

The blog post is from David Rosenthal. David talks about the relatively quick adoption of “mass-market scholarly communication” (blogs) within law. Clearly, since law review citations are increasingly pointing to communications in the blogosphere, discoverability is not so much an issue. But how do we incorporate this newer form of scholarly communication with the more traditional forms? Is it important to control text that is already liquid in some fashion?

I held off on blogging about the MCN annual conference in Chicago because my research colleague Jean Godby and I actually wrote up a little piece on the event. You can now find a summary of what piqued our interest at this 40th anniversary conference of the Museum Computer Network in the latest issue of Ariadne. Enjoy!

As a footnote to a previous post (where I confessed my deep love of WWOZ in New Orleans), I was happy to hear that the Library of Congress is giving a home to WWOZ’s extensive broadcast collection of music and interviews. Gathered since 1982, the collection weighs in at over 3,000 hours across many different analog and digital formats. It (barely) survived Katrina, so this transfer to the Library of Congress is excellent news. It will take over ten years to catalog and digitize the collection. I, for one, will be waiting….

You may have already seen a press release, but in case you haven’t, I’ll first give you the gist, and then some insights into the work already underway.

With the generous support of a $145,000 grant from the Andrew W. Mellon Foundation, RLG Programs will gather a select group of museum partners to accomplish the following:

  1. Creating a low-barrier, no-cost batch export capability for CDWA Lite XML from the collections management system used by the participating museums (Gallery Systems TMS)
  2. Modeling data exchange processes using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) at the participating museums (see the sketch after this list)
  3. Creating an aggregation of museum content within OCLC Research for analysis
  4. Discussing the evidence about the relative utility of the aggregation with stakeholders from the museum, vendor and aggregator community
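To make item 2 a bit more concrete, here is a minimal sketch of what a harvester might do: issue an OAI-PMH ListRecords request and page through resumption tokens until the repository is exhausted. The endpoint URL is hypothetical, and the “cdwalite” metadata prefix is an assumption for illustration, not a detail of the actual project setup.

```python
# Sketch of an OAI-PMH ListRecords harvest.
# BASE_URL and the "cdwalite" prefix are illustrative assumptions.
import xml.etree.ElementTree as ET
import requests

BASE_URL = "https://museum.example.org/oai"     # hypothetical endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH XML namespace

def harvest(metadata_prefix="cdwalite"):
    """Yield every <record> element exposed by the repository,
    following resumptionTokens as the OAI-PMH spec requires."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        tree = ET.fromstring(requests.get(BASE_URL, params=params).content)
        for record in tree.iter(OAI + "record"):
            yield record
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text}

for record in harvest():
    print(ET.tostring(record, encoding="unicode")[:200])
```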

It’s a tall order for a project with a 15-month lifespan, and that’s why we’ve already gotten started! On January 28th and 29th, representatives from the five institutions participating in phase 1 (Metropolitan Museum of Art; Museum of Fine Arts, Boston; National Gallery of Art; Princeton University Art Museum; Yale University Art Gallery) gathered at the Met to discuss the functional requirements for a data extraction and publication capability. Most of the participants had already dipped a toe into the water: in one way or another, they were trying to build a local mechanism to generate CDWA Lite XML records. From this vantage point, the grant becomes an opportunity to jointly develop a tool that can be used by the entire community, rather than making redundant local investments that benefit only single institutions.

Of course the biggest challenge in this endeavor is the variability with which museums implement collections management systems. Each museum uses the data fields offered by TMS in a slightly different way, and oftentimes there is even variation within a single installation, as different departments follow different guidelines for data entry. The Met serves as an illustrative microcosm for this challenge: its collections are described in 20 separate installations of TMS, and each instance organizes museum data a little differently. At the Princeton University Art Museum, different mappings will be required for data created by different departments. As a consequence, a tool that aspires to be usable by any TMS installation has to provide a mechanism not only to tailor the mapping from TMS to the standard output (CDWA Lite XML in this instance), but also to let users apply different mappings to different sets of data within a single installation.

During our meeting in New York, we agreed that one way to reduce the complexity of creating these mappings would be to make them shareable. As a baseline, we hope that our tool will ship with a default mapping which, in the judgment of the project participants, takes them furthest toward a meaningful output. Beyond that default, museums will be encouraged to share their customizations of data mappings on a field-by-field basis. Maybe your installation of TMS thinks of creators just like the one at the Yale University Art Gallery, but you use dates just like the MFA Boston. Combining these field-by-field mappings into a profile will allow museums that are new to the tool to generate a satisfactory CDWA Lite XML output with greater ease.
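To picture how such shared mappings might compose, here is a hypothetical sketch: each shared piece is a small field-by-field table, and a profile is just a layering of pieces borrowed from different institutions. Every field name below is invented for illustration; the real tool’s mapping format may look entirely different.

```python
# Hypothetical shareable mappings from TMS fields to CDWA Lite
# elements. All field names are invented for illustration.
yale_creators = {"ConXrefs.DisplayName": "cdwalite:nameCreator"}
mfa_dates = {"Objects.Dated": "cdwalite:displayCreationDate"}
default_mapping = {"Objects.Title": "cdwalite:title"}

def build_profile(*mappings):
    """Compose a profile by layering mappings over the default;
    later mappings win when they map the same source field."""
    profile = {}
    for mapping in mappings:
        profile.update(mapping)
    return profile

# A new museum borrows Yale's creator handling and the MFA's dates.
my_profile = build_profile(default_mapping, yale_creators, mfa_dates)
```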

As I’ve been quoted as saying in the press release, while we are focusing on TMS for the initial testing of the tool, nothing in the technologies we are thinking of deploying would prevent the extract from working against other databases. As a matter of fact, both the Met and the MFA Boston were speculating about initially using the tool against a non-TMS consolidated datastore of records.

I’ll keep you posted on this effort as we continue our work on this grant. We’re off to a good start for phase 1!

(This went out on RLG Announce earlier today. You can subscribe to RLG Announce if you are an RLG Programs Partner.)

I’m pleased to report that RLG Programs is giving support to the Society of American Archivists Intellectual Property Working Group. We’ll be providing funding for an Orphaned Works Investigation Best Practices Retreat, which will take place in March. The retreat will enable intellectual property experts from the archival community to meet face-to-face to develop best practices for reasonable investigation of orphaned works in archives. I’ll be attending the meeting, and I’m very excited at the prospect of being able to establish best practices that will lead to greater use of archival materials.

Our digital age is not the discoverer of scanning technology, though it is in the process of discovering how to do mass scanning, at least of flat objects. On a recent visit to the Victoria and Albert Museum in London (one of our Partners), Jennifer Schaffner and I discussed the museum’s digitisation plans with Doug Dodds, Head of Central Services. These are ambitious: they intend to digitise all 750k objects from their Prints and Drawings Study Room collection over the next few years. They have already digitised the first 2,500, and are working systematically, with an intention not simply to ‘cherry-pick’ the obvious items for digitisation.

Doug was however keen to show us that the V&A has been in the virtual collection business for a long time – and a particular heyday occurred in the late 19th century. The V&A’s Cast Courts represent the world’s largest collection of plaster cast reproductions of works of art and architecture, assembled to suit Victorian tastes and to allow visitors to experience as closely as possible the effect of being in the presence of the originals, at the cost merely of travelling to South Kensington. Cherry-picking from the entire collection of the world’s sculpture and architecture was evidently a necessity, but it did permit astonishingly ambitious choices. Among the grandest copies in the collection is the Pórtico de la Gloria in Santiago de Compostela. The original dates from the late 12th century, and – as the V&A website tells us – ‘The casting of this immense structure was an operation involving a sea voyage beset by numerous hazards – storms and fumigation against cholera included – as well as protracted and delicate negotiations with the ecclesiastical authorities.’ Perhaps even more extraordinary is the plaster copy of Trajan’s Column, which is so large that it has always been exhibited in two pieces.

Casts were made by taking plaster moulds from the original – a form of scanning – and were usually done in several pieces which would be skilfully reassembled so that the joins did not show. Trajan’s Column, remarked Doug Dodds, is an early forerunner of our use of FTP to transfer files so large that they cannot easily be reassembled. The cost involved in assembling virtual collections made from plaster was of course enormous, and the Victorians are an inspiration to us as we set about the task of digitising materials from museums and libraries to form virtual collections for the world. We may also learn a lesson or two about preservation. Students of sculpture and architecture are returning to some of the casts in the V&A, and other museums, either because the originals have been destroyed, or because the damage done by environmental pollution means that the detail on the casts now provides a record for scholars, and the copies offer greater perfection than do the originals.