I attended Clay Shirky’s talk, part of the Long Now’s Seminars on Long Term Thinking (or SALT) series. The talk was well attended (at least 200 people), and the crowd was bright and engaged: many people came early to socialize, and many stayed late to chat.

The talk is not available for download yet, but will be.

Clay is an architect for the National Digital Information Infrastructure and Preservation Program (or NDIIPP) initiative at the Library of Congress. He also served as a consultant for the NDIIPP project, Archive Ingest and Handling Test (an article on AIHT will be forthcoming in the December issue of DLIB).

The talk was titled “Making Digital Durable: What Time Does to Categories.” Clay is exceptionally good at making big issues easily understandable. This talk was split into two parts, the first half dealing with the huge challenges of digital preservation.

Some takeaways from Clay’s digital preservation problem space:

  • Preservation is an outcome — you won’t know how successful you are until you’re there.
  • Preserving digital requires maintaining various components (format, media, interpreters, OS, architecture, etc.). Each component adds more shearing layers. The more shearing layers, the more risk.
  • Digital data is a social problem as much as a technical issue, and part of managing preservation will be managing social issues.

The second half of the talk was more (it seemed to me) about risks for access. Clay talked about classification systems and contrasted them with “tagging” systems such as Flickr, and

This second half was (perhaps necessarily) overly simplistic. He split metadata into classification (bad) and tagging/social networking systems (good). I had no argument with the problems he outlined with classification: it imposes human judgment and bias, and it presumes a context that is fixed in time and place. There are some excellent examples of this in Clay’s talk, drawn from the Library of Congress Classification Scheme, Dewey, and Yahoo!.

Clay is largely concerned with descriptive categories losing meaning over time and not making sense over the long term. This is a compelling argument, but there are some holes in positioning tagging over classification. Classification is not fixed for a particular book. Sure, for a book on a shelf in a library, there is usually one class number. For a FRBR “expression” or “manifestation,” there may be many classifications (think union catalog). Classification serves a purpose for organizing and incidentally provides a point of access in the stacks: in order to put a book on a shelf and to be able to find it later, you need to be able to “mark and park” it.

But a researcher need not wander haplessly about the library, through LC and Dewey mazes, to find what they are looking for. Remember the catalog? The catalog contains access points that go beyond classification. Fine, these access points (subjects, names, etc.) are also frequently subject to the same biases of human judgment and will flex or lose meaning over time. But they provide multiple ways to describe a book. In systems like the RLG Union Catalog, different librarians have assigned different access points for a manifestation, which are all preserved. Access points may be multi-lingual. The collective descriptive power of librarians with different views about what’s important is a powerful thing. Like tagging.

Let’s look at tagging. Tagging is also subject to human judgment; someone assigns an “aboutness” to an item. Because many people bring their viewpoints and judgments, there’s more descriptive diversity, which is A Good Thing (and, I’d argue, A Good Thing in the same way that the RLG Union Catalog is A Good Thing). But don’t the words taggers use now also need to be migrated over time? Just because a resource is swarmed on by social taggers now doesn’t mean it won’t be lost to the ages a year from now. I also think that social systems tend to attract a certain sort of someone. Someone technical, English speaking, with a computer, with lots of bandwidth to spare. With a reason to label. I could make some assertions about who uses the internet and how, but I’ll leave that to the folks at the Pew Internet and American Life Project.

Jerry McDonough wrote about this issue a little while ago, and I’m with him. I think that there needs to be a balance. Well-structured metadata and tagging can live together; it’s not time to chuck classification or the catalog quite yet.

Clay had an interesting point about tagging: a drop-off in tagging a resource could be used as a metric to determine that some sort of preservation action needs to be taken.

If Clay is ever in your town, I urge you to hear him talk. He’s a great thinker and speaker. Invite your administrators.

Here’s the rant. I have a real beef with Flickr as a Solution To All Our Problems or as true social software. Let’s contrast with is used to organize and categorize web pages; it’s used to describe something that is available to all (webpages), but allows them to be described in the way that’s most useful to the user. The tagging is not generally done by the person who created the resource, but people hoping to find and use the resource again.

Flickr, on the other hand, is used not by people with a burning desire to label or organize something, but rather by people who want to store their photos and then share them with their friends. I assert that many Flickr images have such poor metadata that they cannot be found. Sure, you can find lots of examples of the Mermaid Parade in Brooklyn (Clay’s example), but try finding images like this one without knowing exactly what tags to use. The tags used here are cameraphone and treo650. Not to pick on poor danhon, but this is not a great example of tagging at work. And Flickr doesn’t claim to be a social cataloging destination. The Flickr FAQ says right up front: “Flickr is the best way to store, sort, search and share your photos online.” [I’ve added the emphasis here to make my point.]

Don’t get me wrong, I love Flickr (here’s one of my favorite Flickr searches for cupcake art cars). Flickr is great when you want to find something (and don’t care about being thorough, or missing something). It’s better than Google Images, because right up front you can see if something is available for use or not, and you can contact the owner. BUT I want to find more photos of the cupcakes, and I know they are in there somewhere. Probably labeled as “vacation” or “2005” or “myphotos.”

Am I missing something that everyone else sees? Please send comments!

