Yesterday, along with two other RLG colleagues, I participated in an invitational day-long workshop sponsored by the Open Content Alliance (OCA) and hosted by one of its founding participants, the Internet Archive. The working session was capped by an evening reception at which Brewster Kahle shared his vision of the kind of "open library" the Alliance could create. The event drew over 200 guests and included announcements of the next round of contributing participants. The big news was that Microsoft and Yahoo! have provided financial backing for book digitization and agreed to the principles of the Alliance.
That these two huge rivals could come together in support of this effort drew renewed press attention to the possibilities and opportunities created by mass digitization of library materials. I think it breaks open a logjam that has developed over the last few years in the library, archive and museum community. The pile-up of all our pilot projects, along with our early digital service efforts and the huge boulder of Google Print, has made it difficult to focus our vision and think positively about ways to work toward it. For those in attendance this was a very happy occasion. It felt like the start of something very real and important. It felt like things were finally beginning to flow forward.
The founding participants reached out to other OCA contributors and experts to try to "answer the question of what technologies are needed so that we can move book scanning and online distribution forward." The group of about 30 broke into smaller groups on four topics:
1. Scanning and image processing
2. Copyright, collections and cataloging/metadata
3. Formats, tools, and interfaces
4. Governance and organization
The goal was to give specific shape to the challenges and issues of mass digitization at industrial throughput levels, and to set a target for where the OCA could and should be in each of these areas one year from now. I was impressed with the thinking reported out by the scanning and image processing group and the formats, tools, and interfaces group, both of which offered up very sensible baseline requirements and architectures. I think that was due to the influence of John Kunze of the California Digital Library and Herbert van de Sompel of Los Alamos National Laboratory, who led those groups respectively. They resisted the urge, common in these settings, to overbuild and to treat every desire as a requirement.
The cataloging and metadata group had an easier time of it, given that the need for high-quality, consistent metadata for description, discovery and coordination is relatively obvious, and this is territory with which the community is familiar and comfortable.
The governance and organization group, in which I participated, struggled. I think some of that may have arisen simply from the use of the word "governance." It immediately took the conversation down fraught paths: speculation about a membership organization, executives, advisory councils, and so on. By the end of the day, however, the group agreed that the most important thing was a clear articulation of the short list of principles to which any participant in the OCA would have to subscribe. The founding members have put forward a starter set of those principles, published on the OCA website. Nevertheless, it's already clear that they need to be expanded and clarified. I don't want to preempt the work that needs to go forward, but one example of a principle that might need to be in place is that all contributed content be available for bulk downloading and re-hosting in other environments. Such a principle may be required if the vision of reusing and repurposing library content is to be fully realized.
In any case, a proper final set of principles would inform how much organizational structure is really necessary and what other supporting mechanisms have to be created. Rick Prelinger of the Internet Archive was charged with drafting an approach, to be reviewed by the founding participants and put to the community for consultation. I was very much in favor of this. It seemed to me that the endpoints had to be defined before one could fill in the rest. The vision of the OCA is pretty clear, and the principles that must govern the contribution and creation of digitized content follow pretty straightforwardly from that vision. The formulation is this: you can't reach a particular desired end state unless you start from certain consistent foundational characteristics. If you want that, then you have to start with this. What fills the space between principles and vision are the administrative practices, operational processes and technical requirements that implement those principles to achieve that end state. Those can then be designed to suit the purpose. We expect this to be pinned down over the next thirty days or so.
You'll find lots of blog entries about the launch elsewhere. If you want a flavor of the evening, I'd suggest you go to the Open Library website and turn the pages of the Open Library volume there. That's what Brewster did.