Richard Johnson, formerly of SPARC and now a special consultant to ARL, has written a useful article on mass digitization partnerships that summarizes some of the key concerns of research institutions: Are the combined efforts of the Google-, MSN- and OCA-library partners likely to produce a corpus of text that supports scholarship? Are content contributors, jointly or individually, capable of shaping the outcomes of these large-scale projects? Rick’s article includes a helpful checklist of issues that library partners should consider before signing any agreement with a digitization agency. The article appears in the latest issue of the ARL newsletter; one wonders whether folks at Princeton had a chance to review it before signing on to the Google Book Search project. Tiger, tiger burning bright … will your fearful symmetry survive in snippet view?
One of my thoughtful Programs colleagues has been reflecting on the requirements for building the new Alexandria, here on the northern shore of the marsh. Fittingly, a quagmire separates our Mountain View office from the Googleplex. Through the reeds, we can just glimpse the outer limits of their basketball court. On a recent walk around said marsh, Arnold, gesturing toward the backboard, observed that the single greatest obstacle to enabling scholarly use of the texts digitized by Google, Microsoft, the OCA and others is “rights: it’s all about rights.” The ability to layer innovative services on top of the digitized texts (tools for annotating, citing, copying and pasting from Songs of Experience, for example) will depend primarily upon the rights associated with the underlying documents, rather than on the quality of the structural markup or the accuracy of the OCR’d text.
Arnold was responding, in part, to some claims Karen Coyle made in an article published last November. Coyle, a careful observer of mass digitization efforts and the “dotted lines” that underpin them, contrasts the reading and research environment of e-book providers with the utilitarian delivery platforms of mass digitization agents like Google and the Internet Archive.
“Services like ebrary and Questia use highly structured books (and other documents) to provide a kind of online research workstation that supports a range of activities common to higher education research and writing. These services would not be possible with an underlying database resulting from mass digitization.” Coyle op. cit.
Arnold argues (rightly, I think) that “the Google products of mass digitization [could] be quoted (cut and pasted), incorporated, downloaded and archived in ways that would be perfectly adequate to support scholarship, except for questions of rights.” Tim O’Reilly has said that “book search should work like web search”, i.e., in a manner that supports cross-indexing and creative re-use of distributed online resources. I suspect this vision falls short of the rich reading and research environment that Karen is looking for. The bigger question for us, I think, is whether libraries have the collective capacity or will to create that environment. And, pace Arnold, whether library partners have secured the rights needed to build it.
Constance Malpas was a Research Scientist at OCLC. Her work at OCLC focused on data-driven analysis of library collections and services, with a special emphasis on strategic planning and managing institutional change. She has a particular interest in the organization of knowledge and research practices in the sciences.