A little over a year ago, I inherited a project that didn’t have much more than a name: “Explore and understand the place of large digital text aggregations in scholarship and research.”
I talked with my colleagues about what this project might turn out to be, and we came up with several ideas:
– Create a shared understanding of the expectations that researchers and students bring to their interactions with large-scale text aggregations on the web and the requirements for making these collections fit for scholarly use.
– Convene an invitational meeting of those already engaged in large-scale digitization efforts to establish a common understanding of scholarly use-cases and the core requirements for library-sourced research services.
– Identify service capabilities (bookmarking, annotation, citation management, etc.) that are required to support scholarly use of text aggregations.
– Assemble a text archive for prototyping and analysis.
– Investigate the needs of scholars (perhaps via focus groups).
– Experiment with the metadata we get from OCLC’s e-Content Synchronization service to see how we can characterize the contents of book aggregations.
– Experiment with full-text functionality we might be able to offer (a) within a specific aggregation and (b) across aggregations.
What we were exploring went beyond finding and using a single document. It was about identifying works from many silos and bringing them into a local environment, and about performing actions against one or more indexes of aggregated digitized works. We could investigate how scholars would work with the range of book text archives, starting with use-case scenarios for the kinds of queries they might run: linguistic analysis, lexical frequency, translation studies, comparison of editions, the occurrence of geographic place names in fiction, and the coincidence of events (for example, exploring how a race riot affected a neighborhood's population dynamics).