Archive for the 'Digital Preservation' Category

Research dissemination and ‘the archive’

Monday, April 26th, 2010 by John

Ithaka S+R recently published its Faculty Survey 2009: Key Strategic Insights for Libraries, Publishers, and Societies. It considers the way faculty views of the library are changing, and analyses library roles into three key functions:

“The library is a starting point or ‘gateway’ for locating information for my research” (which we refer to as the gateway function). “The library pays for resources I need, from academic journals to books to electronic databases” (which we refer to as the buyer function). “The library is a repository of resources – in other words, it archives, preserves, and keeps track of resources” (which we refer to as the archive function).

Ithaka’s analysis shows that the gateway function has declined (its importance rating has dropped from 70% to 58%) over the six years in which the biennial studies have been made, while the buyer function has steadily increased (from 81% to 90%). The archive function has remained relatively static at just over 70%.

Many of the findings in this report are interesting, and relevant to us as we focus - via our Working Group on Research Services - on the specific topic of Support for Research Dissemination. We have chosen the word dissemination with some care. What we will be looking at is researcher behaviours and practices concerning institutional repositories, individual websites, subject archives, virtual research environments, blogs, blog aggregations and other social venues. In other words, every research dissemination venue except the conventional (and still overpoweringly influential) modes of scholarly publishing - the journal, the monograph and the conference paper. We will look at the way researchers use these alternative venues to disseminate their work, and the factors that account for the types and rates of dissemination.

The Cult of Brewster Finds Its Church

Tuesday, October 20th, 2009 by Roy

Last night Brewster Kahle of the Internet Archive unveiled his latest project in a venue suitable for any high priest or cult leader: a former Christian Science church in San Francisco. As it turns out, the Internet Archive recently purchased the building, and as Brewster remarked during the grand unveiling of the Bookserver project, it even matches their long-time logo, which was selected on purpose to imply a physical library.

Although the mood in the great room of the church, which Brewster eventually hopes to turn into a modern-day library reading room, was more hallelujah-inspiring than anything, the preceding day had been more down-and-dirty technical. The two-day meeting (still going on as I write this) is more about AtomPub and identifiers than holy water and consecrated wafers, but all of it does take a certain amount of faith.

John R. Stokes, Imaging Innovator

Monday, July 27th, 2009 by Ricky

John R. Stokes passed away this weekend. This caused me to reflect on both his career and mine.

When I started at the Library of Congress in 1985, I was an early entrant into the library imaging scene, but John Stokes was already there. He captured some of LC’s huge photo collections, at that time putting them on videodisk, as part of the Library’s Optical Disk Pilot Project. Anticipating that LC would ultimately want digital images, he saved the digital intermediates. As CD-ROMs became the preferred medium, he was able to deliver those digital images to LC for a tiny fraction of the cost of recapturing them. He didn’t shy away from any original formats, whether slides, large glass plate negatives, or ungainly panoramic photos (for which he built an amazing transport system that captured and stitched together 8-foot-long panoramas).

When I came to RLG in 1995, John was already at work there, too, on the Digital Image Access project [19.1 MB PDF file], where he was helping RLG members with the human side of imaging. He developed software to manage description of images and to provide access to them. He did work for NYPL, National Geographic, The Smithsonian, National Library of Medicine, and other museums, universities, historical societies, and cultural heritage organizations.

In the last couple of years, he and I talked many times about ways to increase the scale of digitization of special collections. I wondered if devices could be made to increase throughput for special formats in the way that the Internet Archive and Google had increased throughput for books. Once again, John was already most of the way there. He had developed a capture station that could be used with a variety of robotic materials handling devices [PDF] for various formats: manuscripts, large reflective materials, transparent materials of all sizes (including film reels), post cards, and so forth.

His physics background and in-depth knowledge of color, lighting, and photographic processes allowed him to push the envelope in designing capture equipment. As happened with high-end digital cameras, if it didn’t exist and he couldn’t build or adapt it, he’d go to the manufacturer and get them to improve their equipment until it met his high standards. He devoted a lot of attention to the process, too. Software to keep track of the workflow, allow metadata input, perform image correction, facilitate quality control, and track technical data was a key part of any system he put together. He knew that while he could automate the capture, the workflow software would help to improve the human factor.

John’s concession to my plea for faster production of access images was to make the process quickly down-sample images to derive smaller images for web access, while making it possible to save an archival-quality image to storage. He learned long ago that while people may ask for a quick access-quality image, eventually they’ll want more.

John was open and honest with his customers, admitting when he was out of his depth (not often) and pointing out ever so gently when the customer was out of their depth (in my case, more often than I like to admit). His innovative approach and his commitment to quality put him squarely at the top of my list when I was asked for advice on imaging equipment or for a service provider. He was also a kind, genuine, and gentle man, always happy to talk, whether it was about “bidness” or his and his wife Bettye’s latest adventure.

I make it sound as if John ran a one-man show. He had the support of many others over the years, including several of his family members. His good work will be continued by them and other good people at JJT, Inc. under the expert eye of his son, John T. Stokes. Already they reassure us that, within a couple of months, the Stokes Imaging System for special formats will be in place for pilots at two RLG partner institutions.

Joint Conference on Digital Libraries 2009 - good posts

Wednesday, June 17th, 2009 by Jim

My former RLG colleague and current OCLC software development manager, Judith Bush, is doing some nice blogging about the JCDL 2009 sessions (“Designing Tomorrow, Preserving the Past - Today”) here. Check it out.

Digging into digital

Tuesday, May 12th, 2009 by Merrilee

At the end of last week, I had a chance to read the just-issued white paper, Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use (Matthew G. Kirschenbaum et al). (To get to the paper, follow the link and scroll to “download.”)

My first reaction was, wow, the NEH got a lot for its money: for $11,708, this project packed a lot in. The project started with a core group of institutions with a common interest: preserving and giving access to (the second part is important) “the born-digital documents and records of contemporary authorship.” These records are usually a “hybrid” consisting of both electronic and print outputs. The original group included University of Maryland, UT Austin, and Emory University, but by the end of the project the group had expanded to bring in viewpoints from the Library of Congress, Stanford University, the University of Maine, Yale University, the New York Public Library, the British Library, and the University of Oxford.

I liked two things about this report. The first is the sense that there was a real exchange of ideas, of institutions wanting to learn from one another (rather than develop their own way of doing things). The second is that the project engaged practitioners, primarily authors, in a deep way, trying to understand the way that documents are created and used. I was happy to see further work with both scholars and authors listed under next steps. I think this deep understanding of how these documents are created, how they function, and how they will be used is essential for understanding preservation and access needs.

I’ll be attending an advisory board meeting for the Future Arch project at the Bodleian (Oxford) in September, and I found this report an excellent primer on many of the issues surrounding the preservation and use of hybrid collections. It introduced some ideas I hadn’t considered or taken seriously before, such as the materiality of the creation surround.

Conversations about dealing with digital collections in special collections are often marked by hand-wringing (and remarks like “that will be after I retire”), so it’s great to see the community rolling up its sleeves and getting to work.

Interesting ideas do not always a project make

Monday, April 20th, 2009 by Ricky

A little over a year ago, I inherited a project that didn’t have much more than a name: “Explore and understand the place of large digital text aggregations in scholarship and research.”

I had several discussions with my colleagues about what this project might turn out to be. We had several ideas:

– Create a shared understanding of the expectations that researchers and students bring to their interactions with large-scale text aggregations on the web and the requirements for making these collections fit for scholarly use.

– Convene an invitational meeting of those already engaged in large-scale digitization efforts to establish a common understanding of scholarly use-cases and the core requirements for library-sourced research services.

– Identify service capabilities (bookmarking, annotation, citation management, etc.) that are required to support scholarly use of text aggregations.

– Assemble a text archive for prototyping and analysis.

– Investigate needs of scholars (via focus groups?).

– Experiment with the metadata we get from OCLC’s e-Content Synchronization service to see how we can characterize the contents of book aggregations.

– Experiment with full text functionality we might be able to offer a) on a specific aggregation b) across aggregations.

What we were exploring went beyond finding and using a single document. It was about identifying works from many silos to incorporate into a local environment. And it was about performing actions against an index (or multiple indexes) of aggregated digitized works. We could investigate how scholars would work with the range of book text archives, starting with use case scenarios of the types of queries (e.g., in areas such as linguistic analysis, lexical frequency, translation studies, edition comparisons, things like occurrence of geographic place names in fiction, and coincidence of events - like being able to explore how a race riot affected neighborhood population dynamics).

Herbert’s Adventures In Linking

Thursday, February 5th, 2009 by John

The title of this post is my homage to another famous Belgian.

I have been posting from the 9th International Bielefeld Conference in Germany. In yesterday’s closing keynote, Herbert Van de Sompel gave a most unusual presentation. Preparing, on his return to the Los Alamos National Laboratory, for a six-month sabbatical, he used the occasion to review the work he and his various teams have done over the past 10 years or so - and bravely assessed the success or otherwise of the various major initiatives in which he has been involved - SFX, OpenURL, OAI-PMH, OAI-ORE and MESUR (not for the acronymically faint-hearted). Incidentally, the 10-year boundary was as much accident as design. With the exception of one slide (pictured) showing his various project clusters, he had not prepared a new presentation, but instead paced around in front of a succession of old ones – some looking pretty dated – displayed in fabulous detail on the gigantic screen in the Bielefeld Convention Centre main hall. With a plea for more work on digital preservation, he stated that he had discovered that those PowerPoint presentations which were more than 10 years old were no longer readable.

The SFX development work, done at the University of Ghent, has resulted in some 1,700 SFX servers installed worldwide, which link – at a conservative estimate – to some 3 million items every day. Less successful, in his view, was the OpenURL NISO standard. It took three years to achieve, and – despite his ambitious intentions at the time – is still used almost exclusively for journal article linking. Reflecting on this, he remarked that the library community finds it hard to get its standards adopted outwith the library realm.

Herbert was also ambivalent about OAI-PMH. The systemic change predicted at the time of its development has not happened, and may never happen. He remarked that ‘Discovery today is defined by Google’, and in that context PMH did not do a good job because it is based on metadata. Ranking is based on who points at you (see my earlier post on the Webometrics ranking). ‘No one points at metadata records’. But it still provides a good means of synchronising XML-formatted metadata between databases.
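To make the synchronisation use concrete, here is a minimal sketch of how a harvester might build an incremental OAI-PMH request. The verb and argument names (ListRecords, metadataPrefix, from, resumptionToken) come from the protocol; the repository address and the helper function name are my own inventions for illustration, and nothing here contacts a server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    if resumption_token:
        # When continuing a partial list, resumptionToken is sent on its own.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Only ask for records changed since the last harvest.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used purely as an example.
url = list_records_url("https://repository.example.org/oai",
                       from_date="2009-01-01")
print(url)
```

A harvester would fetch that URL, store the XML records it gets back, and keep following any resumption token until the list is complete.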

He feels that we are moving on from a central concern with journal articles in any case. ‘What do we care about the literature any more? It’s all about the data (and let’s make sure that the data does not go the way of the literature!)’. He offered some reflections on institutional repositories in passing. They are not ends in themselves (though often seem to be). There is a difference between their typical application in the US and in Europe. European libraries use them more for storing traditional academic papers – versions of the articles which appear in peer-reviewed journals. In the US, there is a tendency to use them for ‘all that other stuff’. They are relatively unpopulated due to the fact that authors find it hard to care once they have had the paper accepted by their intended journal. But the other problem is workflow. Most repositories require deposit procedures which are outwith faculty workflows. Worse - content is being deposited by faculty all over the web – on YouTube’s SciTV, on blogs, in flickr. They have no time left for less attractive hubs. We need a button with the simplicity and embeddedness of the SFX resolver button to be present in these environments before we will truly optimise harvesting of content into the repository. There is a challenge …

The ORE work learned lessons from PMH. PMH did not address web architecture primitives. That was why Google rejected the protocol. It did not fit with their URI-crawling world view. ORE therefore used the architecture of the web as the platform for interoperability.

As for the MESUR project, directed by his compatriot Johan Bollen, Herbert described it as ‘phenomenal’. MESUR took the view that citations as a measure of impact were appropriate for the paper-based world. But now we should assess network-based metrics (the best known of which is Google’s PageRank). A billion usage events were collected to test the hypothesis that network metric data contains valuable data on impact. The hypothesis, he believes, was proved correct. There is structure there, and the ability to derive usable metrics. Indeed, the correlations produced by MESUR reached the fairly radical conclusion that the citation analysis data we have been using for decades is an outlier when compared with network-based methods.
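For readers who have not met a network-based metric, here is a rough sketch of the PageRank idea on a toy graph. It is a generic textbook-style power iteration, not the method MESUR used, and the four-node graph and the damping value of 0.85 are assumptions chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes by who points at them, using simple power iteration."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its rank on, split evenly among its targets.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank to everyone.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph (invented): A, B and D all point at C; C points back at A.
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
top = max(ranks, key=ranks.get)
print(top, round(ranks[top], 3))
```

The point of the sketch is only that a node's score depends on the structure of the links into it, which is the kind of signal a usage network can supply and a raw citation count cannot.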

Overall then, more plus points than negatives. And not only was his audience not inclined to criticise, but he was urged to stay and complete his presentation even though it ran over his allotted time by about 20 minutes at the end of an intensive day. How many people in our profession could discuss their work with reference to so many iconic projects? He concluded with a simple message - which he had come to see clearly as he prepared this review: we do what we do in order to optimise the time of researchers. Some recent studies, such as the UK Research Information Network’s Activities, costs and funding flows in scholarly communications (discussed earlier in the conference by Michael Jubb, Director of RIN), and the more recent JISC report, Economic Implications of Alternative Scholarly Publishing Models: Exploring the costs and benefits, express researcher time in cash terms. It amounts to billions of pounds each year.

How much money has been saved and so made available for further research by the projects developed and overseen by Herbert and his colleagues? There is optimisation to be proud of.

Te Puna Mātauranga o Aotearoa Rocks My World

Wednesday, December 3rd, 2008 by Roy

The National Library of New Zealand (Te Puna Mātauranga o Aotearoa in Maori and an RLG Partner) has obviously been busy. Last week they joined the Flickr Commons, and they have already reported some impressive use statistics. But today (well, yesterday in Kiwi time) came an even bigger announcement.

Digital New Zealand, “a nation-wide project to help make New Zealand digital content easier to find, share and use was launched at the National Library of New Zealand on 3 December 2008.” The incredible array of collections made available through this one interface would be news enough for many libraries. But the joy doesn’t stop there.

The project welcomes additional content contributors, and stands ready to provide advice and assistance to help them to do so. Visitors are offered an opportunity to create a tailored search of the site and drop the resulting widget onto any web page they like or use the special search page that is created for them right on the Digital New Zealand site.

If a visitor doesn’t wish to create a tailored web widget, they already have a library of such from which to choose. And for the true technorati, there is the developer section, which provides a simple way for software developers to get a key to be able to use the application programming interface (API) of the site. If all of this isn’t enough to knock your socks off, stay tuned.

The “Memory Maker” is a web-based way to mix and match video clips into your own cinematic production. I kid you not. Try it out. You can add audio or music to add your own special touches. I doubt that any movie miracles will be made here, but the level of interactivity is completely off the charts. To get the full measure of this, you simply must see this movie.

So by now you must think surely I am done singing the praises of Te Puna Mātauranga o Aotearoa, but I’m not. There’s still more. Like I said, they’ve obviously been busy. The last thing I want to highlight is their National Digital Heritage Archive. Long in the works through a partnership with ExLibris, this preservation system went live on November 4. “The National Digital Heritage Archive (NDHA),” states the web site, “is the National Library’s technical and business solution to preserve and provide long-term public access to its digital heritage collections.” The NLNZ was the flagship partner with ExLibris, and the product is based on the Open Archival Information System (OAIS) model and conforms to trusted digital repository (TDR) requirements (which came out of joint RLG-OCLC work before the two organizations joined).

This is an incredible array of new initiatives by any measure, and a tribute to the leadership of Penny Carnaby, Chief Executive and National Librarian, and John Truesdale, Director National Digital Library, and of course many others who were instrumental in accomplishing all of this work. For my part, it’s hard to believe that it was only a bit more than a year ago when I was talking with Penny and John in a Melbourne bar after participating in a National and State Libraries Australasia strategic planning meeting. They have much to celebrate, as do we, since they are doing much from which we can learn. I simply can’t wait to see what comes next.

William Gibson’s Agrippa and mal d’archive

Sunday, November 16th, 2008 by Jennifer

I didn’t come to Austin to get an archival jolt from a digital artists’ book. I’ve been at the Ransom Center this weekend attending a conference on literary archives and writers’ papers, “Creating a Usable Past.” I had never seen William Gibson’s 1992 artists’ book, one evidently well-known on the Internet. The cataloging notes say Agrippa has some photosensitive engravings and a disk holding the poem, “which may be displayed on a computer screen only once, and then is irretrievably encrypted.” Matt Kirschenbaum, professor at MITH, hacked the code of Agrippa and played it for us on a Mac emulator. Matt tells us his work will be up on the web in six weeks or so.

I was having something akin to Ted Bishop’s experience with the symptoms of archive fever. Ted is a Virginia Woolf scholar. In Riding with Rilke he describes the “jolt” of reading Woolf’s suicide letter. Yesterday morning the audience at the august Ransom Center was reading Agrippa on the big screen. The Mac emulator made it feel a bit like I was reading it in 1992. Back in 1992 I don’t think I knew what an artists’ book was.

Three of UT’s undergraduates have been blogging the conference at flairforarchives.

What could be more special?

Friday, September 19th, 2008 by Ricky

I’ve just read the minutes from a recent meeting of the Lot 49 group, which was formed to address issues related to moving image digitization. [Here’s a link to notes about the inaugural meeting in July 2007.] The need to be in Dublin, OH last week precluded my being there, but reading the minutes has led me to reflect on how motion and sound fit into Jen’s and my diatribe, Shifting Gears: Gearing Up to Get Into the Flow (about digitizing special collections for access).

Our major premise is that, in cases where we will preserve the original, we ought to think about digitization for access rather than for preservation. In this way, we can get more special collections digitized and accessible, thereby increasing the demand and, we hope, the funding for our collections. The alternative, investing in time-consuming, expensive processes, risks special collections becoming marginalized in the midst of the vast quantity of books on-line.

By using the phrase “special collections” we meant to draw attention to digitization of non-book materials, but we hadn’t given a lot of thought specifically to motion and sound. One way in which motion and sound are different than other non-book formats is that the delivery of access copies requires a significantly compressed file, usually sacrificing a lot of quality. Another difference is that the premise that we would most often be preserving the original doesn’t always apply to motion and sound media.

The first objective, always, is stabilization of the content, then provision of access. With motion and audio, sometimes the original is digital (e.g., much current audio) and we can derive an access copy from it. If the original is in a stable analog format (e.g., preservation-quality film), then we can digitize for access. If the original is unstable and needs to be reformatted, there are two possibilities: a) when the best option is to reformat onto another analog medium (e.g., going from nitrate to safety film), we would subsequently create a digital access copy, or b) when the best reformatting option is digital (e.g., going from magnetic tape to digital audio), we’ll want to retain all the quality possible when digitizing, and then derive an access copy.
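The cases above can be written down as a small decision function. This is only a sketch of the reasoning in this post; the function name, its parameters, and the wording of the steps are mine and do not come from the Lot 49 minutes or from Shifting Gears.

```python
def digitization_plan(original_is_digital, original_is_stable,
                      best_reformat_target=None):
    """Return the ordered steps for one motion or sound item.

    best_reformat_target is only consulted for unstable analog originals
    and must then be "analog" or "digital".
    """
    if original_is_digital:
        # e.g., much current audio
        return ["derive an access copy from the digital original"]
    if original_is_stable:
        # e.g., preservation-quality film
        return ["digitize for access"]
    if best_reformat_target == "analog":
        # e.g., going from nitrate to safety film
        return ["reformat onto another analog medium",
                "create a digital access copy"]
    if best_reformat_target == "digital":
        # e.g., going from magnetic tape to digital audio
        return ["digitize, retaining all the quality possible",
                "derive an access copy"]
    raise ValueError("unstable originals need a reformat target: "
                     "'analog' or 'digital'")


# An unstable nitrate film whose best reformatting option is safety film.
nitrate_plan = digitization_plan(original_is_digital=False,
                                 original_is_stable=False,
                                 best_reformat_target="analog")
print(nitrate_plan)
```

Every branch ends with an access copy; what differs is whether the step before it has to be done at preservation quality.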

But let’s not get ahead of ourselves: a lot of motion and sound in our collections hasn’t even been cataloged. [Maybe the next round of CLIR/Mellon Hidden Collections grant funding should be inundated with proposals to describe hidden motion and sound collections.] Until we have a good sense of the nature and size of the problem, we won’t be effective in addressing it. [And if you have any ideas about how to survey backlogs, get in touch with Merrilee, who is launching a project to assess archival backlog survey methods.]

First describe ’em, then stabilize ’em, and then by all means, make them accessible.