Archive for the 'Digital Preservation' Category

Are born-digital archives still mystifying you?

Thursday, October 17th, 2013 by Jackie

Here at OCLC Research we’re always eager to find ways to ensure that our projects have strong positive impact on our professional community, and we’ve clearly hit the mark with our Demystifying Born Digital work agenda. I was reminded of this yesterday upon seeing the news that SAA’s Manuscript Repositories Section is going to reprise its 2013 “Jump In” initiative. What’s that, you say? Read on.

Ricky has been doing the heavy lifting on the Demystifying project, and both of her reports on managing born-digital archival content on physical media have been avidly received. “Finally,” say our archivist colleagues, “guidance that helps me get started rather than assuming I’m already an expert!” That’s exactly what we had in mind when we launched the project. Yay.

The first report, You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media, served as the inspiration for “Jump In.” Here’s how it worked: Archivists were encouraged to follow the brief guidelines in the report for inventorying their born-digital holdings, and more than twenty did the work and submitted a report describing their methodology. The approaches taken varied a great deal and, taken as a whole, revealed that even this very first step in gaining control of born-digital holdings is not necessarily a trivial one. The “Jump In” reports then served as the basis for a panel, which Ricky moderated, at the Manuscript Repositories Section meeting at the SAA conference this past August.

In fact, she was everywhere at SAA. Ricky also participated in a lightning round session on born-digital projects at the Research Libraries Roundtable meeting and spoke in a session titled “Defining Levels of Preservation and Management for eRecords.”

Only weeks before the SAA meeting, we released Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house, co-authored by Ricky and our amazing Diversity Fellow Julianna Barrera-Gómez, which greatly expands upon the brief technical advice offered in First Steps, enabling archivists to move on to converting content and getting it into secure storage. We brought along 150 copies to the conference, and they flew out of our arms wherever we made them available.

And so, if your library or archives also needs inspiration to get started with managing born-digital stuff, have a look at Ricky’s reports and “Jump In Too/Two” in 2014! You might win the raffle for a free course from SAA’s acclaimed Digital Archives Specialist (DAS) curriculum.

Trust in Digital Repositories – best IDCC conference paper

Thursday, January 17th, 2013 by Jim

I am delighted that a paper titled “Trust in Digital Repositories,” co-authored by my OCLC Research colleague Ixchel Faniel, received the best conference paper award at the just-concluded International Digital Curation Conference in Amsterdam. Okay, she had help. Her co-authors are Elizabeth Yakel (University of Michigan School of Information), Adam Kriesberg (UMSI), and Ayoung Yoon (University of North Carolina School of Information and Library Science).

We can’t link to the paper because it hasn’t been published yet. However, you will find the presentation slides embedded in the conference program that I linked to above.

The work described in the presentation looked at whether the actions stipulated as key to the audit and certification of trustworthy digital repositories were actually instrumental in creating trust in the designated community of users. Plain language – we said do these things and you should be trusted. Are those really the things that influence the repository users’ judgement about trustworthiness? And does that judgement differ by disciplinary affiliation?

I’m not going to spoil it. What do you think?

This work was based on the Trustworthy Repositories Audit and Certification checklist that OCLC Research published about five years ago. The Digital Curation Centre itself has a nice page on the development of the certification checklist, which goes back quite a long way. The Research Libraries Group had a lot to do with its origins, thanks to my former colleague Robin Dale.

It pleases me that this work has bridged organizations and colleagues. Shout out to Robin. Congratulations to Ixchel.

Elusive Quality

Thursday, October 25th, 2012 by Ricky

We talk a lot about data curation, but rarely about data quality. How do researchers determine if a dataset is appropriate for their intended purposes? They may need to know how the data was gathered (sometimes including the sensor equipment used and how it was calibrated), the degree of accuracy of the data, what null elements mean, what subsequent changes have been made to the data, and all sorts of provenance information.
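Purely as an illustration (nothing below comes from the workshop or its report), here is a minimal sketch of the kind of quality and provenance record that might accompany a dataset so a researcher can make that fitness-for-purpose call; every field name and value is hypothetical.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DatasetQualityRecord:
        """Hypothetical quality/provenance metadata a researcher might want to see."""
        collection_method: str            # how the data was gathered, incl. instruments used
        calibration_notes: str            # how and when sensors were calibrated
        stated_accuracy: str              # degree of accuracy or error bounds
        null_value_meaning: str           # what null or sentinel values signify
        change_history: List[str] = field(default_factory=list)  # subsequent changes to the data
        provenance_notes: str = ""        # custody, transformations, responsible parties

    # Example record (entirely made up) for a sensor-based dataset.
    record = DatasetQualityRecord(
        collection_method="Stream-gauge network sampled at 15-minute intervals",
        calibration_notes="Gauges calibrated quarterly against a reference standard",
        stated_accuracy="±0.5 cm water level",
        null_value_meaning="-9999 means the sensor was offline, not zero flow",
        change_history=["2012-03-01: outlier readings flagged and masked"],
        provenance_notes="Transferred from the field office in 2011; units converted to metric",
    )
    print(record.null_value_meaning)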

The University of North Carolina invited about 20 people from a variety of communities to an NSF-funded workshop titled Curating for Quality: Ensuring Data Quality to Enable New Science. The final report has just been published. In its appendices are the white papers that were prepared in advance of the workshop, including one that Brian Lavoie and I wrote, The Economics of Data Integrity, which appears on page 53 of the report.

The most useful outcomes of the workshop came from the group’s brainstorming of projects that would advance the discussion. We settled on eight that seemed actionable and fleshed them out a bit. We were encouraged to pursue the projects that moved us, either by working informally with like-minded individuals or by making a proposal to NSF. There’s no reason, however, that anyone couldn’t take up any of these ideas.

For those of you in a hurry, the Conclusion and Call to Action on pages 17 and 18 of the report sum up the issues quite nicely.

Born Digital for Those Born Analog

Thursday, August 23rd, 2012 by Ricky

The first outcome of an ongoing OCLC Research activity, Demystifying Born Digital, is a report, You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media.

Jackie Dooley and I submitted a draft of this document to some of the smartest people we know (see the list here), and what ensued was one of the liveliest professional discussions I have ever been part of. We wanted to end up with a very simple list of actions for getting a born-digital backlog under control. The assumed reader was a person without expertise, working in an archives with no policy in place and little IT support.

Many of our advisors instinctively pulled toward best practices. Others acknowledged that even with their expertise and tech support at relatively well-funded institutions, they were not able to do even what we had been calling minimal steps. Some of the steps became optional; others got a bit of elaboration — even the order of the steps was the subject of much deliberation, but we ended up with a simple, direct document to help those who were afraid to take the first steps.

Tackling born-digital: First, take baby steps …

Thursday, May 12th, 2011 by Jackie

On occasion my stepdaughter Sunde hears me fret about a problem that promises to take what seems like an excess of time, energy, or brainpower to solve, and she sagely advises “Baby steps, Jackie, baby steps.” Her simple coaching tip serves me well!

So, what’s the #1 issue that archivists fret about these days that seems to require an excess of just about every resource a human can bring to bear on a problem? Born-digital archival materials, you say? Bingo. And I think Sunde’s advice could serve us well …

The survey data that we gathered in 2009 provides a very interesting picture of what is, and isn’t, going on in research library special collections and archives in the born-digital realm. I’ve given quite a few public presentations about the survey over the past year, and a slide that has gained a lot of traction says this:

Born-digital archival materials: In a nutshell … undercollected, undercounted, undermanaged, unpreserved, inaccessible.

We learned that most research libraries have at least some born-digital special collections materials (79%), but far fewer even know how much they have (35%). Half of the gigabytes reported are held by two (two!) institutions. Most (83%) need education or training. Only half have assigned responsibility to any organizational unit for managing these materials. In sum, we surmised that collecting is generally reactive, sporadic, and limited. Lots of folks feel frozen, not knowing how to get started on such a daunting new area of archival management. An ocean of literature documents a vast body of research and practice on electronic records, but is way too complex for most archivists to navigate.

After pondering all this, Ricky and I have launched a project that we hope will help our colleagues start moving gingerly forward. We’re tackling three issues: identifying the many types of expertise held by special collections curators and archivists that are relevant in the born-digital context; considering which issues raised by various types of born-digital formats warrant the involvement of special collections and archives experts in their management; and defining some of those baby steps.

We’ve had some terrific conversations with colleagues who are educating us about initial “do no harm” steps that they take to establish basic control of born-digital files. Just today Merrilee pulled together an informal meeting of colleagues from New York City institutions (all of them members of our OCLC Research Library Partnership) to talk about the challenges they face and solutions they’re starting to put in place. There were archivists, heads of special collections, digital library managers, preservation librarians, and IT experts in the room. The synergy was terrific as everybody recognized the range of professionals that must be at the table to identify and implement solutions to the born-digital dilemma.
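To make one of those “do no harm” steps concrete: a step colleagues often describe first is capturing fixity and a simple inventory before anything else is done to the files. The sketch below is my own illustration, not something from the project, and the paths are hypothetical; in practice it would be run against a read-only or write-blocked copy of the transferred files.

    import csv
    import hashlib
    from pathlib import Path

    def sha256(path: Path, chunk_size: int = 1 << 20) -> str:
        """Compute a SHA-256 checksum without reading the whole file into memory."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def write_manifest(source_dir: str, manifest_path: str) -> None:
        """Record relative path, size, and checksum for every file under source_dir."""
        source = Path(source_dir)
        with open(manifest_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["relative_path", "size_bytes", "sha256"])
            for p in sorted(source.rglob("*")):
                if p.is_file():
                    writer.writerow([p.relative_to(source), p.stat().st_size, sha256(p)])

    # Hypothetical accession directory and manifest name.
    write_manifest("/mnt/accession_2011_004", "accession_2011_004_manifest.csv")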

What’s your advice? Get in touch and help us think smarter about it. Really. We’d love to hear from you.

OCLC Research 2010: Blue Ribbon Task Force on Sustainable Preservation and Access

Thursday, December 30th, 2010 by Brian

2010 marked the conclusion of the work of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Formed in 2007, the Task Force was an international group convened to examine the issue of economic sustainability in a digital preservation context. Membership included experts from across the digital preservation community, spanning the public sector, the private sector, cultural heritage, and academia, and reflected a range of expertise: librarians, archivists, computer scientists, and economists.

The Task Force produced two substantial reports, which provide:

  • the first comprehensive study of the economics of digital preservation;
  • a clear definition of the conditions that must be met to achieve economic sustainability in a digital preservation context;
  • practical, actionable recommendations for achieving economic sustainability, based on detailed analysis of both the economic environment in which preservation decision-making takes place, and the attributes of digital preservation as an economic activity;
  • a list of priorities for near-term action;
  • a strong foundation to catalyze additional work on economically sustainable digital preservation.

In addition to publishing their final report this year, the Task Force organized a symposium in Washington, DC, with approximately 100 participants. The April symposium provided the community with a public forum to react to and discuss the Task Force’s findings and recommendations; to assemble panels of experts representing the four digital preservation contexts discussed in the report and hear their thoughts on how the Task Force’s recommendations might be implemented; and to inspire ideas for future work, building on the foundation provided by the Task Force. A similar event was sponsored by JISC in London in May. Both events helped embed the Task Force’s work in the international digital preservation community. Additionally, the Task Force’s work was shortlisted for the 2010 Digital Preservation Award.

Reimagining the Archive

Monday, November 29th, 2010 by Jackie

A couple of weeks ago the UCLA Film & Television Archive hosted “Reimagining the Archive,” a three-day conference that brought together archivists, scholars, artists, creators of digital humanities projects, and assorted others to hear about a wide-ranging array of digital initiatives. While there was a certain focus on the moving-image realm, the papers went far beyond. A few talks that have stuck with me:

Keynoter Rick Prelinger, speaking after the opening reception, was his usual feisty self. He called for film archivists to become activists in finding ways to lessen the intellectual property stranglehold on access to and re-use of moving-image content, in part by reducing the emphasis on commercially-produced content in favor of “ephemeral film” (his term). He also issued a call to defend the power of the original image, unpolluted by enhancements like sound tracks and voiceovers.

A panel on digital scholarship included some good stuff. The PIs on the Sacred Samaritan Texts project (digitized Torah scrolls) at Michigan State modeled a nice approach: working in close concert with members of multiple user communities, who helped them understand the documents and the ways in which both scholars and religious practitioners would approach them as both texts and artifacts.

Fast-forwarding from A.D. 500 to 21st-century art took us to Adam Lauder, a digital scholarship librarian at York University (Canada) who is building IAINBAXTER&raisonnE. He seeks to reinterpret the concept of the catalogue raisonné by using crowdsourcing to create a virtual exhibition, curation, and research environment. Lauder offered up the phrase “ephemeral curating,” which I kind of like. (Hmm, “ephemeral” emerges as a theme.)

Howard Besser focused on projects that are using visual segmentation to enable more granular analysis of moving-image content. His closing “four things to prepare for” make a pithy summation: users will want ever-smaller units of granularity and will expect segmentation from us, geo-referencing will be low-hanging fruit, crowdsourcing helps us do more for less, and metadata must be created during production.

In a panel on new tools and platforms, Sherri Wasserman from Thinc, an incredibly cool NYC design firm, demoed several projects that use personal mobile devices to connect people to content. She described archival materials as “powerful objects in space without personality” and showed techniques for bringing memory objects to life. Like Howard, she brought in geo-referencing multiple times. Her advice: find ways to place memories within the spaces to which your archives is connected.

INA, the French national audiovisual institute, was the principal cosponsor of the conference, and Thomas Drugeon gave a fascinating overview of their activity to archive websites. INA and the BnF share legal responsibility to preserve the French web, with INA focused on sites with “audiovisual content.” They went live in February 2009, and in less than two years have harvested 33 terabytes (that’s after compression) of content. (There’s a glimpse of what “at scale” is going to mean when everybody really tackles born-digital.) They currently crawl 7200 sites, and Drugeon emphasized that they can preserve only “traces” via periodic sampling; archives that preserve websites must make sure researchers understand this. Speaking of researchers, he said the web archive will “never” be accessible openly but rather only via designated libraries. He couldn’t say why the law specifies this. (Something other than intellectual property rights? Er, maybe “never” is justified …)

Well, there was a lot more, but you get the idea. On Sunday morning Greg Lukow, chief of Motion Pictures (etc.) at LC, gave a whiz-bang ppt on the new Packard (yes, that Packard; they’re also building a facility for UCLA’s film and TV archive) Campus of the National Audio-Visual Conservation Center, built out in Culpeper, VA. Y’all go take a tour. Looks pretty fabulous.

Research dissemination and ‘the archive’

Monday, April 26th, 2010 by John

Ithaka S+R recently published its Faculty Survey 2009: Key Strategic Insights for Libraries, Publishers, and Societies. It considers the way faculty views of the library are changing, and analyses library roles into three key functions:

  • “The library is a starting point or ‘gateway’ for locating information for my research” (which we refer to as the gateway function).
  • “The library pays for resources I need, from academic journals to books to electronic databases” (which we refer to as the buyer function).
  • “The library is a repository of resources – in other words, it archives, preserves, and keeps track of resources” (which we refer to as the archive function).

Ithaka’s analysis shows that the gateway function has declined (its importance rating has dropped from 70% to 58%) over the six years in which the biennial studies have been made, while the buyer function has steadily increased (from 81% to 90%). The archive function has remained relatively static at just over 70%.

Many of the findings in this report are interesting, and relevant to us as we focus – via our Working Group on Research Services – on the specific topic of Support for Research Dissemination. We have chosen the word dissemination with some care. What we will be looking at is researcher behaviours and practices concerning institutional repositories, individual websites, subject archives, virtual research environments, blogs, blog aggregations and other social venues. In other words, every research dissemination venue except the conventional (and still overpoweringly influential) modes of scholarly publishing – the journal, the monograph and the conference paper. We will look at the way researchers use these alternative venues to disseminate their work, and the factors that account for the types and rates of dissemination.

The Cult of Brewster Finds Its Church

Tuesday, October 20th, 2009 by Roy

Last night Brewster Kahle of the Internet Archive unveiled his latest project in a venue suitable for any high priest or cult leader — a former Christian Science Church in San Francisco. As it turns out, the Internet Archive recently purchased the building, and as Brewster remarked during the grand unveiling of the Bookserver project, it even matches their long-time logo, which was selected on purpose to imply a physical library.

Although the mood in the great room of the church, which Brewster eventually hopes to turn into a modern-day library reading room, was more hallelujah-inspiring than anything, the preceding day had been more down-and-dirty technical. The two-day meeting (still going on as I write this) is more about AtomPub and identifiers than holy water and consecrated wafers, but all of it does take a certain amount of faith.

John R. Stokes, Imaging Innovator

Monday, July 27th, 2009 by Ricky

John R. Stokes passed away this weekend. This caused me to reflect on both his career and mine.

When I started at the Library of Congress in 1985, I was an early entrant into the library imaging scene, but John Stokes was already there. He captured some of LC’s huge photo collections, at that time putting them on videodisk, as part of the Library’s Optical Disk Pilot Project. Anticipating that LC would ultimately want digital images, he saved the digital intermediates. As CD-ROMs became the preferred medium, he was able to deliver those digital images to LC for a tiny fraction of the cost of recapturing them. He didn’t shy away from any original formats, whether slides, large glass plate negatives, or ungainly panoramic photos (for which he built an amazing transport system that captured and stitched together 8-foot-long panoramas).

When I came to RLG in 1995, John was already at work there, too, on the Digital Image Access project [19.1 MB PDF file], where he was helping RLG members with the human side of imaging. He developed software to manage description of images and to provide access to them. He did work for NYPL, National Geographic, the Smithsonian, the National Library of Medicine, and other museums, universities, historical societies, and cultural heritage organizations.

In the last couple of years, he and I talked many times about ways to increase the scale of digitization of special collections. I wondered if devices could be made to increase throughput for special formats in the way that the Internet Archive and Google had increased throughput for books. Once again, John was already most of the way there. He had developed a capture station that could be used with a variety of robotic materials-handling devices [PDF] for various formats: manuscripts, large reflective materials, transparent materials of all sizes (including film reels), postcards, and so forth.

His physics background and in-depth knowledge of color, lighting, and photographic processes allowed him to push the envelope in designing capture equipment. As happened with high-end digital cameras, if it didn’t exist and he couldn’t build or adapt it, he’d go to the manufacturer and get them to improve their equipment until it met his high standards. He devoted a lot of attention to the process, too. Software to keep track of the workflow, allow metadata input, perform image correction, facilitate quality control, and track technical data was a key part of any system he put together. He knew that while he could automate the capture, the workflow software would help to improve the human factor.

John’s concession to my plea for faster production of access images was to have the process quickly down-sample captured images into smaller derivatives for web access, while still making it possible to save an archival-quality image to storage. He learned long ago that while people may ask for a quick access-quality image, eventually they’ll want more.
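To illustrate that derivative workflow (this is emphatically not John’s software; the library, paths, and sizes here are my own assumptions), a quick sketch using Pillow might look like this: keep the archival master untouched in storage, and write a down-sampled JPEG for the web.

    import shutil
    from pathlib import Path
    from PIL import Image  # requires Pillow

    def make_access_copy(master_path: str, archive_dir: str, access_dir: str,
                         max_pixels: int = 2000, jpeg_quality: int = 85) -> None:
        """Preserve the archival master as-is; derive a down-sampled JPEG for web access."""
        master = Path(master_path)
        Path(archive_dir).mkdir(parents=True, exist_ok=True)
        Path(access_dir).mkdir(parents=True, exist_ok=True)

        # 1. Bit-for-bit copy of the archival-quality master (timestamps kept).
        shutil.copy2(master, Path(archive_dir) / master.name)

        # 2. Smaller derivative for quick web access.
        with Image.open(master) as img:
            img.thumbnail((max_pixels, max_pixels), Image.Resampling.LANCZOS)
            img.convert("RGB").save(Path(access_dir) / (master.stem + ".jpg"),
                                    "JPEG", quality=jpeg_quality)

    # Hypothetical capture output and destination directories.
    make_access_copy("capture/plate_042.tif", "archive_masters", "web_access")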

John was open and honest with his customers, admitting when he was out of his depth (not often) and pointing out ever so gently when the customer was out of their depth (in my case, more often than I like to admit). His innovative approach and his commitment to quality put him squarely at the top of my list when I was asked for advice on imaging equipment or for a service provider. He was also a kind, genuine, and gentle man, always happy to talk, whether it was about “bidness” or his and his wife Bettye’s latest adventure.

I make it sound as if John ran a one-man show. He had the support of many others over the years, including several of his family members. His good work will be continued by them and other good people at JJT, Inc. under the expert eye of his son, John T. Stokes. Already they reassure us that, within a couple of months, the Stokes Imaging System for special formats will be in place for pilots at two RLG partner institutions.