Large-scale digitization of special collections: legal and ethical issues (part 2)

[In my last posting, I failed to note that Aprille Cooke McKay’s excellent presentation on third-party privacy can be found in the symposium Wiki.]

The symposium “Legal and Ethical Implications of Large-Scale Digitization of Manuscript Collections” had two illustrative case studies, one on third-party privacy, and one on evaluating copyright status of works in archival collections.

The first was presented by Nancy Kaiser and Matt Turi (the background materials can be found on the symposium Wiki). Nancy and Matt briefed us on three different collections, all containing materials that are in some way sensitive, and led a discussion of whether scanning these materials would be advisable. It’s both unfortunate and appropriate that the materials presented are not available on the Wiki, because each collection does present particular challenges; if you saw the materials you would know what I was talking about. One collection contains public health case studies and is filled with health information and other information that could be considered sensitive. Another collection contains references to personnel actions. The third collection is a diary that is perhaps too revealing about the lives that intersected the diarist’s. In each collection, it was not a matter of de-selecting a handful of questionable materials or omitting a series or sub-series from scanning. Substantial portions, or indeed the whole collection, are in question. Redacting the materials would require more resources than the SHC (or any repository) has available. In any case, this is a moot point, since redacting the materials would render them useless for serious scholarship.

There was quite a bit of time for discussion by symposium participants. In each case, the audience cautioned that more time “aging” each of these collections was called for. From a scholarly perspective, these materials represent a treasure trove of information for twentieth-century scholars. There was a tension in our discussions: it’s fine to serve up these materials in the reading room to qualified scholars before the materials have aged. So if scholars get themselves to the reading room, they have access to the materials. Yet many other qualified scholars are denied access to the materials because they cannot travel to where the materials live. Further, the assumption is that archivists can identify who is qualified to examine the steamy diary or the public health records, and who is not. I think this was something that made us all a little uncomfortable, even as “reasonable archivists.”

Another squirmer was how quickly we said to put these collections aside. Each collection was chosen for discussion because of its problematic nature, but I wondered how representative these collections are of the twentieth-century collections in the SHC (or any of our collections). Are 20% of the collections problematic in these ways? 80%? How much important primary source material must we age, and for how long? We were discussing putting some of these materials in the cooler for 70+ years, which would mean Vietnam War-era materials would become available somewhere in the neighborhood of 2030-2045. In our roles as reasonable archivists, is that level of access reasonable? I am hearing the voice of the twentieth-century scholar that I once was saying, “Um, excuse me.” (And indeed, one of the things the Extending the Reach of Southern Sources project heard loud and clear from scholars in their workshops was: “Don’t forget about the twentieth century.”)

The second case study was presented by Maggie Dickson, and was appropriately titled “Due Diligence, Futile Effort: Pursuing Copyright Holders for the Digitization of the Thomas E. Watson Papers.” This is a great piece of work and I hope it is published soon (you can see some of the details in the Wiki). The SHC undertook a detailed analysis of a single collection, the Thomas E. Watson Papers. The results are stunning, but not surprising. The collection has approximately 8500 items, which translates into 3304 names. Bulk dates on the collection are 1880s-1920s, but the full range is 1873-1986. Analysis was done on the names represented in the collection in order to gather information that would be useful in determining copyright status, namely, the date of death. In the US, the copyright term for unpublished materials is the life of the author plus 70 years, which means that this year, works created by authors who died before 1939 are now fair game. (You start to feel kind of ghoulish when your work revolves around getting excited about knowing death dates.) From all this work (14 weeks’ worth of full-time effort), we learn that 18.4% of the collection is out of copyright, and that 33.32% is in copyright. For 47.55% of the collection, they were unable to make a determination. If you had some good tools, you could automate some of this analysis, but the end result is that a third of the collection is clearly in copyright, and you could guess that some good portion of the undetermined portion of the collection is also in copyright. If you were to carry this work to its diabolically logical end and seek permissions before digitizing, you would be contacting thousands of people, some the rights holders themselves, undoubtedly many heirs. Denise Troll Covey from Carnegie Mellon University has presented compellingly about some work done to secure permissions for published works. The work is high-effort and low-yield, even in an optimized situation. (See, for example, this presentation from the 2004 Spring DLF Forum (PDF).) Working with published materials is far easier than chasing down unknown correspondents in manuscript collections, and there’s no good way to optimize the work in some of the ways that Covey has done.
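To make the death-date arithmetic concrete, here is a minimal Python sketch of the life-plus-70 rule as described above. It is only an illustration, not the SHC's actual method: the function name and the sample death years are hypothetical, and it assumes protection runs through the end of the 70th calendar year after the author's death. It also ignores every other wrinkle of copyright law (anonymous works, works for hire, published materials, and so on).

```python
from typing import Optional

# Term for unpublished works as described in the post: life of the author + 70 years.
TERM_YEARS = 70


def copyright_status(death_year: Optional[int], current_year: int) -> str:
    """Classify a work by its author's death year (hypothetical helper)."""
    if death_year is None:
        # No death date could be found for this correspondent.
        return "undetermined"
    # Assumption: protection lasts through the end of the 70th calendar
    # year after death, so the work is free starting the following year.
    if death_year + TERM_YEARS < current_year:
        return "out of copyright"
    return "in copyright"


# Made-up death years, checked against the year of this posting (2009).
for year in (1922, 1938, 1939, None):
    print(year, "->", copyright_status(year, 2009))
```

Under these assumptions an author who died in 1938 falls out of copyright in 2009, while one who died in 1939 does not, which matches the "died before 1939" cutoff. The hard part, of course, is not the arithmetic but finding the death dates in the first place.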

I am grateful that the SHC has undertaken this work; I think we can now firmly say that this is not how a reasonable archivist would spend his time. Instead, I think we should spend time coming up with reasonable and workable policies that will allow us to acknowledge the questionable copyright status of this material, and developing responsive and responsible takedown policies for materials that are problematic. At least that’s how I think reasonable archivists should behave.

I’ll continue this series with two panel discussions, one on ethical and professional issues, and the other on legal issues.

Merrilee Proffitt

Merrilee Proffitt is Senior Manager for the OCLC RLP. She provides community development skills and expert support to institutions within the OCLC Research Library Partnership.

2 Comments on “Large-scale digitization of special collections: legal and ethical issues (part 2)”

Merrilee says:

March 4, 2009 at 2:24 pm

Laura, thanks for leaving that comment. I am really looking forward to seeing the reports from your workshops, since I think we need more reminding about these scholars and their needs. As you point out, we could stay very busy (and be very productive!) in the nineteenth century and we can’t let ourselves take this easy way out.

Laura Clark Brown says:

March 2, 2009 at 6:25 pm

When we at SHC prepared the case studies, we deliberately chose materials that could not be blithely brushed off as perfectly acceptable for digitization. I wanted good discussion, and I did not want our repository to be perceived as overly cautious with contemporary collections or prudish on matters of sexuality. So we pushed the limits with the case studies on health, personnel, and sex and drugs. I think we were victims of our own success because the archivists at this symposium stunned me with their conservative responses to the questions of if and when we should digitize the “modern” collections. The personnel case study, in particular, prompted overheard gasps and exclamations from the audience… “You can’t digitize that.” “Don’t you have other [less risky] stuff you could digitize?”

I am deeply concerned with the idea that a repository like the SHC, which does have extensive nineteenth-century holdings, should play it safe and digitize just that older stuff. The SHC could digitize from now until 2109 without ever leaving the nineteenth century, but just because we could does not mean that we should. A young historian in attendance at the symposium told me later that the thought of a seventy-year digitization embargo on contemporary materials sent shivers down his spine. His era is post-Vietnam, and he studies women’s health and civil rights. If we cannot comfortably digitize contemporary collections before they have aged seventy years, he will need to cross his fingers for plenty of funding for trips to Chapel Hill. Are reasonable archivists okay with privileging the nineteenth century?
