Efficiency and scholarly information practices

Tuesday, March 31st, 2009 by Constance

There is a good article* in the most recent issue of JASIS&T by a group of Canadian scholars who challenge James Evans’ controversial claim that the increase in online availability of research publications has resulted in more focused and narrowly concentrated scholarly citation patterns. Evans’ study (2008) was the subject of a previous post on the ‘narrowing prospective.’

Vincent Larivière, Yves Gingras and Eric Archambault present research findings that suggest that the dispersion of citations has actually increased over the past century.  According to their research, the range of literature cited in contemporary scholarship grows over time as a function of the total supply or availability of published research. The percentage of papers cited at least one time increases steadily as the body of literature grows and matures. They characterize the implications of these findings in fairly categorical terms:

All these measures converge to demonstrate that citations are not becoming more concentrated but increasingly dispersed, and one can therefore argue that the scientific system is increasingly efficient at using published knowledge.  Moreover, what our data shows is not a tendency toward an increasingly exclusive and elitist scientific system, but rather one that is increasingly democratic.

Larivière, V., Gingras, Y., & Archambault, E. (2009): 861.
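To make the two ideas in play here concrete (how many papers get cited at all, and how concentrated the citations are), here is a minimal sketch in Python. It is not the authors' method or data: the citation counts are invented, and the "share of citations going to the top 20% of papers" is just one simple concentration measure chosen for illustration.

```python
def share_cited(citation_counts):
    """Fraction of papers that have been cited at least once."""
    if not citation_counts:
        return 0.0
    cited = sum(1 for c in citation_counts if c > 0)
    return cited / len(citation_counts)


def top_share(citation_counts, top_fraction=0.2):
    """Share of all citations received by the most-cited `top_fraction` of papers."""
    total = sum(citation_counts)
    if total == 0:
        return 0.0
    ranked = sorted(citation_counts, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Hypothetical citation counts for two cohorts of ten papers each
# (made-up numbers, both cohorts receive 50 citations in total).
older_cohort = [40, 5, 3, 1, 1, 0, 0, 0, 0, 0]
newer_cohort = [15, 9, 7, 6, 4, 3, 2, 2, 1, 1]

for label, cohort in [("older", older_cohort), ("newer", newer_cohort)]:
    print(label, share_cited(cohort), round(top_share(cohort), 2))
```

In this toy example the newer cohort has a higher share of cited papers and a lower top-20% share, which is the pattern one would describe as increasing dispersion.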

I was struck by the authors’ references to the ‘scientific system’ of scholarly communication, since it connotes not only a methodical approach but also a set of norms and expectations about the progressive advancement of human knowledge.

“From the library”

Thursday, March 26th, 2009 by John

Richard Ovenden, Keeper of Special Collections and Associate Director of the Bodleian Library in Oxford, gave an OCLC Distinguished Seminar presentation in Dublin, Ohio on Monday of last week. Merrilee has already blogged about it here. I notice that the webcast of the presentation is now available from the Distinguished Seminar Series page. He presented a fascinating perspective on the relationship of the real object to the virtual, as a library like the Bodleian develops both its digitisation programmes and its approaches to exhibiting and exploiting the real treasures it holds.

Among the issues he addressed was the role of the library as an authority. Libraries, he concluded, may have to continue to hold on to original texts because they represent ultimate sources of authority upon which scholarship depends: they are the places to which scholars go to check their references. Later in the day I participated (by videoconference) in a meeting with Richard in which he argued for a development of that authority role. The library need not simply be a passive recipient of the textual requirements of the scholars it serves. It can also take the initiative in suggesting to the academic community where it should consider developing its own research strategies. If a library is particularly rich in collections in certain areas, those areas should be mined by the academic community for their scholarship. This was an interesting take on library assertiveness from a scholarly perspective, and it will surely become a more urgent question as the role of the unique, through the place of special collections in academic librarianship, becomes stronger. What is the scope for research direction coming from the library?

More light was shed on this subject with reference to a much older library even than Oxford’s, in the BBC Radio 4 programme In Our Time on the Library of Alexandria, broadcast on Thursday 12 March, and now available as a podcast. Melvyn Bragg was joined for this discussion by three classical scholars – Simon Goldhill, Matthew Nichols and Serafina Cuomo. Here, in a discussion among non-librarians, there were a number of curiously contemporary references to ideas discussed by today’s librarians. This, for example, was the first time I have heard hybrid library used of the notion of a library which is a collection both of texts and of people – the scholars who exploit the text. Serafina Cuomo talks of the people as being part of the collection – an idea we are today moving towards in projects such as Digital Lives (led by the British Library) and, indeed, in the notion of what we might think of as the identified user, uniquely associable with a career, citations, expertise, publications, etc.

We are also told that the Library of Alexandria was an instrument of competition – it was pitted by Ptolemy against the Library of Pergamum, for example, and in its role in Greek imperialism it represented a form of warfare fought over knowledge rather than over territory and the bodies of soldiers. There are many Vice Chancellors and Principals in the UK who probably feel they are still recovering from the battlefield of the most recent Research Assessment Exercise.

The role of the catalogue was described. It was a meta-text which did more than simply inventory holdings, but also represented knowledge about texts and so served as a foundation for scholarship. In many ways, the catalogue was the form and symbol of the authority which the library represented. To be a scholar writing from the library was to have scholarly credentials which could not be impugned. In due course, publication records became the credentials, but in the digital networked age the very authority of these records can be hard to assign, as names flood the web and machines create listings and rankings. What persists is a need for authority – the recourse of the scholar in the battlefields of scholarship. Richard made this point in his closing quotation from A.E. Housman, who noted in a footnote correcting an error made by a fellow-scholar: ‘the arsenals of Nemesis are located in the recesses of the Bodleian Library’. And now in its e-spaces …

Museum Data Exchange: Asking the right questions

Friday, March 20th, 2009 by Günter

The logistical details of publishing the tools we have produced as part of the Museum Data Exchange Mellon grant continue to unfold in a slower fashion than I had hoped, but I am now fairly confident that you will find applications for download announced at some point next week – more when it actually happens!

In the meantime, the focus of our activity with the museum partners has moved from creating tools to analyzing the data they’ve shared while using them. We now have data from six institutions who have allowed us to harvest CDWA Lite XML records created with and shared through a combination of COBOAT and OAICatMuseum 1.0 (again, more as we release the tools), plus records from three additional museums who had other means of creating and sharing CDWA Lite XML at their disposal. A total of about 850K records are now sitting behind a firewall on an OCLC Research server, awaiting data analysis.

Our next big question is: how can we evaluate the data the museums have shared? While it uses the same data structure (CDWA Lite XML), all participants are aware that rules to populate that data structure with data content may vary considerably from institution to institution. Cataloging Cultural Objects is becoming a household name, but a good bit of the data shared probably predates the emergence of this data content standard, let alone its local implementation. What are the right questions to ask which would give the participating museums a sense of how well their records play with each other, in terms of both the institutional dataset and the aggregate resource?
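One simple first question might be: how often is each element actually filled in? The sketch below is only an illustration of that idea, not the analysis we will run. The sample records are invented and heavily simplified (real CDWA Lite XML is namespaced and much more deeply nested), and the `record` wrapper element and the `fill_rates` helper are hypothetical names used for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified records for illustration only; not valid CDWA Lite.
SAMPLE = """
<records>
  <record><title>Vase</title><displayCreator>Unknown</displayCreator><displayCreationDate>ca. 1850</displayCreationDate></record>
  <record><title>Portrait</title><displayCreator></displayCreator></record>
  <record><title>Bowl</title><displayCreationDate>1920</displayCreationDate></record>
</records>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced and plain tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def fill_rates(xml_text, record_tag="record"):
    """Return {element name: share of records where it has non-empty text}."""
    root = ET.fromstring(xml_text)
    records = [el for el in root.iter() if local_name(el.tag) == record_tag]
    if not records:
        return {}
    filled = Counter()
    for record in records:
        # Count each element name at most once per record.
        present = {
            local_name(el.tag)
            for el in record.iter()
            if el is not record and el.text and el.text.strip()
        }
        filled.update(present)
    return {name: count / len(records) for name, count in filled.items()}


rates = fill_rates(SAMPLE)
for name in sorted(rates):
    print(f"{name}: {rates[name]:.0%}")
```

Running the same count per institution and again over the aggregate would give a rough picture of where the datasets line up and where they diverge. It says nothing about whether the values follow the same content rules, which is the harder question.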

A day in the life of a program officer

Wednesday, March 18th, 2009 by Merrilee

I came into the office today and discovered it’s the Day in the Life of the Digital Humanities. Now, I used to fancy myself a digital humanities type — I’ve marked up my share of oral histories in P3 (using emacs, of course), pondered the deeper mysteries of xlink, and written a perl script or two to move text from here to there. It was mostly in a support role — I worked in the library, but was working in an area that enabled digital humanities types. I went to Digital Resources for the Humanities, attended the bootcamp at the Center for Electronic Text in the Humanities, subscribed to Humanist, went to ACH-ALLC… The web is littered with my papers, emails, and presentations from those days.

That was so 1990s. I’ve moved on. I realize that while I used to refer to myself as a refashioned historian, I now think of myself as a refashioned digital humanities person. Life goes on. Now I’m a program officer working in OCLC Research. I still think of myself as working in support of the digital humanities, and spend a lot of time thinking about the shifting landscape, so while I can’t actually bring you a day in the life of the digital humanist, I can bring you a view on my day as a program officer.

Challenges in uniformity and uniqueness: Richard Ovenden

Tuesday, March 17th, 2009 by Merrilee

Richard Ovenden (Keeper of Special Collections and Associate Director, Bodleian Library, University of Oxford) was the most recent speaker in our Distinguished Speaker Series. [If you follow the link you'll find, in due course, a link to the session itself.]

Richard’s talk yesterday was centered around uniformity and uniqueness. Uniform resources (that is, books and journals, those things that are widely held) now have a shared set of tools for discovery and interaction. Google Books (which started as the G5 and has now expanded to the G23, or G28) first focused on basic logistics but has since shifted to economic issues. Richard wondered if Google will in due course acquire other digital libraries, or digital library resources, in an effort to expand the corpus.

Recent concentration of effort on uniform resources has led to a new (or renewed) focus on unique resources, which Richard split into local unique (what’s held in the IR, university archives, research data…) and global unique (those materials that have global value, regardless of origin). As he was explaining this, I imagined a Venn diagram where there is an overlapping bit of materials that are both locally and globally unique.

Oxford has engaged in a variety of activities (tagging and markup, text mining) and has developed a range of business models (ranging from fully free to fully subscription access) around its unique materials. They are considering how to move forward in a digital age, with projects around personal digital collections, and how to deal with “hybrid” collections (those collections that have both paper and digital components, which have an extent statement like “500 boxes of paper and two PCs”).

In an era of increasing digital (and increasing reliance on digital), Richard sees that real materials still have value and even provide a lure. At Oxford, real materials are used in research training, in master classes. The opportunity to be in the presence of the original can be quite compelling: on relatively short notice, the Bodleian organized a one-day exhibit of the Magna Carta and drew a crowd of 800 (more had to be turned away). As the Bodleian considers its physical future (the New Bodleian, or the Weston Library), with increased space for teaching, exhibits, and events based around collections, working with materials “in the flesh” is still important.

Returning to uniformity and uniqueness, as the flow of funding shifts from sameness to originality, he hopes we can use the scale we’ve developed with the uniform to develop efficiencies that can be applied to unique materials. It’s our role to maximize the exposure of our collections to scholarship.

What of the implications of Google? Google’s work has not been curated, and we need to be aware of what’s been left out. In light of Google’s efforts, the preservation and exposure of the unique is more important than ever. There are also questions about the future of the uniform. Who bears the costs of keeping physical print collections, for example?

I was struck by Richard’s observations regarding focus on unique materials. This is echoed in the recently issued “Taiga Provocative Statements,” which says (in part):

[In five years,] collection development as we now know it will cease to exist as selection of library materials will be entirely patron-initiated. Ownership of materials will be limited to what is actively used. The only collection development activities involving librarians will be competition over special collections and archives.

Our own Information Context document (written in 2007) is similarly oriented:

Within a generation the library’s information sources and delivery services will be largely virtual. Libraries will continue to provide direct access to physical materials but this will be very much focused on the special demands of their local constituencies. “Comprehensive” research collection building will be done by a very small number of institutions while special collections of the special or unique materials of research will be maintained and featured at many institutions.

[The emphasis in both statements is mine.]

While I don’t disagree with Richard about the continued primacy of the original item, not everyone has a Magna Carta to draw the crowds. It’s also important to make more proletariat (to use a Bill Landis term) collections accessible, and to recognize that the audience for global unique is, in fact, global, and that we can serve both local and global audiences through digitization. Then again, I’m reminded of the presentation by Lisa Berglund at the 2008 RBMS preconference that taught me that even “real” pedestrian collections are useful in instruction.

Interesting to note, the announcement about the Weston Library is dated today.

Repositories and library cultures

Tuesday, March 10th, 2009 by John

When is a repository not a repository? When it’s an OPAC? Are OPACs in reality a species of repository, however reluctantly, given that the genus is usually used with a specific application in mind – one which is a newcomer to the library world whose value is still not convincingly proven?

In the UK, JISC is about to award a tender for a study on The links between library OPACs and repositories in Higher Education Institutions. The invitation to tender states:

Repositories and OPACs … share various features and requirements. Both depend for their efficiency upon accurate metadata. Both provide a primary service to the home institution but also provide services to external users, for example in enabling access to content for a user from another institution. Various items of content may be accessible both through the library OPAC and through the repository, sometimes in different versions (e.g. a preprint in a repository and a published journal article under licence in an OPAC).

Its terms of reference include:

  • survey the extent to which repository content is in scope for institutional library OPACs, and the extent to which it is already recorded there;
  • examine the interoperability of OPAC and repository software for the exchange of metadata and other information;
  • list the various services to institutional managers, researchers, teachers and learners offered respectively by OPACs and by repositories;
  • make recommendations for the development of possible further links between library OPACs and institutional repositories, identifying the benefits of such links to various stakeholder groups.

Reading this reminded me that the University of Edinburgh has recently announced the introduction of an Open Access publication mandate. The Library will continue to run its Edinburgh Research Archive (ERA) open access repository alongside a new, closed, Publications Repository (PR), which will support research assessment and profiling. As the criteria for institutional deposit proliferate, the mandate document includes a FAQ section to answer researchers’ concerns. One is:

    What about research outputs which are not journal articles? The PR and ERA can accept most research output types including books, book chapters, conference proceedings, performances, video, audio etc. In some cases – for example books not available electronically – the PR/ERA will hold only metadata, with the possibility of links to catalogues so that users can find locations….

    Oregon State University joins Flickr Commons

    Monday, March 9th, 2009 by Merrilee

    This is a little late in coming but it’s not escaped our notice that Oregon State University (a member of the RLG Partnership) has joined the Flickr Commons (Günter has blogged about the Commons numerous times, most recently here).

    ArchivesNext has an interview with Tiah Edmunson-Morton from OSU about her experience as a Commoner. As the interview reflects, the process was far from turnkey, but the project has clearly had some immediate impact:

    It’s only been 2 weeks, but I think I can safely say that putting the same Williams images in The Commons has resulted in dramatically different statistics! After 5 days, we had over 13,000 views, over 200 people add us as a “contact,” nearly 50 comments, and lots of tags. At the end of the 2nd week, with no new content added, the views jump to 24,500, 275 contacts, and lots more tags. And this is all on 116 photos! We’ve received good publicity, including a front page feature on the Oregonian’s “O!” section and a Wired Campus interview, which undoubtedly added to jumps in interest over the past few days.

    Nice going, Oregon State. And thanks to ArchivesNext for the interview.

    Large-scale digitization of special collections: legal and ethical issues (part 4)

    Friday, March 6th, 2009 by Merrilee

    This posting concludes my summary of the symposium, The Legal and Ethical Implications of Large-Scale Digitization of Manuscript Collections. I’m sorry the postings have been so spread out, but work in real time gets in the way of the retrospective blogger. As the posting lengths reveal, there was a lot of meaty content and discussion at this meeting.

    In addition to the panel on ethics, I also moderated a panel on legal issues. Again, the panelists were insightful, thoughtful, and kept to time! I couldn’t ask for better colleagues.

    Peter Hirtle (Cornell University) gave an update on orphan works and Section 108. Orphan works are works that are in limbo because an owner cannot be located, and there is proposed legislation that covers how they are to be handled. The legislation has been tabled time and time again. And what about proposed revisions to Section 108? Orphan works were viewed as the most pressing problem (and relatively easy!), so Section 108 revisions will follow the orphan works legislation. Peter noted that Section 108 was originally an access exception, which has morphed into a preservation exception.

    “Since Peter tells us it’s not a good time to go to Congress, let’s try the courts, in looking at fair use!” Before the meeting, Laura Clark Brown had posed the question, is it possible for an item in a digitized collection to be considered “transformed” by virtue of the context that the overall collection brings to it? (Recall that whether a use is “transformative” is weighed under the first of the “four factors” that are considered in assessing whether use of a work under copyright is “fair.”) Mary Minow put this question to the audience (with us serving as jury) after presenting some background on the concept of transformation and transformative use in case law. Following the discussion and vote, it turned out (not surprisingly) that many archivists took the view that the collection provides context that is transformative. I’ll just observe that archivists love their context…

    Heather Briston (University of Oregon) looked at what constitutes “publication.” Heather started out by noting that whether something is formally published does not, in and of itself, add up to an infringement. Frequently we are trying to look for actions that would cause something in an archival collection to be considered “published.” In Heather’s analysis, you will be sued over infringing, not over publication status. (If I’m understanding this correctly, worrying about publication status is a red herring in the context of digitizing collections.) The discussion was mostly around whether theses and dissertations are published or unpublished. This was not particularly germane to the topic of the symposium, and the conversation revealed a range of institutional opinions.

    What constitutes due diligence? Sharon Farb (UCLA) started with the legal definition: “a measure of prudence, activity, or assiduity, as is properly to be expected from, and ordinarily exercised by, a reasonable and prudent person under the particular circumstances; not measured by any absolute standard but depends on the relative facts of the special case.” The definition is not very helpful, because it “depends.” Another factor that comes up in these discussions is that actions are taken in “good faith,” another concept that hinges on conditions. However, the definition of what constitutes bad faith (showing dishonest conduct) is instructive. Not making an inquiry does not necessarily constitute bad faith. How much due diligence is due is a question akin to how much metadata is needed. Archivists and librarians want to have concrete rules, which is not usually practical. Some guidelines: you need to satisfy yourself (and recognize that for different institutions this level of satisfaction will be different); recognize that risk assessment is critical and needs to come first (think of it as step 0 in a multi-step process); documentation and recording rights metadata in ways that can be shared and repurposed is very important. The Watson project, for example, reveals a process that is beyond diligent. Look at tools and metadata that are available and move on. Employ sampling techniques and pull together a representative group to provide documentation on effort.

    How to manage and balance risk factors? Bill Maher (University of Illinois, Urbana-Champaign). In general, we take risks all the time: riding bicycles, investing money. These days even eating a peanut butter sandwich is a risk. Many of us take calculated risks with the law all the time (speed limits are a good example). In terms of reducing possible consequences of your actions, copyright sections 411, 412, and 504 are useful (reduced consequences for unregistered works). Online notifications can be helpful in reducing risk. Copyright law is case specific, worked out by litigation. If you reduce risk and are never sued, this does not remove the fact that what you are doing is infringing. You need to consider the messages that you are sending through your actions (to past and future donors, students, employees), namely that you take property rights lightly. Community guidelines, transparency, and working with donors can all help to counter such perceptions. Appraisal of materials matters too: different types of materials and ages of materials are all important factors in assessing risk. Another area of focus is to change the law: copyright law is bad, and we should get the message out that it’s not working. However, if we don’t take risks and digitize materials, we are taking a risk of a different sort: not making materials as broadly accessible as technology allows.

    Rights management – Sharon Farb. Sharon put in a very brief plug for the chapter on rights metadata written by Maureen Whalen (Getty) in the new Introduction to Metadata. Echoing what was said earlier, it’s important to document and record what you find when investigating rights. (And while we’re plugging, I will note that OCLC’s Copyright Evidence Registry provides a place to record metadata in ways that can be shared and repurposed – it’s freely available, so check it out.)


    It’s important to keep in mind the role of advocacy in all of this: we’ve been discussing ethical, legal, and policy issues, but advocacy issues matter too. Too bad there wasn’t a panel on creating advocacy opportunities.

    Many thanks to Laura Clark Brown, the unsung heroine in my postings. Laura is the genius who pulled this whole conference together and asked me to moderate the panels on ethical and legal issues. Thanks, Laura, for including me in such a special event. My only regrets are that I didn’t have a chance to meet two people who had been invited to the symposium, but were not able to attend: Kevin Smith (author of the Scholarly Communication @ Duke blog) and Lisa Carter from North Carolina State. I hope to meet you sometime soon!

    The Future of Books, Publishing and Libraries – an AAAS panel

    Friday, March 6th, 2009 by Jim

    My conclusions from attending the panel were:

    The book isn’t gone; it’s just different.
    Newspapers are dead.
    Publishing models are in the midst of change; success will depend on adding new kinds of values.
    Libraries won’t go away but they will be a different bundle of services lodged in a changing physical place.

    The best comment during the Q&A:

    “Starbucks succeeded because it provided a place for digital reading.” – Dan Clancy

    The most provocative question with the most unsatisfactory answer:

    “Why should there be more than one library?”

    The American Academy of Arts and Sciences sponsored a weekend symposium here in Silicon Valley titled The Public Good: The Impact of Information Technology on Society. The closing panel was on Sunday morning at the Computer History Museum (CHM). (We got a private tour after the symposium but I didn’t get to see the Babbage Difference Engine in operation. Sigh.)

    Bitter fruits of the RAE?

    Thursday, March 5th, 2009 by John

    In December I posted an entry about the results of the Research Assessment Exercise in the UK. Today comes the news that the funding allocations which follow the assessment have been made for England (Scotland, Wales and Northern Ireland will follow later). The picture is a mixed one, as reported in The Guardian.

    The exercise is a zero-sum game, so demands that there be winners and losers. It has sought to have high principles (e.g. rewarding research excellence wherever it is found) but nonetheless to accommodate a general trend towards the concentration of research excellence, and to protect science, engineering and medicine. It can’t do all of these things at once, and it seems likely that there will be a lot of unhappiness at the outcome, both in the top-tier Russell Group, which has had to lose some of its funding to allow the Government to deliver on its promise to reward pockets of excellence in the rest of the sector, and at the next tier down, in the 1994 Group of Research Intensive Universities, where the need to protect science, engineering and medicine means that the rewards of their efforts will be meagre. What the impact on library budgets will be remains to be seen.