Over the past few years, OCLC Research has done quite a bit of analytic work based on what my colleague Brian Lavoie refers to as “supply-side” data. Examples include the well-known Google 5 study, as well as a variety of projects examining the library long tail, several of them summarized in an article Lorcan published some time ago. Much of this work has been based on data aggregated in the WorldCat bibliographic database. These data have been contributed over many years by OCLC members to support a variety of shared library services, including cooperative cataloging and inter-lending operations; as a secondary effect, the aggregation has provided a rich source of information about the system-wide library collection that is regularly mined in both internal and extra-mural research projects.
More recently, we have begun to think about how we might make better use of the demand-side data that are generated by a variety of routine library operations, especially circulation and inter-lending. Lorcan in particular has given thought to how “intentional data” might usefully shape library service provision.
Inter-library loan transactions are a particularly interesting example of intentional data, I think. First, while local circulation of academic print collections in North America is generally observed to be on the decline (a trend borne out by a decade of ARL statistics), inter-lending traffic in this same population is increasing. We discussed these trends some years ago at a program organized by OCLC Research.
Secondly, aggregated ILL data provide us with a view into the system-wide demand for titles for which local fulfillment options are inadequate. This is important in the context of local and system-wide planning, since it reinforces the value of locally managed inventory in the larger library network: local demand is not the sole measure of a collection’s function or worth. Finally, inter-library lending is a heavily subsidized activity for which the direct costs are rarely appreciated or quantified. As the volume of ILL transactions increases, and local circulation diminishes, we can expect to see closer scrutiny of the “ROI” on operations that support an external rather than internal constituency.
There is also a pragmatic reason to be interested in ILL data, and that is that a considerable quantity of it is readily at hand as a byproduct of the millions of ILL requests handled each year by WorldCat Resource Sharing. I will confess that I do not know what proportion of library inter-lending traffic passes through WCRS compared to other channels; presumably the rise of direct consortial borrowing arrangements and reciprocal borrowing agreements has meant that some part of the inter-lending economy is managed regionally or within institutional peer groups. The Association of Research Libraries reported a total of some 4.5 million items loaned and 3.5 million items borrowed by 113 university libraries in 2007/2008, which provides at least some measure of the activity in this sector. A recent article describing the highly successful BorrowDirect lending partnership, which includes 7 ARL members, reported that participating libraries collectively received and filled more than 140,000 loan requests in 2008. These figures provide a backdrop against which the portion of inter-lending activity reflected in WorldCat Resource Sharing may be assessed.
As an experiment, we extracted from the WorldCat Resource Sharing data repository a record of inter-lending requests placed by 68 research institutions between 1 October and 30 November 2009. Late autumn is generally a period of high activity in university libraries, as students race to complete (and sometimes to begin) writing and research projects assigned during the first part of the academic year. The cohort of institutions included represents a subset of the total ARL membership. Our initial data-set amounted to more than one hundred thousand transactions; by limiting to transactions for “returnables” (i.e. books and other materials that must be returned to the lending library), we reduced this to approximately 74,000 transactions representing about 71,000 unique titles. Most of the titles were requested only once; about 3% were requested more than once and a small number of titles (6) were each requested more than 5 times by members of the research library cohort between October and November 2009.
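For readers who want to repeat this kind of reduction on their own transaction logs, here is a minimal Python sketch of the filtering and counting steps described above. The record layout, the title identifiers and the "returnable" label are all invented for illustration; they are not the WCRS schema, and the toy records do not reproduce our figures.

```python
from collections import Counter

# Hypothetical request records: (title identifier, material type).
# These values are made up for illustration only.
requests = [
    ("title-a", "returnable"),
    ("title-b", "non-returnable"),
    ("title-a", "returnable"),
    ("title-c", "returnable"),
    ("title-d", "returnable"),
]


def summarize_returnables(records):
    """Keep returnable requests only, then count requests per title."""
    per_title = Counter(
        title for title, kind in records if kind == "returnable"
    )
    unique_titles = len(per_title)
    repeated = sum(1 for count in per_title.values() if count > 1)
    return {
        "transactions": sum(per_title.values()),
        "unique_titles": unique_titles,
        # Share of titles requested more than once in the period.
        "share_repeated": repeated / unique_titles if unique_titles else 0.0,
    }


summary = summarize_returnables(requests)
print(summary)
```

The same counter could also be used to list the handful of titles requested more than five times, by filtering on the per-title count.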
It is useful, I think, to regard the titles represented in our resulting data-set as titles for which extant local and group inventory is inadequate; that is, local demand can’t be met with local supply, because the title either isn’t owned or isn’t currently available (is checked out, on reserve, has been lost, etc.) during a period of relatively intense demand. An obvious question one might ask is whether these requests might be fulfilled through alternative means, at a lower per unit cost. ILL has traditionally functioned as a means of lowering the transaction costs of sourcing material for which a long-term local investment in ownership is not feasible or desirable, sometimes referred to as opting for “access over ownership.” In the age of mass-digitization, however, one might reasonably ask if content previously sourced from partner libraries might instead be delivered directly from large-scale digital aggregations in the cloud.
As a side note, I will observe that inter-lending partnerships like BorrowDirect were explicitly intended to reduce the transaction costs associated with traditional ILL. BorrowDirect has successfully reduced the per-transaction cost for its members to less than half the average and is generally (and rightly) regarded as a model to be emulated. The question, I think, is whether a model of collaborative resource sharing based on a very large and distributed regional inventory with considerable redundancy in local holdings is scalable in today’s library environment. Especially as local circulation rates fall, and as individual libraries seek to move collections into lower-cost high-density facilities from which on-demand print delivery comes at a relatively high price, it will be necessary to re-evaluate what part of the aggregate print collection is amenable to the informal governance and distributed management model that underpins many resource-sharing partnerships, even highly successful ones like BorrowDirect. Going forward, cost-effective management of legacy print collections will likely require new organizational structures that are accountable and responsive to system-wide demand dynamics, including the progressive shift toward elastic, scalable digital provisioning options.
To understand what part of current academic inter-lending demand, i.e., demand for print books that cannot be met with local inventory, might one day be directed to large-scale digital providers, we evaluated the match rate between our ILL data and the mass-digitized book corpus. We tested the 71,000 titles in our set against a June 2010 snapshot of the HathiTrust digital library and found that 17% of the titles requested via ILL (during the Oct-Nov 2009 period we examined) are represented in the mass-digitized corpus. This strikes me as a reasonable, and even impressive, figure. [Our previous investigations have suggested that about 30% of the titles owned in individual academic libraries are represented in the digitized Hathi Library collection.] Of course, titles represented in the mass-digitized book corpus are not always available for download or onscreen reading; our current estimate is that 16% of titles in the Hathi collection are in “full view.” A little more than 500 (~1%) of the titles in our ILL sample are currently available as full-view content in the Hathi repository. If you figure that each inter-lending transaction costs an average of $20 (this estimate is based on 1992 survey data; costs are significantly lower in partnerships like BorrowDirect), one could say that in the two months from 1 October to 30 November 2009, research libraries collectively paid more than $10,000 to borrow books that might be sourced in digital format at little or no cost, directly from Hathi or Google.
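The two shares quoted above (titles represented in the digitized corpus, and titles available in full view) are simple overlap ratios. The Python sketch below shows the general idea using set intersection. It assumes, purely for illustration, that requested and digitized titles carry a shared identifier; real bibliographic matching is considerably messier, and the identifiers and function name here are invented.

```python
def match_rates(requested_ids, digitized_ids, full_view_ids):
    """Return (digitized share, full-view share) of the requested titles."""
    requested = set(requested_ids)
    if not requested:
        return 0.0, 0.0
    # Requested titles that also appear in the digitized corpus.
    matched = requested & set(digitized_ids)
    # Of those, the titles that can actually be read in full.
    full_view = matched & set(full_view_ids)
    return len(matched) / len(requested), len(full_view) / len(requested)


# Toy identifiers only; they do not reproduce the 17% and ~1% figures.
requested = ["t1", "t2", "t3", "t4", "t5"]
digitized = ["t2", "t4", "t9"]
full_view = ["t4"]

digitized_share, full_view_share = match_rates(requested, digitized, full_view)
print(f"digitized: {digitized_share:.0%}, full view: {full_view_share:.0%}")
```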
This is, admittedly, a pretty small number when placed in the context of total library spending; in 2007/2008, aggregate ARL library expenditures topped $3 billion. Nevertheless, as the mass-digitized corpus continues to grow and as the relative proportion of public domain titles increases (up from 12% to 16% of titles in Hathi quite recently), it will surely become more important to weigh the costs and benefits of traditional print-based inter-lending operations. As a thought experiment, one might extrapolate from our initial (and very preliminary) data and speculate that if 1% of all ARL borrowing were available as public domain digitized content, the total savings in inter-lending expenditures could be quite significant.
Let’s do the math. ARL inter-lending figures do not distinguish between returnables and non-returnables, but we might estimate (based on our WCRS sample) that 70% of the reported requests in 2007/2008 were for books rather than journal articles.
3.5M requests * .70 = 2.45M requests
Now let’s assume that 1% of those requests are available as public domain digitized content, and apply the $20 per transaction cost estimate:
(2.45M requests * .01) * $20 = $490,000
That is nearly half a million dollars per year in research library expenditures on a traditional print-based operation that might be saved. Of course, not every demand for print can be met with a digital surrogate; but in many cases, I would argue, providing a digital-first option could help mitigate the increasing volume and cost of meeting extra-mural demand for locally-owned inventory. As universities look to contain costs and libraries seek to embrace more locally responsive service portfolios, it will likely become harder to reconcile the costs of subsidizing print delivery services for an external clientele.
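For anyone who wants to vary the assumptions, the back-of-the-envelope estimate above can be written as a short Python sketch. The function and parameter names are illustrative only; the inputs are the figures already cited (3.5 million ARL borrowing requests, a 70% share for books, a 1% public domain share and the $20 per-transaction cost), all of which are rough estimates.

```python
def estimated_savings(requests, book_share, public_domain_share, cost_per_loan):
    """Annual cost of loans that could be met with public domain digital copies."""
    book_requests = requests * book_share
    substitutable = book_requests * public_domain_share
    return substitutable * cost_per_loan


# Inputs taken from the estimates above; each one is an assumption.
arl_savings = estimated_savings(
    requests=3_500_000,        # ARL items borrowed, 2007/2008
    book_share=0.70,           # estimated share of requests for books
    public_domain_share=0.01,  # assumed share available as full-view content
    cost_per_loan=20,          # dollars, from 1992 survey data
)
print(f"${arl_savings:,.0f} per year")
```

Because the result is a straight product, it scales linearly with each input: halving the per-transaction cost, as partnerships like BorrowDirect have done, would halve the estimate.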
Further study of aggregate demand data will help to identify which parts of the locally-organized print management operation in libraries are amenable to externalization. Our initial examination of WCRS transaction data suggests that emerging network service providers like the HathiTrust will play an increasingly important role in a progressive out-sorting (sic) of library operations.
Constance Malpas is a Research Scientist at OCLC. Her work focuses on data-driven analysis of library collections and services, with a special emphasis on strategic planning and managing institutional change. She has a particular interest in the organization of knowledge and research practices in the sciences.