Archive for the 'Systemwide Organization' Category

Sliding scale: mapping local, group and system-wide library infrastructure

Sunday, July 28th, 2013 by Constance

Recently, we looked at how Sankey diagrams might be used to visualize the flow of library resources within and across inter-lending networks. It was a useful exercise, but it left me feeling that a critical dimension was lacking: a measure of the geographic distance between inter-lending partners. Knowing that a large share of the inter-lending demand fulfilled by CIC libraries is generated outside the CIC group is significant in its own right, but it doesn’t tell us much about the relative costs of serving ‘in-group’ and ‘out-of-group’ partners. If most of the non-CIC borrowers are located within close proximity of CIC lenders, the costs of fulfilling returnable requests (which must travel to and from the borrower) will be less than if the non-CIC borrowers are located farther away. I tried mapping the Sankey flows to ZIP codes, but unless one is already familiar with the codes, it is fairly difficult to visualize the distance covered.
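One way to put a number on that missing dimension is the great-circle distance between a lender and a borrower. The sketch below is only illustrative: it assumes each partner has already been geo-coded to a latitude/longitude pair, and the coordinates are rough, hypothetical values for Columbus, Ohio and Los Angeles rather than figures from our transaction data. The function name `haversine_km` is mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate, illustrative coordinates (not taken from the transaction data)
columbus = (39.96, -83.00)
los_angeles = (34.05, -118.24)

one_way = haversine_km(*columbus, *los_angeles)
round_trip = 2 * one_way  # a returnable travels to the borrower and back
```

Straight-line distance is only a proxy for shipping cost, of course, since courier routes and pricing zones do not follow great circles, but it is enough to separate near neighbors from long-range partners.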

The obvious solution was to plot the outbound and returning flows on a map. At first, I did this by dropping markers for the borrowing and lending partners on a map, using a simple Web application (BatchGeo) that uses the Google Maps API to generate map-based data visualizations. BatchGeo is a nice tool, but in this case the result wasn’t very pleasing — the density of same-sized location markers in some locations made it difficult to read and, more importantly, obscured patterns in the relative concentration and diffusion in different regions. This was particularly true in comparatively small states. It was a very noisy picture.  Even if one looks at a small fraction of the inter-lending partners, the result is an irritating blur.  Limiting the inter-lending population to the top 5% of borrowers resulted in this not especially informative picture:

Top 250 CIC borrowers by location

Highlighting the borrowers located in the ChiPitts megaregion made it only slightly more interesting:

Top 250 within and outside ChiPitts

By chance, as I was experimenting with these maps, Jim Michalko (my boss) stopped by my desk to chat about a recent article in the New York Times on the geography of economic mobility. He half-jokingly suggested that we overlay a map of North American library infrastructure on the map of economic mobility, to see if there was any correlation between the availability of library services and the likelihood that individuals can better their economic lot in life. Well, why not? I already had a geo-coded set of US libraries — all I needed to do was map those to pre-existing shape files to produce a county-level view of library infrastructure. I used a freely available ZIP code data table to map ZIP data to county-level boundaries. The mappings are not perfect, but I considered them good enough for my purpose — which was not to produce an exact map of all library locations, but simply to compare the relative density of regional library infrastructure. With this in hand, I could use a method outlined by Robert Mundigl in his Clearly and Simply blog to associate data values with colored fill gradients in choropleth maps, using Excel.
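For readers who would rather script this step than work in Excel, the roll-up can be sketched in a few lines of Python. This is not Robert Mundigl's method, just a stand-in for the same idea: map each library's ZIP code to a county, count libraries per county, and assign each county to a color class. The lookup table, the library symbols, the class breaks and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical ZIP -> county FIPS lookup; a real table has tens of thousands of rows
zip_to_county = {"43210": "39049", "43201": "39049", "90095": "06037", "85721": "04019"}

# Hypothetical geo-coded libraries: (institution symbol, ZIP code)
libraries = [("AAA", "43210"), ("BBB", "43201"), ("CCC", "90095"),
             ("DDD", "85721"), ("EEE", "99999")]  # the last ZIP has no county match


def count_by_county(libs, lookup):
    """Count libraries per county; also return the symbols that could not be mapped."""
    counts, unmapped = Counter(), []
    for symbol, zip_code in libs:
        county = lookup.get(zip_code)
        if county is None:
            unmapped.append(symbol)
        else:
            counts[county] += 1
    return counts, unmapped


def shade(value, breaks):
    """Return the index of the color class for value, given ascending upper bounds."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks)


counts, unmapped = count_by_county(libraries, zip_to_county)
classes = {county: shade(n, breaks=[1, 5, 20]) for county, n in counts.items()}
```

Keeping the unmapped symbols in a separate list makes it easy to see how much of the population is missing from the map, which matters when the mappings are known to be imperfect.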

Here is the result:

County-level distribution of WorldCat libraries in the United States

It is not a complete map — I wasn’t able to map every OCLC library symbol in the United States to a valid ZIP-based county, and not every library in the US has an OCLC symbol. Still, with nearly 30 thousand libraries, it is more comprehensive  than a map produced earlier this year based on IMLS data for about 17 thousand public libraries.

The first thing to be said about this map is that it does not suggest that there is any obvious correlation between the concentration of library resource (infrastructure) and economic mobility. Several of the areas that authors of the Equality of Opportunity study highlight as places where children of low-income families have a relatively greater likelihood of rising in the income distribution have comparatively limited library infrastructure. Admittedly, the geographic unit of measure in the two maps differs — I used counties (partly because they are readily available as shape files), while the researchers used commuting zones. It’s not obvious to me that if the library data were aligned to commuting zones, the picture would look much different: our data suggests that there is comparatively little library infrastructure in the upper Northeast zone of Nebraska, whether one relies on county or commuting zone boundaries — yet, this is an area where inter-generational income gains are relatively frequent. Conversely, in metro areas like Chicago where library infrastructure is comparatively dense, there is reportedly a pretty low level of inter-generational income gain.

Of course, to judge the strength of the US library system based on the geographic distribution of libraries alone is to overlook a vital — perhaps the most vital — attribute of the library enterprise. Libraries are in the business of increasing access to information by sharing resources that are distributed across broad networks of related institutions: public libraries, academic libraries, special libraries etc. Libraries are part of what is now fashionably termed the ‘sharing economy.’ To measure the integrity or vitality of the library system, one needs to take into account the efficiency of flows across the library network.  A successful library system is one that ensures that a child (or adult) in rural Nebraska has access to the same collections and services as a child (or adult) in Minneapolis or Seattle.

It would be interesting to investigate whether geographic areas that favor economic mobility are co-extensive with areas where library flows — the balance of supply and demand for library resources — are notably efficient. Perhaps some LIS PhD student will take up the challenge. My current objective is a lot more prosaic: modeling supply and demand within and outside of a given library consortium to inform decisions about local and shared stewardship of print collections. For this, I think the county-level choropleth is actually quite useful. It helps to show how demand is distributed at ‘above-the-institution’ scale, and this is important for understanding the role of logistics in optimizing the flow of library resources.

This is what a county-level heatmap of demand for CIC returnables looks like, reflecting cumulative inter-lending request activity over a period of about seven years:

Percent of CIC Returnable Borrowing by US Counties

What does this map tell us? A few important things are immediately discernible:

  • CIC libraries (which are mostly located in the Midwest) serve institutions across an enormous geographic range in the United States.
  • Regional demand is concentrated in a relatively small number of counties.
  • The relative volume of demand is quite low; counties with the greatest number of request transactions individually account for less than 7% of total demand.
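The percentages behind a map like this are simple to derive. As a minimal sketch, assuming each request transaction has already been tagged with the borrower's county (the FIPS codes and request counts below are made up for illustration), each county's share of total demand can be computed like so:

```python
from collections import Counter

# Hypothetical request transactions: the county FIPS code of each borrower
requests = ["06037", "06037", "04019", "39049", "17031", "06037", "04019", "53033"]


def demand_share(borrower_counties):
    """Return each county's percentage share of all requests."""
    counts = Counter(borrower_counties)
    total = sum(counts.values())
    return {county: 100.0 * n / total for county, n in counts.items()}


shares = demand_share(requests)
top_county, top_share = max(shares.items(), key=lambda kv: kv[1])
```

With only eight toy transactions the top county looks dominant; in the real data, where demand is spread over thousands of borrowers, the largest share stays in single digits.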

Some of this information was equally visible in the institution location map that I had started with — but the county-based version is less noisy and enables me to roll up a great deal more data (about 1.3 million transactions and five thousand borrowers) in a single picture.  It also raises some additional questions:

  • Are the ‘hotspots’ of demand an artifact of aggregating demand over several years?  I.e. did all of the demand from Southern California occur in the last twelve months or is it a recurring pattern over many years?
  • Has the geographic range of demand changed over time?  Are CIC libraries serving a broader range of institutional partners today than they did five years ago?
  • Do individual CIC member libraries serve a comparable range of institutions?  Is ‘long-range lending’ associated with libraries that hold materials that are relatively scarce in the overall system, or are materials traveling farther than is necessary to meet demand?

It was easy enough to plot annual demand for an individual lender, so I produced a new series of maps looking at the county-level location of institutions who borrowed returnable items from the Ohio State University Library (symbol OSU) over a few years.  This time, I used the absolute number of loans as the input, so that even low-volume borrowers would be visible.  Here are the results for:


Locations of borrowers from OSU in CY2008.


Locations of borrowers from OSU in CY2010.

and 2012

Locations of borrowers from OSU in CY2012.

At first glance, they might appear to be the same map…yet there are minor variations from year to year. The consistency in regions of demand is interesting, since it suggests that there is some predictability in the sources of demand: year after year, institutions in Southern California (mostly Los Angeles County) and Southern Arizona (mostly Pima County) have turned to OSU as a supplier one hundred times or more.  Why does this matter?  A pattern of sustained demand might suggest that a subscription-based pricing model would benefit partners on both sides, providing predictability in budgeting while also giving OSU documentation of the continuing value its holdings are producing for other institutions (and geographies).

The variations in demand are equally interesting.  For example, demand for OSU resources from institutions in the Pacific Northwest seems to have waned somewhat — could this be a result of improved intra-regional inter-lending arrangements and courier service within the Orbis-Cascade group?  Less visible in these pictures, but no less intriguing (I think) is the decreasing range of geographies served by OSU between 2008 and 2012, which amounts to a reduction of 12%.  This trend holds up over a longer period, not reflected in the maps above.  What might account for this change?  Is demand being deflected to other suppliers? Or is the increasing volume of requests generated by ‘in-network’ CIC partners displacing fulfillment for non-CIC institutions?
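A change in geographic range like the one described here can be measured by counting the distinct counties served in each year and comparing the counts. The sketch below uses invented loan records and hypothetical function names; it shows the calculation, not the actual OSU figures.

```python
# Hypothetical loans for one lender: (calendar year, borrower county FIPS)
loans = [(2008, "06037"), (2008, "04019"), (2008, "53033"), (2008, "41051"),
         (2012, "06037"), (2012, "04019"), (2012, "06037"), (2012, "17031")]


def counties_served(loan_rows):
    """Map each year to the set of distinct counties that borrowed in that year."""
    served = {}
    for year, county in loan_rows:
        served.setdefault(year, set()).add(county)
    return served


def percent_change(old, new):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old


served = counties_served(loans)
change = percent_change(len(served[2008]), len(served[2012]))
dropped = served[2008] - served[2012]  # counties that no longer borrow
```

Looking at the set difference as well as the count is useful: it shows which geographies fell away, which is where a question like the Pacific Northwest one would start.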

There’s a lot more to explore in these complex inter-lending networks — and I suspect that visualizing flows between institutions and across geographies will become increasingly important in monitoring, analyzing and improving efficiency in the library system as a whole.  OCLC makes an increasingly wide range of data (including ILL policy and institution data) available for programmatic use by developers and others, and this will hopefully lead to more experimentation with visualization and library analytics.

Visualizing Network Flows: Library Inter-lending

Friday, July 12th, 2013 by Constance

Sankey intra-CIC flows

As part of our joint research project with the CIC Center for Library Initiatives and the OSU Library, we’re examining inter-lending flows within and outside of the 13-member CIC consortium. We are using a subset of the OCLC WorldCat Resource Sharing (WCRS) transaction data archive for this analysis. Our current data-set comprises 1.33 million request transactions, representing nearly 900,000 individual titles loaned by CIC libraries over the past several years.

As Max Klein has noted elsewhere in this blog, the OCLC Research group is starting to experiment with new approaches to data visualization using the R statistical modeling environment and D3 JavaScript code library. (The WorldCat Live prototype is a nice example of how these are being put to use.)  We are eager to integrate some of this experimental work into the ongoing CIC analysis.

Bruce Washburn, Brian Lavoie and I met recently to look at examples in Mike Bostock’s D3 gallery, which provides some great examples of how different visualization techniques can be implemented.  We were looking for examples that were fit to purpose for this particular project.  Since inter-lending is a library-centric approach to balancing supply and demand, Brian suggested that we focus on examples that are particularly expressive for modeling import/export flows. We settled on force-directed graphs and Sankey flow diagrams as good candidates for further exploration. And because we are especially interested in understanding the flow of library resources across geographies, we decided it was worth some additional work to enhance our WCRS transaction dataset with geo-codes, so that we can experiment with mapping flows across regions.

From his experiments with TopicWeb, Bruce has developed some facility with D3 and he is now doing some work with R. But before we run head-long into any new development work, I wanted to do some low-level experimentation to see if the data we have in hand, and the questions we are trying to answer, lend themselves to visualization in Sankey diagrams.

When Brian, Bruce and I met to discuss models, we spent a fair bit of time discussing this diagram of horse import/export activity in Europe. It wasn’t until this week that I realized it had been produced by the prolific blogger and Open Data advocate Tony Hirst (aka @psychemedia, whom I’ve followed on Twitter for a long while) as an experiment in formatting data for Sankey diagrams. His blog post on this topic is great — unfortunately, it’s way beyond my current skill to implement. But in the comments, I noted that Bruce Mcpherson had developed some VBA code that uses Excel as the data input to a D3 Sankey library. This was just what I needed for some quick experimentation with our current data set.
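For anyone curious what the data preparation involves, the core of it is aggregating individual request transactions into weighted links between nodes. The sketch below is not the VBA code mentioned above; it is a Python illustration that assumes the nodes/links JSON layout commonly used in D3 Sankey examples. The request rows and the symbols other than OSU are hypothetical.

```python
import json
from collections import Counter

# Hypothetical request rows: (lender symbol, borrower symbol)
rows = [("OSU", "XXX"), ("OSU", "XXX"), ("OSU", "YYY"), ("ZZZ", "OSU"), ("ZZZ", "YYY")]


def sankey_data(pairs):
    """Aggregate (lender, borrower) pairs into a nodes/links structure."""
    flows = Counter(pairs)
    # Keep lender and borrower roles as separate nodes so the graph stays acyclic
    names = sorted({f"{l} (lender)" for l, _ in flows} | {f"{b} (borrower)" for _, b in flows})
    index = {name: i for i, name in enumerate(names)}
    links = [{"source": index[f"{l} (lender)"], "target": index[f"{b} (borrower)"], "value": n}
             for (l, b), n in sorted(flows.items())]
    return {"nodes": [{"name": n} for n in names], "links": links}


data = sankey_data(rows)
payload = json.dumps(data, indent=2)  # text that a Sankey layout could consume
```

Splitting each institution into a lender node and a borrower node is a design choice: Sankey layouts generally expect flows to run in one direction, and a library that both lends and borrows would otherwise create a loop.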

Here are a few illustrative screenshots:

The three-letter symbols correspond to OCLC institution symbols for the 29 CIC collections we are examining in this project.

Sankey sample outbound CIC flows

Another, showing the breakdown of CIC borrowing of CIC returnables:

Sankey sample CIC flows with detail

And a third, this time with some detail for both Non-CIC and CIC borrowers — NB the number of non-CIC borrowers makes it difficult to represent them all in this format, hence the block of ‘others.’

Sankey sample CIC flows with detail non and CIC

Now, these are admittedly primitive pictures of how resources flow out of CIC libraries and into other places, but they do capture some important attributes that we are interested to explore further.  For instance, it is immediately apparent that there are some major ‘sources’ and ‘sinks’ for CIC returnables. And it’s clear that while the demand generated outside the CIC is significant (greater than the demand generated within the consortium), it is extremely diffuse, spread across a population of thousands of libraries.  Both of these are important for understanding how existing library flows can be optimized. As we refine our analysis, we’ll be examining what factors are driving demand to particular libraries:  proximity of lender, scarcity of alternative supply options, price incentives, efficiency of service (as measured by turn-around), etc.  And we’ll be looking at new ways to use data visualization to explore (and share) interesting and important patterns in the organization of the library system.

Update:  Thanks to Tony Hirst’s comment (below) and some subsequent Twitter exchanges with @timelyportfolio, there is now a very nice tutorial on creating Sankey diagrams using rCharts and d3.  Many thanks to klr and Tony for taking the initiative.

Concentration, Diffusion, Centers & Flows

Tuesday, April 30th, 2013 by Constance

Our 2012 mega-regions analysis revealed a notable feature of the library landscape in North America:  the apparent  scarcity of print inventory varies significantly depending upon the scale at which it is assessed.  It stands to reason, of course, that a title held by a single library in one locale may be held by many libraries in other places.  What is more surprising is that scarcity – or what is more appropriately termed diffusion – is a characteristic that persists even at the scale of the mega-region.  In every one of the 12 regions we examined, more than 75% of the print book titles are held by five or fewer libraries.  Yet, comparing one mega-region to another, we found a high level of bilateral duplication:  for example, more than 70% of the print book publications held in Cascadia – a mega-region encompassing urban centers in Oregon, Washington, and British Columbia – are duplicated by library holdings in the NorCal region.

Bi-lateral duplication of print books in Cascadia and NorCal
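Bilateral duplication is directional: the share of one region's titles that are also held in another region need not match the reverse. A minimal sketch, using invented title identifiers and a hypothetical function name, shows how the two rates are computed from sets of titles:

```python
def duplication_rate(region_a, region_b):
    """Percentage of titles held in region_a that are also held in region_b."""
    if not region_a:
        return 0.0
    return 100.0 * len(region_a & region_b) / len(region_a)


# Hypothetical title identifiers held anywhere in each region
cascadia = {"t1", "t2", "t3", "t4", "t5"}
norcal = {"t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"}

cascadia_in_norcal = duplication_rate(cascadia, norcal)
norcal_in_cascadia = duplication_rate(norcal, cascadia)
```

In this toy case 80% of the smaller region's titles are duplicated in the larger one, while only 50% of the larger region's titles are duplicated in the smaller.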

Now it may be that the high degree of integration that is a primary characteristic of mega-regions is also a factor in the diffusion of library resources within those same regions.  Arguably, the strong networks of exchange and robust logistics infrastructure of mega-regions help to explain why we find such a low level of redundancy in library collections within those regions.  Even without coordinated collection development plans, it may be that the ease with which resources (including library books) flow within mega-regions exercises some influence on library acquisitions.  The incentive to acquire ‘just in case’ inventory will be less in a region where inter-lending networks are strong and it is relatively easy to obtain copies from neighboring institutions, and this confidence in regional supply options may have a sort of invisible-hand effect that constrains redundant acquisitions.

This would suggest that the high degree of diffusion we see in regional collections – inventory distributed across a number of geographically distant institutions – is a characteristic of a ‘well-organized’ (though not deliberately engineered) library system.  By contrast, library collections in institutions located outside of mega-regions tend to exhibit a higher degree of redundancy.  This is partly a reflection of the large number of libraries that fall outside of the defined mega-regions – more than nine thousand individual OCLC institution symbols – but it seems likely that greater redundancy is needed to support demand in regions where ‘flows’ may be less efficient.   Compare the average library holdings per title for collections held outside of US mega-regions (about 14) to the ratio of holdings per title within mega-regions, which ranges from 2 (the Phoenix metro area) to 9 (Chi-Pitts).
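The holdings-per-title ratio used as the redundancy measure here is just the total number of library holdings divided by the number of distinct titles. As a rough sketch, with invented records and hypothetical region labels and library symbols:

```python
from collections import defaultdict

# Hypothetical holdings records: (region, title id, holding library symbol)
holdings = [("inside", "t1", "AAA"), ("inside", "t1", "BBB"), ("inside", "t2", "AAA"),
            ("outside", "t1", "CCC"), ("outside", "t1", "DDD"), ("outside", "t1", "EEE"),
            ("outside", "t3", "CCC")]


def holdings_per_title(records):
    """Average number of distinct holding libraries per distinct title, by region."""
    by_region = defaultdict(lambda: defaultdict(set))
    for region, title, library in records:
        by_region[region][title].add(library)
    return {region: sum(len(libs) for libs in titles.values()) / len(titles)
            for region, titles in by_region.items()}


ratios = holdings_per_title(holdings)
```

Because the ratio is an average, it rises with the number of libraries in the population as well as with genuine overlap, which is why the sheer count of institutions outside the mega-regions has to be kept in mind when reading it.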

Of course, there are other factors at play:  as a recent New York Times article showed, the geographic distribution of major research universities (and the libraries that serve them) is uneven – and regions with fewer research libraries will have a smaller concentration of rare or unique materials, compared to regions with many research intensive universities.  Not surprisingly, the BosWash region, which encompasses a substantial part of the ARL membership, has a relatively low level of redundancy in print book holdings (about 7 holdings per title) simply because the area is home to many institutions with large collections of rare or distinctive materials.

ARL Membership Map (2013)

Conversely, in areas with a high density of public libraries (which generally hold large collections of popular classics and best-selling titles), we see higher levels of overall redundancy in collections.  So one cannot infer that low levels of redundancy in library holdings across any region, whether organized against the mega-regions framework or anything else, are a reliable indicator of strong or weak flows.  Other regional factors, like the distribution of research universities and public libraries, are clearly important.  It is interesting then to consider how the flow of library resources across regions may contribute to the organization of the library system as a whole.   Regional infrastructure will affect flows, but flows will also shape infrastructure.  Think, for example, of how the growth of rapid transit networks has transformed urban landscapes, encouraging the emergence of the sprawling metro areas that anchor mega-regions.

Lorcan Dempsey sometimes speaks of ‘library logistics’ and it is in this context that I have been thinking about the flow of library resources and more especially about the emergence of new hubs around which the library system is now being reconfigured.  Thom Hickey’s recent experiments in programmatically identifying concentrations of material related to a particular topic or identity — what we’ve referred to as ‘centers data’ – provide a new way to think about how the library system is organized and how it is changing.  It’s not clear if the existing centers reflect intentionally cultivated strengths in institutional holdings, or if they are merely accidents of history – an unsolicited donation of materials about a particular person, for instance.

In some cases, the association with known institutional centers of excellence seems evident:  it is not surprising, for example, to find that the University of Pittsburgh has the largest collection of material by or about Gonzalo Rojas, a celebrated Chilean poet.  Pitt has been a National Resource Center on Latin America for decades and it stands to reason they hold significant collections of Latin American literature.  Would an expert in the field have predicted that Pitt, rather than the University of Texas (a distinguished center of Latin American studies), has the most comprehensive collection related to Rojas?  Perhaps – I don’t have the domain knowledge to have an informed opinion about the likely location of the most comprehensive Latin American poetry collections in North America. Significantly, though, Pitt ranks within the top collections (by size) of works by related poets:

Gonzalo Rojas and related identities – University of Pittsburgh holdings ranked against other WorldCat libraries

As the figures here suggest, it is possible for a library to be a leading (or even the top-ranked) ‘center’ of resources related to a particular identity or topic without holding a vast number of titles.  This will obviously be true when the relevant oeuvre is limited:  to hold 100% of a small published record (a handful of titles, let’s say) is still significant.  What is more interesting is that the diffusion of library resources (the ‘scarcity’ that we find within institutional and regional collections) effectively lowers the threshold for what constitutes excellence in institutional holdings.  In the case of Gonzalo Rojas, for example, Pitt’s 48 titles amount to less than 40% of the published works associated with the related VIAF heading.  Even so, this small collection is 75% larger (and 16% more comprehensive) than the related holdings at the Biblioteca Nacional de Chile, at least as they are reflected in WorldCat.  This is, at least to me, somewhat surprising.
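The figures quoted here imply a couple of other numbers that are worth working out. The short calculation below uses only the values given in the text (48 titles, 75% larger, less than 40% coverage); the results are back-of-envelope inferences, not counts taken from WorldCat, and the variable names are mine.

```python
pitt_titles = 48          # from the text
size_advantage = 0.75     # Pitt's collection is 75% larger than the other one
coverage_ceiling = 0.40   # Pitt holds less than 40% of the published works

# If 48 titles is 75% larger than the other collection, that collection is 48 / 1.75
other_titles = pitt_titles / (1 + size_advantage)

# Holding less than 40% of the works means the heading covers more than 48 / 0.40 titles
min_heading_size = pitt_titles / coverage_ceiling
```

So the comparison collection would hold roughly 27 titles, and the published record associated with the heading would run to more than about 120 titles, which shows how low the threshold for being the leading ‘center’ can be.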

This raises the question of how centers or hubs are revealed in the information environment.  Effective disclosure of collections that are distinctive not because of their rarity but because of their ‘excellence’ or completeness will be important if libraries are to be recognized as preferred hubs in the larger supply chain, where commercial providers are still dominant.  Ideally, one would like to have relevant library suppliers revealed in the network at the point of need, in the flow of the researcher’s work – which is increasingly likely to be outside the library discovery environment.  How could this be done?  Where do library fulfillment options fit in the knowledge graph? Somewhere in the Wikipedia infobox?   In Google’s info cards?  Happily, greater minds than mine are working on this problem.  Of one thing, at least, I’m certain:  understanding and representing the relationships between identities and topics in institutional and in regional collections – understanding how different ‘centers’ are related – will lead to new insights about how the library system is, and will be, organized.

Subsidence and uplift – the library landscape

Thursday, April 18th, 2013 by Constance

Approximate location of maximum subsidence in the United States.

There’s been a lot of attention to geologic subsidence of late, what with all the sinkholes opening up in Florida, Louisiana and other places. Here in California, we are more often concerned with the gradual change in ground level due to the draining of aquifers that support large-scale farming.  From year to year, the difference in ground level may be nearly imperceptible but over the space of a few decades the landscape has been radically transformed.

The subsidence metaphor was on my mind recently, as I was looking over some data compiled by my colleague Thom Hickey, documenting the usage of headings (subjects and names) in WorldCat. OCLC Research has done quite a lot of work exploring new approaches to managing subject and name authorities, notably in VIAF and FAST. I was interested to see how Thom’s data might be used to measure change — uplift and subsidence — in the library landscape. By computing the frequency with which FAST and VIAF headings occur in institutional collections cataloged in WorldCat, one can identify which libraries hold the most materials related to particular topics, places and people.  And this in turn provides a measure of the relative distinctiveness of library collections, judged not in terms of the ‘rarity’ of holdings but rather by the concentration of related content.
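To make the idea concrete, here is a minimal sketch of how such a ranking could be computed. It is not Thom's actual process, only an illustration of the counting step: tally how many titles each library holds under each heading, then rank the libraries per heading. The records and library symbols are invented, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical catalog records: (holding library symbol, heading assigned to the title)
records = [("AAA", "Marine biology"), ("AAA", "Marine biology"), ("BBB", "Marine biology"),
           ("BBB", "Russian periodicals"), ("CCC", "Russian periodicals"),
           ("CCC", "Russian periodicals"), ("CCC", "Marine biology")]


def rank_centers(rows):
    """For each heading, rank libraries by the number of titles carrying that heading."""
    per_heading = defaultdict(Counter)
    for library, heading in rows:
        per_heading[heading][library] += 1
    return {heading: counts.most_common() for heading, counts in per_heading.items()}


centers = rank_centers(records)
top_marine = centers["Marine biology"][0]
top_russian = centers["Russian periodicals"][0]
```

Re-running the same tally on snapshots taken at different dates would show which libraries move up or down the ranking for a heading, which is the ‘uplift and subsidence’ in question.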

It seemed to me that Thom’s data might have something interesting to say about how the emergence of large-scale digitized book aggregations – HathiTrust, Google Books, etc. – is altering the library environment.  It stands to reason that as these large hubs begin to consolidate content sourced from libraries (and, in Google’s case, publishers), they will displace traditional library ‘centers of excellence’ in some subject areas.  Those who remember the DLF Aquifer project will recall that the initial prototype was designed to pool digitized resources in a given subject area (initially American History, later narrowed to Abraham Lincoln and the US Civil War).  In the very large aggregations of HathiTrust and Google Books, subject specialization has emerged more gradually.  There has not been much public attention to measuring the scope of subject-based collections within those aggregations, nor to benchmarking them against existing institutional holdings.*

The FAST and VIAF centers data provide evidence of both subsidence and uplift in the current collections environment — that is, shifts in centers of excellence as measured by scope of subject based holdings.  The ‘re-leveling’ that has been wrought in just a few years of large-scale digitization is already significant.  Digital aggregations have, by design or accident, emerged as important subject repositories that rival and even outrank some of the largest institutional libraries in WorldCat.

For instance,  HathiTrust, an organization not yet five years old, already holds the greatest concentration of titles on the topic of marine biology, surpassing the Library of Congress as well as two major research universities with world-class oceanography programs.

Marine biology

In the case of Marine biology, the difference between the number of titles held by HathiTrust and the Library of Congress is not very large — fewer than 200 titles.  But in other instances, the relative subsidence of traditional centers of excellence is more dramatic.  For instance, Google Books substantially outranks several major research libraries in holdings related to Russian periodicals (journals, newspapers and the like).

Russian periodicals

This represents an important change in the library system, with monumental old hubs being progressively overshadowed by new collections that are produced not by the slow accretion of library acquisitions but by large-scale digitization and (re)aggregation.  It provides a compelling illustration of how Web-scale content aggregations are altering the library operating environment.  In the case of HathiTrust especially, this disruption can (and I think should) be seen as a positive change:  it enables libraries to rethink traditional, institution-scale collection management and stewardship — a topic we examined in our Cloud-sourcing Research Collections report some years ago.

Using Thom’s ‘centers’ data, we can identify hundreds of topics and identities for which HathiTrust offers better coverage than any other library in WorldCat.  Here are a few topics in which the Digital Library distinguishes itself:

Hathi top topics

And a few of the personal names for which its coverage is unrivaled:

Hathi top names

Interestingly, the other top-ranked collections (by size) for these same subjects and identities are not always the source of HathiTrust’s richness.  One might have anticipated that Hathi’s leadership was simply a by-product of aggregating content from existing centers of excellence, but in fact Hathi has developed unexpected strengths by aggregating at a very large scale from a diverse pool of contributors.  For example, Harvard University and the University of Michigan each hold sizable collections of works by the poet Jean Ingelow; yet, the richness of Hathi’s Ingelow collection is mostly due to contributions from campus libraries in the University of California system.

The FAST and VIAF ‘centers’ data provide a fascinating new vantage point on the changing collections landscape.  We’ll be looking at ways to integrate it into ongoing research projects, including the mega-regions work, where we hope it can help us detect regional collecting trends that might inform shared stewardship priorities.

*Note:  HathiTrust provides nice visualizations and a list of subject areas in the Digital Library, based on Library of Congress classification numbers.  These provide a good overview of subject-based coverage but without reference to comparable coverage in other libraries. It is generally known that Google is selective with respect to identifying library partners, but I’m not aware of any public documentation related to a specific collection development strategy. Their aim, famously, is to provide comprehensive coverage of the world’s books, not to develop excellence in any given subject area.

Regional print management and cooperative infrastructure: maps and gaps

Monday, March 4th, 2013 by Constance


We are excited to be working with the Ohio State University (OSU) and the Committee on Institutional Cooperation (CIC) on a new project to explore the contours of a regional strategy for managing the print book resource in the CHI-PITTS mega-region. Regular readers of this blog will know that mega-regions are geographic areas that typically encompass multiple population centers, exhibit a high degree of economic integration, and are bound together by a rich network of transportation, logistics, and communications infrastructure, as well as mutual cultural interests and similarities. Mega-regions are an intriguing concept for thinking about collaborative activities that scale above small groups of institutions, or even existing library consortia. OCLC Research recently published a report that used a mega-regions framework to explore the characteristics and implications of a North American network of regionally consolidated print book collections.

Over the last few months, we have explored this issue further by working with several US regional library consortia to examine their collective print book holdings in the context of the print book resource and infrastructure available in the mega-region most closely aligned with the location of the consortial membership. We have produced profiles for the Statewide California Electronic Library Consortium (SCELC) in the context of the SO-CAL mega-region; the Association of Southeastern Research Libraries (ASERL) and the Washington Research Library Consortium (WRLC) in the context of the CHAR-LANTA mega-region; and the National Institute for Technology in Liberal Education (NITLE) membership in the context of the BOS-WASH mega-region. We plan to publish a series of case studies highlighting the findings from these consortial profiles in the near future.

Our new collaboration with OSU and the CIC is an extension of this consortial profiling work. In this project, we will examine print book holdings at multiple levels: an institution (OSU); a library consortium (CIC); and a mega-region (CHI-PITTS). The purpose of the work is to conduct a detailed analysis of the factors that an individual library might bring to bear in selecting books to contribute to a shared consortial collection, as well as to compare both the individual library collection and the consortial print book resource to the broader context of the print book resource available in the surrounding mega-region. The CHI-PITTS mega-region, which extends across the upper Midwest from Chicago to Pittsburgh, is the mega-region which aligns most closely with the locations of the CIC membership.

Some of the questions we will address include:

  • What part of the OSU print book collection represents a distinctive asset when compared to the aggregate print book holdings within the CIC membership, or the broader CHI-PITTS mega-regional print book resource? What are the characteristics of these distinctive resources with respect to subject, age, and system-wide work-level holdings?
  • What part of the OSU collection is widely held across the collections of the CIC membership, or institutions within the CHI-PITTS region? Can a “core” set of titles be identified, at the consortial or regional level, that represent duplicative investment? Are there opportunities to reduce local costs by managing these titles as a shared resource at the consortial or regional level?
  • What does the ILL demand profile for OSU tell us about consortial and regional demand for its print book collection? How much of this demand is centered around OSU’s distinctive print book titles? How can OSU cooperate with other CIC members to meet local, consortial, and regional demand for print books?
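
To make the first two questions a little more concrete, here is a minimal sketch of how "distinctive" and "widely held" titles might be separated once holdings have been reduced to sets of title identifiers. It is illustrative only and is not the method used in the project: the function name, the toy identifiers, the library names and the `core_threshold` cutoff are all assumptions, and a real analysis would work from work-level WorldCat holdings data.

```python
from collections import Counter


def classify_titles(local, group_holdings, core_threshold):
    """Split a library's titles into distinctive and widely held sets.

    local          -- set of title identifiers held by the library of interest
    group_holdings -- dict mapping each *other* library to its set of titles
    core_threshold -- assumed cutoff: minimum number of other holders for a
                      title to count as widely held
    """
    # Count how many of the other libraries hold each title.
    holders = Counter()
    for titles in group_holdings.values():
        holders.update(titles)

    # Distinctive: held by no other library in the group.
    distinctive = {t for t in local if holders[t] == 0}
    # Widely held: held by at least `core_threshold` other libraries.
    widely_held = {t for t in local if holders[t] >= core_threshold}
    return distinctive, widely_held


# Toy data: every identifier and library name here is hypothetical.
osu = {"t1", "t2", "t3", "t4"}
others = {
    "lib_a": {"t2", "t3", "t9"},
    "lib_b": {"t3", "t4"},
    "lib_c": {"t3", "t8"},
}
distinctive, widely_held = classify_titles(osu, others, core_threshold=2)
print(sorted(distinctive), sorted(widely_held))
```

The same function could be run twice, once against the consortial membership and once against all libraries in the mega-region, to compare the two views of the same local collection.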

Carol Pitts Diedrichs, Director of OSU Libraries, has posted a nice summary of the thinking that led up to this joint effort.

OSU volunteered to serve as a test case for this project, with the understanding that findings from the analysis will be useful to all CIC member libraries considering shared print archiving arrangements. Of course, we hope the project will be useful to other libraries as well. There is growing interest in how (or if) the lessons learned in journal archiving projects like the Western Regional Storage Trust (WEST) or the CIC Shared Print Repository can be applied to cooperative efforts to preserve monographic collections. This project should provide some answers. We expect to post periodic updates on the project over the next several months here on Hanging Together, and will publish a synthesis of findings in a final report later this year.


“Cataloging Unchained”

Wednesday, February 27th, 2013 by Roy

Lorcan Dempsey (VP of Research at OCLC) has long said that we need to “make our data work harder.” And for years that is exactly what OCLC Research has been doing. So when I was asked to speak on data mining at the OCLC European, Middle East, and African Regional Council Meeting in Strasbourg, France, I knew I would have a lot to talk about. Too much, in fact.

Instead of trying to cover everything we’ve been doing in a whirlwind of slides that no one would remember, I decided to use WorldCat Identities as a “poster child” for the kinds of data mining activities we have been doing recently here at OCLC Research. Then, I described another, related project — the Virtual International Authority File. To bring it all home I mentioned how we’re considering how we might be able to marry these two resources into one “super” identities service.

Consider what it would mean to take an aggregation of library-curated authority records and enhance it with algorithmically-derived data from WorldCat as well as links to other resources about creators such as Wikipedia. This would provide a rich resource of information about creators, all sitting behind authoritative and maintained identifiers that could be used in emerging new bibliographic structures such as those being created by the Library of Congress’ Bibliographic Framework Transition Initiative. The mind reels with the possibilities.

But before I could jump into all this I needed a way to quickly explain why we are doing things like this — and how we are doing them. I decided I needed to make a video. So last week that is exactly what I did, with help from colleagues in Dublin. The result was less than three-and-a-half minutes long, and yet it amply set the stage for what was to come after. Plus, it can have a life of its own.

Take a look yourself, at “Cataloging Unchained”, and let me know what you think in the comments.

Managing print books: A mega-problem?

Wednesday, December 12th, 2012 by Constance

This research note was co-authored by Brian Lavoie and Constance Malpas.

Opportunity cost seems to be the watchword for print book collections these days. The staff, physical space, and other resources consumed by print-centric collections and services are badly needed to support new priorities in library services, such as deeper user engagement and closer alignment with changing research and learning practices. In the face of evidence of declining print book usage, combined with an ever-expanding array of digital alternatives, it is not difficult to imagine a future where “bookless” libraries are the norm.

But this may be premature. Few libraries are prepared to pack up their print books and send them to off-site high-density storage. On several highly-publicized occasions, plans to reduce local print book inventory have met vigorous opposition – witness the recent firestorm at the New York Public Library. In short, print collections pose a dilemma for libraries: they are assets too valuable to dispose of, yet sinking in priority vis-à-vis other aspects of the library service portfolio. The phrase “managing down print”, increasingly common in print management discussions, neatly captures the dueling imperatives: the need to allocate resources away from managing print book collections, but to do so in a gradual, orderly way. So the search is on for the golden mean: a viable print management strategy that can at once leverage more value out of the legacy print investment, and lower maintenance costs. This question is far from settled, but the contours of the solution are becoming apparent. First, future print management strategies are likely to be collaborative, with print books increasingly viewed as a shared asset to be managed cooperatively. Second, the scale of cooperation receiving the most attention, in terms of both planned and implemented solutions, is at the regional level.

This is not to suggest that the rest is a mere matter of detail: for example, the policy and technical infrastructures needed to support a regional strategy for cooperative print management are still in early stages of development. In the meantime, we can speculate on what a network of cooperatively-managed regional print book collections might look like. The OCLC Research report Print Management at “Mega-scale”: A Regional Perspective on Print Book Collections in North America explores a new geography of print book collections based on the concept of mega-regions. Mega-regions are geographical areas defined on the basis of economic integration and other forms of interdependence. The mega-regions framework has the benefit of basing regional boundaries on a substantive underpinning of shared traditions, mutual interests, and the needs of a common constituency.

In the report, we combine WorldCat data with an operationalization of the mega-region concept by urbanist Richard Florida to produce a network of twelve mega-regional print book collections – i.e., the collective print book holdings of all libraries in each region – corresponding to the twelve North American mega-regions identified by Florida (see figure below; click on image to view full size). We explore the salient characteristics of the mega-regional collections individually and as a group, and synthesize these characteristics into a set of stylized facts. The stylized facts are then used to explore the implications of a regionally-based, cooperative print strategy across a wide spectrum of issues, including access, management, and preservation.

Viewing print book collections as a cooperatively-managed regional resource yields benefits on both the supply side and the demand side. On the demand side, aggregating the print holdings of many institutions into a single collective collection creates a resource of greater scope and depth than any single local collection. Exposing this collective collection to users around the region – or even beyond – may amplify or even create demand for print books that experience little or no local use. On the supply side, regional coordination could streamline print management and reduce costs. Opportunities emerge for collaboration and coordination in collecting and retention decisions – for example, by diminishing excessive duplication and sharing collecting priorities across many institutions.

While our application of the mega-regions framework to print management is speculative, evidence does suggest that the organization of library stewardship is being reconfigured on a new supra-institutional, regional basis. The Western Regional Storage Trust, a cooperative effort to archive print journals in many Western (and even Midwestern) US libraries, is one among many examples. Some of these initiatives, like the CIC Shared Print Archive or the ASERL Print Journal Archive, have the potential – if not the explicit intent – to deliver benefit at mega-regional scale: CIC member libraries are distributed across the expansive CHI-PITTS region and ASERL’s membership is concentrated in CHAR-LANTA. It will be interesting to see if these natural experiments in redistributing print preservation responsibilities across broad geographies result in a richer collective resource, undergirded by a robust federation of preservation commitments, or a differently fragmented set of regional collections.

In the coming year, we’ll have an opportunity to extend our mega-regions analysis by taking a demand-side view of the North American print book collection. We’ll be working with partner libraries in the CIC (notably the Ohio State University) to examine how inter-lending data might be combined with supply-side holdings data to inform a regional print management strategy for retrospective monographic collections in CHI-PITTS. Here’s a thumbnail sketch of the regional resource, excerpted from our project proposal:

In aggregate, the print book resource held in CHI-PITTS libraries amounts to more than 40% of print book titles in North America. About 16% of these titles are unique to the region, i.e. not duplicated in any of the other eleven mega-regional collections. The remainder constitutes a significant preservation “backstop” for other North American libraries: 50-92% of titles held by other individual mega-regions are duplicated in CHI-PITTS libraries. Thus, investments in the preservation of print books in the CHI-PITTS region can deliver significant benefit to libraries throughout North America. Conversely, there are relatively few regional collections that duplicate a significant share of the CHI-PITTS collection, which means that the burden of print preservation responsibilities (and investments) will be largely shouldered by institutions within the region. Since less than a fifth of the print books in the region are held by academic research libraries – traditionally viewed as the institutions with the greatest stake in print preservation – it seems apparent that networks like the CIC will have an important role to play in rationalizing regional print preservation priorities and investment.
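
For readers who want to see how figures like these are put together, the two measures in the excerpt (the share of a region's titles that is unique to it, and the share of another region's titles that it duplicates) come down to simple set arithmetic. The sketch below is a hedged illustration, not the report's actual procedure: the function names, region labels and title identifiers are all hypothetical.

```python
def unique_share(region, collections):
    """Fraction of a region's titles held in no other regional collection."""
    own = collections[region]
    # Union of every other region's titles.
    elsewhere = set().union(
        *(titles for name, titles in collections.items() if name != region)
    )
    return len(own - elsewhere) / len(own)


def duplicated_share(other, region, collections):
    """Fraction of `other`'s titles that are also held in `region`."""
    return len(collections[other] & collections[region]) / len(collections[other])


# Toy regional collections: labels and identifiers are invented.
collections = {
    "REGION_A": {"t1", "t2", "t3", "t4", "t5"},
    "REGION_B": {"t2", "t3", "t6"},
    "REGION_C": {"t4", "t5"},
}

print(unique_share("REGION_A", collections))
print(duplicated_share("REGION_B", "REGION_A", collections))
print(duplicated_share("REGION_C", "REGION_A", collections))
```

Note the asymmetry in the second measure: a large collection can duplicate most of a small one while the small one duplicates very little of the large one, which is the pattern the excerpt describes for CHI-PITTS.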

The CIC is an interesting test case for this sort of project, since all libraries in the consortium are partners in the HathiTrust Digital Library, a shared digital repository. By our reckoning, a third or more of the titles held by CIC member libraries are already “backed up” by digital preservation copies in HathiTrust.  Yet from a regional perspective, the situation is strikingly different:  we estimate that less than a fifth of the print books in CHI-PITTS are duplicated by HathiTrust. The collective preservation burden therefore remains significant even in a region with comparatively robust cooperative library infrastructure.

In regions where shared library infrastructure is less developed or less integrated, the challenges may be even greater.  Take Southern California, for example.  We estimate that the regional print book resource in the SO-CAL mega-region amounts to just under 10 million titles with about 40 million library holdings (i.e. holdings set by libraries in the region).  While much smaller in size than the CHI-PITTS collection, the SO-CAL collection represents an important regional asset and a significant stewardship concern for academic libraries in the area.  As elsewhere, these libraries are individually and collectively reassessing the opportunity costs of managing local print inventory and considering “above the institution” solutions.  Not surprisingly, smaller academic libraries look to larger research-intensive institutions as partners in the preservation enterprise and potential providers of shared infrastructure.

The University of California system, with five large research libraries and a high-density storage facility in the SO-CAL region, is an obvious focus of attention. But the infrastructure developed to support a statewide research university system with a global brand cannot simply be extended to serve all other libraries in the region. There is no shared governance model for the regional library resource, which is distributed across hundreds of public and private institutions. And there is no business model currently in place that would enable libraries to opt in to “preservation by proxy” arrangements. Yet, progress is being made. A group of library leaders from academic libraries and consortia in and around Southern California will meet later this week to begin what is certain to be a long conversation about a regional print management strategy. Bob Kieft, a long-time supporter of (and sometime agitator for) collaborative collection management, has organized the meeting, which will be hosted by UCLA. It’s impossible to predict what the outcomes of the discussion might be – there is certainly no recipe for success in regional print management – but it is unquestionably an important first step in addressing what is increasingly a “mega” problem.


21st-Century Research Library Collections

Wednesday, May 30th, 2012 by Jim

I was fortunate to attend the final session of the recent Association of Research Libraries membership meeting on this topic. The panel of presentations served as the occasion for the release of a new briefing paper for research library directors, “21st-Century Collections: Calibration of Investment and Collaborative Action.”

The paper is the work of the ARL 21st-Century Research Library Collections Task Force, co-chaired by Deborah Jakubs at Duke University and Tom Leonard at the University of California, Berkeley. It’s worth your attention. The paper (pdf) focuses on the collaborative future of collections and looks at the future contours from the perspective of scholars/researchers, content, publishing and infrastructure. It’s very short and intended to be evocative and provocative as opposed to providing a blueprint or schematic on how to assemble the future it outlines. This caused some discussion during the meeting – should ARL be distilling the sense of the community and presenting it back or should it be an organization that organizes its members around an action plan to assemble the future? Given the diversity of the membership and the varying aspirations and resources of the institutions it’s hard to imagine that grand plan execution should be an ARL aspiration. That seemed to be the sense of those still assembled.

The closest to an action schematic in the discussion was the presentation by Wendy Pradt Lougee at the University of Minnesota, who said content is still a core role but the context for investments is changing, as are the strategies, which will require coordination and collaboration on a new scale. Her presentation, titled Content & Collections: Rubrics and Rubiks, is a must-read.

Her new rubric presents what I think is the correct formulation of the change that’s imperative and it highlights the inherent problem in arriving at that new equilibrium state. Essentially that state requires us to solve an equation whose left side is the newly optimized local circumstance (priorities, infrastructure, uniqueness) and whose right side is a set of shared supra-institutional factors (goals, priorities, infrastructure and services). The problem with the equation is that it has no constants. For local optimization to occur it needs to be aware of and rely on the supra-institutional factors – those are not yet in place and their characteristics not yet codified in a way that allows local choices to be made and operational processes to be confidently altered.

It seems to me that the challenge to supra-institutional providers of infrastructure and services is to define some of that future infrastructure and service provision in concert with the collective goals and priorities of those they intend to serve. Once defined those providers need to declare their intention to build, offer and sustain those services so that local institutional decisions can be definitively made. In the US there are quite a few actual and aspiring providers of shared infrastructure and services. OCLC is certainly one of the largest and is a pervasive provider along some important dimensions of the library service portfolio. Our challenge is to listen to the emerging desires of our members for a different class of shared services and then exercise leadership that commits to the provision of that infrastructure and those services where we have a unique capacity. This is the kind of change that would provide our members with the constants that let them optimally solve the local equation.

Marking Progress: print archives disclosure

Friday, May 25th, 2012 by Constance

For the past year and a half, Dennis and I have been working closely with a group of Research Library Partners and others to develop and test a method for registering print archives in WorldCat. I’m pleased to say that the OCLC Print Archives Disclosure Pilot is now complete and a final report of our findings has been published. The report was jointly authored by Lizanne Payne (project director of the Western Regional Storage Trust) and Emily Stambaugh (manager of the California Digital Library’s Shared Print program), along with Dennis and myself. Partners in this project included the Center for Research Libraries (CRL), the California Digital Library (CDL), and the libraries of Indiana University; Stanford University; the University of California, Los Angeles; the University of California, San Diego; the University of Minnesota; and the University of Oregon.

The report has actually been out for a few weeks now; it was published without fuss or fanfare at the end of April. Gary Price was kind enough to feature it in an InfoDocket post last month, and it’s been making the rounds on some of the specialized discussion lists devoted to print archiving and preservation activities. The specifics of the report (guidance on how and where to register print preservation commitments) apply to a relatively small number of institutions, but the publication itself marks a milestone for the library community as a whole. It represents the culmination of several related efforts directed at redesigning the critical (and costly) business of preserving print books and journals.

It’s been a long road.  Back in 2009, an OCLC Research working group undertook a review of shared print policy documents that revealed some significant gaps in existing guidance, particularly with respect to how and where print archiving commitments should be expressed or registered:

About half of the policies [examined in the report] stipulate that the special retention and/or shared access status of documents covered by the agreement should be systematically registered; less than 20% specify a location in the MARC21 bibliographic or local holdings record where this information is to be recorded. Only a quarter of the policies reviewed mandate disclosure of the retention or shared access status in regional, national or international union lists.

This last finding has important implications for collection-sharing efforts that seek to achieve significant scale or impact on system-wide economies. More effective and systematic disclosure of retention commitments, in particular, might produce significant network effects by enabling anonymous participation in collection-sharing initiatives, generating secondary benefits for the entire library community.

Predictably, the report closed with a set of recommendations (or admonitions) intended to address the policy gaps that we felt were most important:

Cooperative agreements that are intended to achieve or to enable truly transformative change in the way library print collections are managed should include:

  • A business model that acknowledges the changing value of library print resources in the current information environment;
  • An explicit acknowledgment that effective disclosure of library holdings and retention commitments is necessary to support distributed management of print archives; and
  • A commitment to capture, retain and share item-level condition information so that the preservation quality of print archives may be better judged.

The working group that contributed to the policy review was disbanded in 2009, but several participants continued to work, more or less informally, on drafting a set of guidelines for print archives disclosure in WorldCat. That effort was explicitly modeled on practices developed in the 1990s for recording preservation microfilming information. At the time, NEH was funding a large-scale brittle books preservation program and, to reduce duplicative effort, participating libraries needed a mechanism for identifying the titles and volumes that were already queued for filming. Nancy Elkington was a prime mover in developing standard practices for recording this information in bibliographic union catalogs, using the MARC 583 Preservation Action Note.

Along with Deb McKern, a preservation officer at the Library of Congress, Nancy encouraged us to extend use of the 583 Action Note to print archiving activities.  Since 2005, use of the 583 had already been extended to registration of digital archives in the Registry of Digital Masters, a joint effort of the Digital Library Federation and OCLC.  It seemed sensible to build upon this past work in developing guidelines for registering print archiving commitments.  However, our initial effort to define guidelines for print archives disclosure foundered when it became clear that the bibliographic record was not an appropriate vehicle for recording item-level condition or retention statements.  For journal archiving efforts in particular, it was difficult to convey in a title-level record how much of a given journal run was actually preserved.  And, in a master-record union catalog like WorldCat, it was even harder to see how archiving commitments from multiple institutions could be adequately represented.

For a year or more, our efforts to define descriptive metadata guidelines for print archiving lay fallow. Other projects were taken up. But by 2010, with the emergence of several large-scale print journal archiving efforts and increasing public awareness of the importance of distributed preservation, it was clear that a common approach to identifying shared print collections was urgently needed. As anticipated in our 2009 report, the largest archiving efforts were finding it impossible to “scale up” without some shared infrastructure. Happily, in the intervening years, support for item-level holdings information in WorldCat had increased substantially and it was possible to design and test a disclosure strategy that was better adapted to journals. With the support of OCLC product management, the Print Archives Disclosure Pilot project was launched. And as a result we are now, collectively, in a better place to design and implement scalable strategies for print preservation.

The staffing challenge – it’s not just new skill sets

Tuesday, November 22nd, 2011 by Jim

At the last Association of Research Libraries (ARL) membership meeting (which means I’ve been carrying this thought around since mid-October, shame on me) I sat in on the Transforming Research Libraries Committee meeting chaired by my long-time friend, Carton Rogers. He precipitated what turned out to be a very engaged and frank discussion about the staffing challenges research libraries face while trying to navigate the increasingly urgent imperatives to transform their operations and renovate their portfolio of services.

He got things started by sharing an excerpt from a survey of ARL directors that asked what were the top three areas ARL should emphasize on behalf of members over the next three years. The first choice of 41.7% of the respondents was

Workforce needs of 21st century research libraries, including new roles for professionals.

A few of the committee members spoke to that issue and by the end of the discussion nearly everyone in the room had shared their local challenges in this regard. People talked about inter-generational work issues, succession problems, and the obstacles to change posed by both union circumstances and librarians with faculty status. They wanted to have librarians engaged more directly in the research process and the outputs of their institutions but recognized that most of the current work force was not sufficiently versed in the research process or in the deliverables from that process to become effective support or service agents. All of these are real-world management challenges that, even with the best of will on the part of both management and staff, seemed increasingly intractable.

In a follow-on conversation to the meeting I was reminded of a very good piece of research done by a group of ARL Research Library Leadership Fellows a few years ago that spoke directly to this problem. Krisellen Maloney, Kristin Antelman, Kenning Arlitsch and John Butler did a project whose results were first presented at the October 2008 ARL Membership meeting in a session titled “What are our future leaders thinking?” They later published the work in C&RL in a piece titled “Future Leaders’ Views on Organizational Culture”.

Their project surveyed 165 future leaders (see the article for their working definition) to assess whether there was a relationship between future library leaders’ satisfaction with their organizational cultures and their perception of their own effectiveness. As you might imagine the academic library profile is dominated by a Hierarchy culture and the preferred culture, as perceived by this population, is more flexible and externally oriented. The staff who are most likely to contribute to the reshaping and transforming of the academic library service portfolio feel thwarted and judge their efforts ineffective. The discussion and conclusions in the published article are worth your attention.

What library administrators identify as a staffing challenge is cast, in the most measured way, by the authors as a call for organizational cultural change. Moving from hierarchy to adhocracy – a culture of high flexibility and external focus – would liberate the motivated staff resources we already have and create an environment congenial to the people with the skill sets that the future library needs. Without minimizing the individual local management challenge it seems to me that we would do well to put in place an explicit program aimed at cultural change at the same time that we look to renew, realign and refresh library staff skill sets.

P.S. It was at their ARL presentation that I was first introduced to Prezi, the presentation software that is the anti-powerpoint, which has become my default environment for my presentations.