Concentration, Diffusion, Centers & Flows

Our 2012 mega-regions analysis revealed a notable feature of the library landscape in North America:  the apparent  scarcity of print inventory varies significantly depending upon the scale at which it is assessed.  It stands to reason, of course, that a title held by a single library in one locale may be held by many libraries in other places.  What is more surprising is that scarcity – or what is more appropriately termed diffusion – is a characteristic that persists even at the scale of the mega-region.  In every one of the 12 regions we examined, more than 75% of the print book titles are held by five or fewer libraries.  Yet, comparing one mega-region to another, we found a high level of bilateral duplication:  for example, more than 70% of the print book publications held in Cascadia – a mega-region encompassing urban centers in Oregon, Washington, and British Columbia – are duplicated by library holdings in the NorCal region.

Bi-lateral duplication of print books in Cascadia and NorCal
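To make the duplication measure concrete, here is a minimal sketch of how the share of one region's titles that are also held in another region might be computed. The function name and the title identifiers are hypothetical and purely illustrative; this is not the method or the data used in the 2012 analysis.

```python
def duplication_rate(region_a, region_b):
    """Share of the distinct titles in region_a that are also held in region_b."""
    titles_a = set(region_a)
    if not titles_a:
        return 0.0
    return len(titles_a & set(region_b)) / len(titles_a)


# Hypothetical title identifiers, not real holdings data.
cascadia = {"t1", "t2", "t3", "t4"}
norcal = {"t2", "t3", "t4", "t5", "t6"}

# The measure is directional: each region is compared against its own title count.
cascadia_in_norcal = duplication_rate(cascadia, norcal)
norcal_in_cascadia = duplication_rate(norcal, cascadia)
```

Note that the two directions generally differ, since a smaller regional collection can be largely duplicated by a bigger one without the reverse being true.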

Now it may be that the high degree of integration that is a primary characteristic of mega-regions is also a factor in the diffusion of library resources within those same regions. Arguably, the strong networks of exchange and robust logistics infrastructure of mega-regions help to explain why we find such a low level of redundancy in library collections within those regions. Even without coordinated collection development plans, it may be that the ease with which resources (including library books) flow within mega-regions exercises some influence on library acquisitions. The incentive to acquire ‘just in case’ inventory will be weaker in a region where inter-lending networks are strong and it is relatively easy to obtain copies from neighboring institutions, and this confidence in regional supply options may have a sort of invisible-hand effect that constrains redundant acquisitions.

This would suggest that the high degree of diffusion we see in regional collections – inventory distributed across a number of geographically distant institutions – is a characteristic of a ‘well-organized’ (though not deliberately engineered) library system. By contrast, library collections in institutions located outside of mega-regions tend to exhibit a higher degree of redundancy. This is partly a reflection of the large number of libraries that fall outside of the defined mega-regions – more than nine thousand individual OCLC institution symbols – but it seems likely that greater redundancy is needed to support demand in regions where ‘flows’ may be less efficient. Compare the average library holdings per title for collections held outside of US mega-regions (about 14) to the ratio of holdings per title within mega-regions, which ranges from 2 for the Phoenix metro area to 9 for Chi-Pitts.
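The two measures used here, average holdings per title and the share of titles held by only a few libraries, can be sketched as follows. This is an illustrative calculation under assumed inputs: the `diffusion_summary` function, its `threshold` parameter, and the (library, title) pairs are all hypothetical, and the toy example uses a threshold of 2 rather than 5 to keep the data small.

```python
from collections import Counter


def diffusion_summary(holdings, threshold=5):
    """Summarize redundancy for an iterable of (library, title) pairs.

    Returns the mean number of holding libraries per title and the share
    of titles held by `threshold` or fewer libraries.
    """
    # De-duplicate so that a library counts once per title.
    libraries_per_title = Counter(title for _, title in set(holdings))
    n_titles = len(libraries_per_title)
    if n_titles == 0:
        return {"holdings_per_title": 0.0, "share_scarce": 0.0}
    total_holdings = sum(libraries_per_title.values())
    scarce = sum(1 for n in libraries_per_title.values() if n <= threshold)
    return {
        "holdings_per_title": total_holdings / n_titles,
        "share_scarce": scarce / n_titles,
    }


# Hypothetical holdings: three libraries (A, B, C) and three titles.
holdings = [
    ("A", "t1"), ("B", "t1"),
    ("A", "t2"),
    ("A", "t3"), ("B", "t3"), ("C", "t3"),
]
summary = diffusion_summary(holdings, threshold=2)
```

A low `holdings_per_title` together with a high `share_scarce` corresponds to what the post calls diffusion; the reverse pattern corresponds to redundancy.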

Of course, there are other factors at play: as a recent New York Times article showed, the geographic distribution of major research universities (and the libraries that serve them) is uneven – and regions with fewer research libraries will have a smaller concentration of rare or unique materials, compared to regions with many research-intensive universities. Not surprisingly, the BosWash region, which encompasses a substantial part of the ARL membership, has a relatively low level of redundancy in print book holdings (about 7 holdings per title) simply because the area is home to many institutions with large collections of rare or distinctive materials.

ARL Membership Map (2013)

Conversely, in areas with a high density of public libraries – which generally hold large collections of popular classics and best-selling titles – we see higher levels of overall redundancy in collections. So one cannot infer that low levels of redundancy in library holdings across any region, whether organized against the mega-regions framework or anything else, are a reliable indicator of strong or weak flows. Other regional factors, like the distribution of research universities and public libraries, are clearly important. It is interesting then to consider how the flow of library resources across regions may contribute to the organization of the library system as a whole. Regional infrastructure will affect flows, but flows will also shape infrastructure. Think, for example, of how the growth of rapid transit networks has transformed urban landscapes, encouraging the emergence of the sprawling metro areas that anchor mega-regions.

Lorcan Dempsey sometimes speaks of ‘library logistics’, and it is in this context that I have been thinking about the flow of library resources and, more especially, about the emergence of new hubs around which the library system is now being reconfigured. Thom Hickey’s recent experiments in programmatically identifying concentrations of material related to a particular topic or identity – what we’ve referred to as ‘centers data’ – provide a new way to think about how the library system is organized and how it is changing. It’s not clear if the existing centers reflect intentionally cultivated strengths in institutional holdings, or if they are merely accidents of history – an unsolicited donation of materials about a particular person, for instance.

In some cases, the association with known institutional centers of excellence seems evident:  it is not surprising, for example, to find that the University of Pittsburgh has the largest collection of material by or about Gonzalo Rojas, a celebrated Chilean poet.  Pitt has been a National Resource Center on Latin America for decades and it stands to reason they hold significant collections of Latin American literature.  Would an expert in the field have predicted that Pitt, rather than the University of Texas (a distinguished center of Latin American studies), has the most comprehensive collection related to Rojas?  Perhaps – I don’t have the domain knowledge to have an informed opinion about the likely location of the most comprehensive Latin American poetry collections in North America. Significantly, though, Pitt ranks within the top collections (by size) of works by related poets:

Gonzalo Rojas and related identities – University of Pittsburgh holdings ranked against other WorldCat libraries

As the figures here suggest, it is possible for a library to be a leading – or even the top-ranked – ‘center’ of resources related to a particular identity or topic without holding a vast number of titles. This will obviously be true when the relevant oeuvre is limited: to hold 100% of a small published record (a handful of titles, let’s say) is still significant. What is more interesting is that the diffusion of library resources – the ‘scarcity’ that we find within institutional and regional collections – effectively lowers the threshold for what constitutes excellence in institutional holdings. In the case of Gonzalo Rojas, for example, Pitt’s 48 titles amount to less than 40% of the published works associated with the related VIAF heading. Even so, this small collection is 75% larger (and 16% more comprehensive) than the related holdings at the Biblioteca Nacional de Chile – at least as they are reflected in WorldCat. This is, at least to me, somewhat surprising.
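One way to picture the distinction between the size of a collection and its comprehensiveness is to rank libraries by how much of a given published record they hold. The sketch below is an assumption-laden illustration, not a description of how the ‘centers data’ are actually produced: the `rank_centers` function, the library names, and the work identifiers are all hypothetical.

```python
def rank_centers(holdings_by_library, published_record):
    """Rank libraries by how many works in a published record they hold.

    Returns (library, titles_held, coverage) tuples, largest collection
    first; ties are broken alphabetically by library name.
    """
    record = set(published_record)
    ranked = []
    for library, titles in holdings_by_library.items():
        held = set(titles) & record
        coverage = len(held) / len(record) if record else 0.0
        ranked.append((library, len(held), coverage))
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


# Hypothetical published record of ten works associated with one identity.
published_record = {f"work{i}" for i in range(10)}
holdings_by_library = {
    "Library A": {"work0", "work1", "work2", "work3"},
    "Library B": {"work0", "work1"},
    "Library C": {"work5"},
}
ranking = rank_centers(holdings_by_library, published_record)
top_library, top_count, top_coverage = ranking[0]
```

In this toy example the top-ranked ‘center’ holds well under half of the record, which mirrors the point above: when holdings are widely diffused, a modest collection can still be the most comprehensive one.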

This raises the question of how centers or hubs are revealed in the information environment.  Effective disclosure of collections that are distinctive not because of their rarity but because of their ‘excellence’ or completeness will be important if libraries are to be recognized as preferred hubs in the larger supply chain, where commercial providers are still dominant.  Ideally, one would like to have relevant library suppliers revealed in the network at the point of need, in the flow of the researcher’s work – which is increasingly likely to be outside the library discovery environment.  How could this be done?  Where do library fulfillment options fit in the knowledge graph? Somewhere in the Wikipedia infobox?   In Google’s info cards?  Happily, greater minds than mine are working on this problem.  Of one thing, at least, I’m certain:  understanding and representing the relationships between identities and topics in institutional and in regional collections – understanding how different ‘centers’ are related – will lead to new insights about how the library system is, and will be, organized.