Recently, we looked at how Sankey diagrams might be used to visualize the flow of library resources within and across inter-lending networks. It was a useful exercise, but it left me feeling that a critical dimension was lacking: a measure of the geographic distance between inter-lending partners. It is important in its own right to understand that a significant share of the inter-lending demand fulfilled by CIC libraries is generated outside the CIC group, but that doesn’t tell us much about the relative costs of serving ‘in-group’ and ‘out-of-group’ partners. If most of the non-CIC borrowers are located close to CIC lenders, the costs of fulfilling returnable requests (which must travel to and from the borrower) will be lower than if the non-CIC borrowers are located farther away. I tried mapping the Sankey flows to ZIP codes, but unless one is already familiar with the codes, it is fairly difficult to visualize the distance covered.
The obvious solution was to plot the outbound and returning flows on a map. At first, I did this by dropping markers for the borrowing and lending partners on a map, using a simple Web application (BatchGeo) that uses the Google Maps API to generate map-based data visualizations. BatchGeo is a nice tool, but in this case the result wasn’t very pleasing — the density of same-sized location markers in some locations made it difficult to read and, more importantly, obscured patterns in the relative concentration and diffusion in different regions. This was particularly true in comparatively small states. It was a very noisy picture. Even if one looks at a small fraction of the inter-lending partners, the result is an irritating blur. Limiting the inter-lending population to the top 5% of borrowers resulted in this not especially informative picture:
Highlighting the borrowers located in the ChiPitts megaregion made it only slightly more interesting:
By chance, as I was experimenting with these maps, Jim Michalko (my boss) stopped by my desk to chat about a recent article in the New York Times on the geography of economic mobility. He half-jokingly suggested that we overlay a map of North American library infrastructure on the map of economic mobility, to see if there was any correlation between the availability of library services and the likelihood that individuals can better their economic lot in life. Well, why not? I already had a geo-coded set of US libraries — all I needed to do was map those to pre-existing shape files to produce a county-level view of library infrastructure. I used a freely available ZIP code data table to map ZIP data to county-level boundaries. The mappings are not perfect, but I considered them good enough for my purpose — which was not to produce an exact map of all library locations, but simply to compare the relative density of regional library infrastructure. With this in hand, I could use a method outlined by Robert Mundigl in his Clearly and Simply blog to associate data values with colored fill gradients in choropleth maps, using Excel.
Here is the result:
It is not a complete map — I wasn’t able to map every OCLC library symbol in the United States to a valid ZIP-based county, and not every library in the US has an OCLC symbol. Still, with nearly 30 thousand libraries, it is more comprehensive than a map produced earlier this year based on IMLS data for about 17 thousand public libraries.
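The ZIP-to-county rollup behind this map can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Excel workflow actually used: the lookup table `ZIP_TO_COUNTY`, the helper names `count_by_county` and `color_class`, and the equal-interval binning are all hypothetical stand-ins, and the sample ZIP and county codes are illustrative values rather than rows from the real data table.

```python
from collections import Counter

# Hypothetical ZIP -> county FIPS lookup (illustrative values only);
# a real ZIP code table would contain tens of thousands of rows.
ZIP_TO_COUNTY = {
    "43210": "39049",
    "43201": "39049",
    "60637": "17031",
    "68701": "31119",
}


def count_by_county(library_zips, zip_to_county):
    """Count libraries per county; collect ZIP codes that cannot be mapped."""
    counts = Counter()
    unmapped = []
    for zip_code in library_zips:
        county = zip_to_county.get(zip_code)
        if county is None:
            unmapped.append(zip_code)  # imperfect mappings are kept for review
        else:
            counts[county] += 1
    return counts, unmapped


def color_class(value, max_value, n_classes=5):
    """Assign a value to one of n_classes equal-width bins (0 = lightest fill)."""
    if max_value <= 0:
        return 0
    return min(int(value / max_value * n_classes), n_classes - 1)


if __name__ == "__main__":
    counts, unmapped = count_by_county(
        ["43210", "43201", "60637", "99999"], ZIP_TO_COUNTY
    )
    densest = max(counts.values())
    for county, n in counts.items():
        print(county, n, "fill class", color_class(n, densest))
    print("unmapped ZIP codes:", unmapped)
```

Keeping the unmapped ZIP codes in a separate list makes it easy to see how much of the library population falls out of the picture, which is the main caveat with a map like this one.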
The first thing to be said about this map is that it does not suggest that there is any obvious correlation between the concentration of library resource (infrastructure) and economic mobility. Several of the areas that authors of the Equality of Opportunity study highlight as places where children of low-income families have a relatively greater likelihood of rising in the income distribution have comparatively limited library infrastructure. Admittedly, the geographic unit of measure in the two maps differs — I used counties (partly because they are readily available as shape files), while the researchers used commuting zones. It’s not obvious to me that if the library data were aligned to commuting zones, the picture would look much different: our data suggests that there is comparatively little library infrastructure in the upper Northeast zone of Nebraska, whether one relies on county or commuting zone boundaries — yet, this is an area where inter-generational income gains are relatively frequent. Conversely, in metro areas like Chicago where library infrastructure is comparatively dense, there is reportedly a pretty low level of inter-generational income gain.
Of course, to judge the strength of the US library system based on the geographic distribution of libraries alone is to overlook a vital (perhaps the most vital) attribute of the library enterprise. Libraries are in the business of increasing access to information by sharing resources that are distributed across broad networks of related institutions: public libraries, academic libraries, special libraries, and so on. Libraries are part of what is now fashionably termed the ‘sharing economy.’ To measure the integrity or vitality of the library system, one needs to take into account the efficiency of flows across the library network. A successful library system is one that ensures that a child (or adult) in rural Nebraska has access to the same collections and services as a child (or adult) in Minneapolis or Seattle.
It would be interesting to investigate whether geographic areas that favor economic mobility are co-extensive with areas where library flows — the balance of supply and demand for library resources — are notably efficient. Perhaps some LIS PhD student will take up the challenge. My current objective is a lot more prosaic: modeling supply and demand within and outside of a given library consortium to inform decisions about local and shared stewardship of print collections. For this, I think the county-level choropleth is actually quite useful. It helps to show how demand is distributed at ‘above-the-institution’ scale, and this is important for understanding the role of logistics in optimizing the flow of library resources.
This is what a county-level heatmap of demand for CIC returnables looks like, reflecting cumulative inter-lending request activity over a period of about seven years:
What does this map tell us? A few important things are immediately discernible:
- CIC libraries (which are mostly located in the Midwest) serve institutions across an enormous geographic range in the United States.
- Regional demand is concentrated in a relatively small number of counties.
- The relative volume of demand is quite low; counties with the greatest number of request transactions individually account for less than 7% of total demand.
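The share-of-demand figure in the last point is simple arithmetic: each county's request count divided by the total. A minimal sketch follows, assuming a plain dictionary of request counts per county. The function names and the sample numbers are made up for illustration and are not the CIC data.

```python
def demand_shares(requests_by_county):
    """Return each county's fraction of total request volume."""
    total = sum(requests_by_county.values())
    if total == 0:
        return {}
    return {county: n / total for county, n in requests_by_county.items()}


def top_counties(requests_by_county, n=3):
    """Return the n counties with the largest share of demand, largest first."""
    shares = demand_shares(requests_by_county)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical request counts, for illustration only.
    sample = {"County A": 60, "County B": 30, "County C": 10}
    for county, share in top_counties(sample, n=2):
        print(f"{county}: {share:.1%} of total demand")
```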
Some of this information was equally visible in the institution location map that I had started with — but the county-based version is less noisy and enables me to roll up a great deal more data (about 1.3 million transactions and five thousand borrowers) in a single picture. It also raises some additional questions:
- Are the ‘hotspots’ of demand an artifact of aggregating demand over several years? That is, did all of the demand from Southern California occur in the last twelve months, or is it a recurring pattern over many years?
- Has the geographic range of demand changed over time? Are CIC libraries serving a broader range of institutional partners today than they did five years ago?
- Do individual CIC member libraries serve a comparable range of institutions? Is ‘long-range lending’ associated with libraries that hold materials that are relatively scarce in the overall system, or are materials traveling farther than is necessary to meet demand?
It was easy enough to plot annual demand for an individual lender, so I produced a new series of maps looking at the county-level location of institutions who borrowed returnable items from the Ohio State University Library (symbol OSU) over a few years. This time, I used the absolute number of loans as the input, so that even low-volume borrowers would be visible. Here are the results for:
At first glance, they might appear to be the same map, yet there are minor variations from year to year. The consistency in regions of demand is interesting, since it suggests that there is some predictability in the sources of demand: year after year, institutions in Southern California (mostly Los Angeles County) and Southern Arizona (mostly Pima County) have turned to OSU as a supplier one hundred times or more. Why does this matter? A pattern of sustained demand might suggest that a subscription-based pricing model would benefit partners on both sides, providing predictability in budgeting and also providing OSU with documentation of the continuing value its holdings are producing for other institutions (and geographies).
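One way to check for this kind of sustained demand is to keep only the counties that meet a loan threshold in every year of the series. The sketch below assumes a nested dictionary of yearly loan counts per county. The function name, the data layout, and the sample figures are hypothetical; only the threshold of one hundred loans comes from the observation above.

```python
def sustained_counties(loans_by_year, threshold=100):
    """Return counties whose loan count meets the threshold in every year.

    loans_by_year maps year -> {county: number of loans}.
    """
    sustained = None
    for year in sorted(loans_by_year):
        qualifying = {
            county
            for county, loans in loans_by_year[year].items()
            if loans >= threshold
        }
        # Intersect year by year so a single weak year drops the county.
        sustained = qualifying if sustained is None else sustained & qualifying
    return sustained or set()


if __name__ == "__main__":
    # Hypothetical loan counts, for illustration only.
    sample = {
        2010: {"County A": 140, "County B": 105, "County C": 20},
        2011: {"County A": 120, "County B": 90, "County C": 25},
        2012: {"County A": 133, "County B": 110},
    }
    print(sorted(sustained_counties(sample)))
```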
The variations in demand are equally interesting. For example, demand for OSU resources from institutions in the Pacific Northwest seems to have waned somewhat — could this be a result of improved intra-regional inter-lending arrangements and courier service within the Orbis-Cascade group? Less visible in these pictures, but no less intriguing (I think) is the decreasing range of geographies served by OSU between 2008 and 2012, which amounts to a reduction of 12%. This trend holds up over a longer period, not reflected in the maps above. What might account for this change? Is demand being deflected to other suppliers? Or is the increasing volume of requests generated by ‘in-network’ CIC partners displacing fulfillment for non-CIC institutions?
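A change in geographic range like this one can be expressed as the percentage change in the number of distinct counties served between two years. The sketch below is one plausible way to compute it, not necessarily how the 12% figure was derived. The function name and the sample county lists are assumptions for illustration.

```python
def range_change(counties_start, counties_end):
    """Compare the distinct counties served in two periods.

    Returns (percent change in the number of distinct counties,
             set of counties served at the start but not at the end).
    """
    start = set(counties_start)
    end = set(counties_end)
    if not start:
        raise ValueError("the starting period has no counties to compare against")
    percent = (len(end) - len(start)) * 100 / len(start)
    return percent, start - end


if __name__ == "__main__":
    # Hypothetical county lists, for illustration only: 25 counties, then 22.
    first_year = [f"county-{i}" for i in range(25)]
    last_year = [f"county-{i}" for i in range(22)]
    percent, dropped = range_change(first_year, last_year)
    print(f"change in range: {percent:.0f}%")
    print("no longer served:", sorted(dropped))
```

Listing the counties that dropped out, rather than only the percentage, would help with the follow-up questions: whether demand has moved to other suppliers or is being displaced by in-network requests.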
There’s a lot more to explore in these complex inter-lending networks — and I suspect that visualizing flows between institutions and across geographies will become increasingly important in monitoring, analyzing and improving efficiency in the library system as a whole. OCLC makes an increasingly wide range of data (including ILL policy and institution data) available for programmatic use by developers and others, and this will hopefully lead to more experimentation with visualization and library analytics.