Recently, we looked at how Sankey diagrams might be used to visualize the flow of library resources within and across inter-lending networks. It was a useful exercise, but it left me feeling that a critical dimension was lacking: a measure of the geographic distance between inter-lending partners. Knowing that a substantial share of the inter-lending demand fulfilled by CIC libraries is generated outside the CIC group is important in its own right, but it doesn’t tell us much about the relative costs of serving ‘in-group’ and ‘out-of-group’ partners. If most of the non-CIC borrowers are located close to CIC lenders, the costs of fulfilling returnable requests (which must travel to and from the borrower) will be lower than if the non-CIC borrowers are located farther away. I tried mapping the Sankey flows to ZIP codes, but unless one is already familiar with the codes, it is fairly difficult to visualize the distance covered.
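One way to put a number on the distance between a lender and a borrower is the great-circle (haversine) distance between their coordinates. The sketch below is only an illustration: the function names and the coordinates are assumptions rather than anything drawn from the request data, and straight-line distance is at best a rough proxy for shipping cost.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def round_trip_miles(lender, borrower):
    """A returnable travels to the borrower and back, so double the one-way distance."""
    return 2 * great_circle_miles(*lender, *borrower)


# Approximate, illustrative (latitude, longitude) pairs; not taken from the data.
columbus_oh = (39.96, -83.00)
tucson_az = (32.22, -110.97)

one_way = great_circle_miles(*columbus_oh, *tucson_az)
there_and_back = round_trip_miles(columbus_oh, tucson_az)
```

Summing the round-trip distance over all requests from a borrower would give a crude "item-miles" figure that can be compared for in-group and out-of-group partners.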
The obvious solution was to plot the outbound and returning flows on a map. At first, I did this by dropping markers for the borrowing and lending partners on a map, using a simple Web application (BatchGeo) that relies on the Google Maps API to generate map-based data visualizations. BatchGeo is a nice tool, but in this case the result wasn’t very pleasing: the density of same-sized markers in some areas made the map difficult to read and, more importantly, obscured patterns in the relative concentration and diffusion of partners in different regions. This was particularly true in comparatively small states. It was a very noisy picture. Even if one looks at a small fraction of the inter-lending partners, the result is an irritating blur. Limiting the inter-lending population to the top 5% of borrowers resulted in this not especially informative picture:
Highlighting the borrowers located in the ChiPitts megaregion made it only slightly more interesting:
By chance, as I was experimenting with these maps, Jim Michalko (my boss) stopped by my desk to chat about a recent article in the New York Times on the geography of economic mobility. He half-jokingly suggested that we overlay a map of North American library infrastructure on the map of economic mobility, to see if there was any correlation between the availability of library services and the likelihood that individuals can better their economic lot in life. Well, why not? I already had a geo-coded set of US libraries — all I needed to do was map those to pre-existing shape files to produce a county-level view of library infrastructure. I used a freely available ZIP code data table to map ZIP data to county-level boundaries. The mappings are not perfect, but I considered them good enough for my purpose — which was not to produce an exact map of all library locations, but simply to compare the relative density of regional library infrastructure. With this in hand, I could use a method outlined by Robert Mundigl in his Clearly and Simply blog to associate data values with colored fill gradients in choropleth maps, using Excel.
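The maps here were built in Excel, so the following is not the method actually used. It is a minimal Python sketch of the same two steps: rolling library ZIP codes up to counties through a lookup table, then assigning each county count to a fill shade. The ZIP codes, county names, and break points are hypothetical placeholders.

```python
from collections import Counter


def libraries_per_county(library_zips, zip_to_county):
    """Count libraries per county; also return the ZIPs that could not be mapped."""
    counts = Counter()
    unmapped = []
    for zip_code in library_zips:
        county = zip_to_county.get(zip_code)
        if county is None:
            unmapped.append(zip_code)  # imperfect lookup: keep track of the gaps
        else:
            counts[county] += 1
    return counts, unmapped


def shade_class(value, breaks):
    """Index of the fill shade: the number of break points that value meets or exceeds."""
    return sum(value >= b for b in breaks)


# Hypothetical lookup table and library list, for illustration only.
zip_to_county = {"11111": "county-A", "11112": "county-A", "22222": "county-B"}
library_zips = ["11111", "11112", "22222", "99999"]

counts, unmapped = libraries_per_county(library_zips, zip_to_county)
breaks = [1, 5, 20]  # assumed class boundaries for the color gradient
shades = {county: shade_class(n, breaks) for county, n in counts.items()}
```

Keeping the list of unmapped ZIP codes makes it easy to see how much of the library population drops out of the county-level view.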
Here is the result:
It is not a complete map — I wasn’t able to map every OCLC library symbol in the United States to a valid ZIP-based county, and not every library in the US has an OCLC symbol. Still, with nearly 30 thousand libraries, it is more comprehensive than a map produced earlier this year based on IMLS data for about 17 thousand public libraries.
The first thing to be said about this map is that it does not suggest that there is any obvious correlation between the concentration of library resource (infrastructure) and economic mobility. Several of the areas that authors of the Equality of Opportunity study highlight as places where children of low-income families have a relatively greater likelihood of rising in the income distribution have comparatively limited library infrastructure. Admittedly, the geographic unit of measure in the two maps differs — I used counties (partly because they are readily available as shape files), while the researchers used commuting zones. It’s not obvious to me that if the library data were aligned to commuting zones, the picture would look much different: our data suggests that there is comparatively little library infrastructure in the upper Northeast zone of Nebraska, whether one relies on county or commuting zone boundaries — yet, this is an area where inter-generational income gains are relatively frequent. Conversely, in metro areas like Chicago where library infrastructure is comparatively dense, there is reportedly a pretty low level of inter-generational income gain.
Of course, to judge the strength of the US library system based on the geographic distribution of libraries alone is to overlook a vital — perhaps the most vital — attribute of the library enterprise. Libraries are in the business of increasing access to information by sharing resources that are distributed across broad networks of related institutions: public libraries, academic libraries, special libraries etc. Libraries are part of what is now fashionably termed the ‘sharing economy.’ To measure the integrity or vitality of the library system, one needs to take into account the efficiency of flows across the library network. A successful library system is one that ensures that a child (or adult) in rural Nebraska has access to the same collections and services as a child (or adult) in Minneapolis or Seattle.
It would be interesting to investigate whether geographic areas that favor economic mobility are co-extensive with areas where library flows — the balance of supply and demand for library resources — are notably efficient. Perhaps some LIS PhD student will take up the challenge. My current objective is a lot more prosaic: modeling supply and demand within and outside of a given library consortium to inform decisions about local and shared stewardship of print collections. For this, I think the county-level choropleth is actually quite useful. It helps to show how demand is distributed at ‘above-the-institution’ scale, and this is important for understanding the role of logistics in optimizing the flow of library resources.
This is what a county-level heatmap of demand for CIC returnables looks like, reflecting cumulative inter-lending request activity over a period of about seven years:
What does this map tell us? A few important things are immediately discernible:
- CIC libraries (which are mostly located in the Midwest) serve institutions across an enormous geographic range in the United States.
- Regional demand is concentrated in a relatively small number of counties.
- The relative volume of demand is quite low; counties with the greatest number of request transactions individually account for less than 7% of total demand.
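The second and third observations can be checked with two simple measures: the largest share of total demand held by any one county, and the number of counties needed to cover a given fraction of demand. This is a sketch under assumed inputs; the county labels and request counts are invented for illustration.

```python
def demand_shares(requests_by_county):
    """Each county's fraction of total requests."""
    total = sum(requests_by_county.values())
    if total == 0:
        return {}
    return {county: n / total for county, n in requests_by_county.items()}


def counties_needed(requests_by_county, fraction):
    """Smallest number of counties, taken largest first, that reach `fraction` of total requests."""
    total = sum(requests_by_county.values())
    running = 0
    ranked = sorted(requests_by_county.values(), reverse=True)
    for rank, n in enumerate(ranked, start=1):
        running += n
        if running >= fraction * total:
            return rank
    return len(ranked)


# Hypothetical request counts per county.
requests = {"A": 60, "B": 25, "C": 10, "D": 5}

shares = demand_shares(requests)
largest_share = max(shares.values())
```

A small `counties_needed` value for a large fraction indicates concentrated demand, while a low `largest_share` indicates that no single county dominates.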
Some of this information was equally visible in the institution location map that I had started with — but the county-based version is less noisy and enables me to roll up a great deal more data (about 1.3 million transactions and five thousand borrowers) in a single picture. It also raises some additional questions:
- Are the ‘hotspots’ of demand an artifact of aggregating demand over several years? I.e. did all of the demand from Southern California occur in the last twelve months or is it a recurring pattern over many years?
- Has the geographic range of demand changed over time? Are CIC libraries serving a broader range of institutional partners today than they did five years ago?
- Do individual CIC member libraries serve a comparable range of institutions? Is ‘long-range lending’ associated with libraries that hold materials that are relatively scarce in the overall system, or are materials traveling farther than is necessary to meet demand?
It was easy enough to plot annual demand for an individual lender, so I produced a new series of maps looking at the county-level location of institutions that borrowed returnable items from the Ohio State University Library (symbol OSU) over a few years. This time, I used the absolute number of loans as the input, so that even low-volume borrowers would be visible. Here are the results for:
2008…
2010…
and 2012…
At first glance, they might appear to be the same map…yet there are minor variations from year to year. The consistency in regions of demand is interesting, since it suggests that there is some predictability in the sources of demand: year after year, institutions in Southern California (mostly Los Angeles County) and Southern Arizona (mostly Pima County) have turned to OSU as a supplier one hundred times or more. Why does this matter? A pattern of sustained demand might suggest that a subscription-based pricing model would benefit partners on both sides, providing predictability in budgeting and giving OSU documentation of the continuing value its holdings are producing for other institutions (and geographies).
The variations in demand are equally interesting. For example, demand for OSU resources from institutions in the Pacific Northwest seems to have waned somewhat — could this be a result of improved intra-regional inter-lending arrangements and courier service within the Orbis-Cascade group? Less visible in these pictures, but no less intriguing (I think) is the decreasing range of geographies served by OSU between 2008 and 2012, which amounts to a reduction of 12%. This trend holds up over a longer period, not reflected in the maps above. What might account for this change? Is demand being deflected to other suppliers? Or is the increasing volume of requests generated by ‘in-network’ CIC partners displacing fulfillment for non-CIC institutions?
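A change in geographic range can be tracked by counting the distinct counties served in each year and comparing the totals. The sketch below assumes that "range" means the number of distinct borrower counties, which may not be exactly how the 12% figure was derived; the loan records are invented for illustration.

```python
def counties_served_by_year(loans):
    """loans: iterable of (year, county) pairs -> {year: number of distinct counties}."""
    served = {}
    for year, county in loans:
        served.setdefault(year, set()).add(county)
    return {year: len(counties) for year, counties in served.items()}


def percent_change(old, new):
    """Percentage change from old to new; negative values mean a reduction."""
    return (new - old) * 100 / old


# Hypothetical loan records: (year, borrower county).
loans = [
    (2008, "A"), (2008, "B"), (2008, "C"), (2008, "D"), (2008, "A"),
    (2012, "A"), (2012, "B"), (2012, "C"),
]

served = counties_served_by_year(loans)
change = percent_change(served[2008], served[2012])
```

Repeated loans to the same county count once per year, so the measure reflects breadth of reach rather than volume.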
There’s a lot more to explore in these complex inter-lending networks — and I suspect that visualizing flows between institutions and across geographies will become increasingly important in monitoring, analyzing and improving efficiency in the library system as a whole. OCLC makes an increasingly wide range of data (including ILL policy and institution data) available for programmatic use by developers and others, and this will hopefully lead to more experimentation with visualization and library analytics.
Constance Malpas is Director of Strategic Programs at OCLC. She joined OCLC in 2006, first working with the Research Library Partnership and later as a Research Scientist and Strategic Intelligence Manager. Constance is the author/co-author of multiple OCLC Research publications on library collections and services, collaboration, and the evolving higher education landscape.
I live in Tucson AZ. What is most interesting to me is that your map(s) are not considering population. Pima county (the extremely RED rhomboid like shape in the lower left of AZ) has almost no population that lives outside of the cities. Literally there are enormous land tracts (thousands of square miles in AZ) with basically NO people… Indian reservations, BLM land, Federal land, Wilderness… (yes we actually have wilderness…) etc… Now if you imagine a fairly small sliver on the right side of that red shape I described above then that is where the people actually are. The rest is sand, scrub brush, sand, an occasional mountain range, sand, and maybe a small tree here and there.. oh yes I forgot, lots of cactus… and still more sand. Yes there are tumbleweeds but we should not really count them cause they are not supposed to be here. (Tumbleweeds are a thistle which came to USA from overseas in flour and spread during the dust bowl… unfortunately they like sand… oh well). Now back to the population thing.. I have seen this issue so many times on so many maps since “map making people” must not ever visit the desert (that is a guess, I’m not sure just a hunch)… Pima county gets credit all the time for stuff I don’t think they ever did and then gets blamed other times for stuff I know they didn’t do. However, now Pima County does have a very good GIS map system (pimamaps.gov should find it) And I know that data has been shared many ways for things like real estate, transportation etc… so maybe a USA network of GIS systems would be a good thing? And recent accurate population data does exist and/or can be extracted easily. So here’s the point… If you live in Ajo (pronounced Ah-Hoe) AZ then you would have to drive 134 miles to Tucson (of course there is plenty of sand to look at on the way so you can’t get bored…) If you made that drive every day you would be stupid. So nobody I know does that.
Instead the few hundred people in Ajo stay there and work in the mines (hopefully that works well for them). But they would also know if they look at one of these maps that nobody in Ajo has ever gotten a CIC loan because when you live in a town of a few hundred people you know everybody. So the map is obviously misleading making it look like this big area is a hot spot when in fact it is really mostly sand (which I already covered and don’t really want to beat that to death… I’m just reminding you… in case you forgot). So the real hot spot is the TINY SLIVER which now in your mind should be an unbelievable hot spot. I guess what I’m saying is that you will need to increase the number of colors on the map once the population data shows where the loans are really being made and where the sand is… Most of the Southwest has this issue. By the way… If you know anybody who needs some sand… I could really help them out! Good Luck…
Thanks, Kevin. If you have any suggestions for how we could do more with mapping, I’d be glad to hear them. I think an interactive visualization similar to Mike Bostock’s airport hubs and arcs graph (http://mbostock.github.io/d3/talk/20111116/airports.html) would be ideal for showing interlending flows from multiple hubs. I’d like to find time to experiment with that.
Nice post