As part of our joint research project with the CIC Center for Library Initiatives and the OSU Library, we’re examining inter-lending flows within and outside of the 13-member CIC consortium. We are using a subset of the OCLC WorldCat Resource Sharing (WCRS) transaction data archive for this analysis. Our current dataset comprises 1.33 million request transactions, representing nearly 900,000 individual titles loaned by CIC libraries over the past several years.
Bruce Washburn, Brian Lavoie and I met recently to look at examples in Mike Bostock’s D3 gallery, which shows how a range of visualization techniques can be implemented. We were looking for examples that were fit for purpose for this particular project. Since inter-lending is a library-centric approach to balancing supply and demand, Brian suggested that we focus on examples that are particularly expressive for modeling import/export flows. We settled on force-directed graphs and Sankey flow diagrams as good candidates for further exploration. And because we are especially interested in understanding the flow of library resources across geographies, we decided it was worth some additional work to enhance our WCRS transaction dataset with geo-codes, so that we can experiment with mapping flows across regions.
From his experiments with TopicWeb, Bruce has developed some facility with D3 and he is now doing some work with R. But before we run headlong into any new development work, I wanted to do some low-level experimentation to see if the data we have in hand, and the questions we are trying to answer, lend themselves to visualization in Sankey diagrams.
When Brian, Bruce and I met to discuss models, we spent a fair bit of time discussing this diagram of horse import/export activity in Europe. It wasn’t until this week that I realized it had been produced by the prolific blogger and Open Data advocate Tony Hirst (aka @psychemedia, whom I’ve followed on Twitter for a long while) as an experiment in formatting data for Sankey diagrams. His blog post on this topic is great; unfortunately, it’s way beyond my current skill to implement. But in the comments, I noted that Bruce McPherson had developed some VBA code that uses Excel as the data input to a D3 Sankey library. This was just what I needed for some quick experimentation with our current dataset.
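For readers who would rather script the data preparation than work in Excel, here is a minimal Python sketch of the same idea: roll individual lender/borrower transactions up into the nodes-and-links structure that the D3 Sankey examples read (a list of named nodes plus links that refer to nodes by index and carry a value). This is not Bruce McPherson’s VBA code. The function name `build_sankey`, the `side` field and the three-letter symbols are hypothetical, chosen only for illustration.

```python
import json
from collections import Counter


def build_sankey(transactions):
    """Aggregate (lender, borrower) pairs into a nodes/links dictionary.

    Lenders and borrowers get separate nodes, so a library that both
    lends and borrows appears once on each side and no cycle is created.
    """
    counts = Counter(transactions)  # number of requests per (lender, borrower)
    nodes = []
    index = {}

    def node_id(label, side):
        # Assign the next free index the first time a (label, side) is seen.
        key = (label, side)
        if key not in index:
            index[key] = len(nodes)
            nodes.append({"name": label, "side": side})
        return index[key]

    links = []
    for (lender, borrower), n in sorted(counts.items()):
        links.append({
            "source": node_id(lender, "lender"),
            "target": node_id(borrower, "borrower"),
            "value": n,
        })
    return {"nodes": nodes, "links": links}


# Hypothetical symbols, not real transaction data.
sample = [("AAA", "BBB"), ("AAA", "BBB"), ("AAA", "CCC"), ("BBB", "AAA")]
graph = build_sankey(sample)
print(json.dumps(graph, indent=2))
```

Keeping lenders and borrowers as separate nodes is a design choice for this sketch: Sankey layouts generally assume flow in one direction, and two libraries that lend to each other would otherwise form a loop.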
Here are a few illustrative screenshots:
The three-letter symbols correspond to OCLC institution symbols for the 29 CIC collections we are examining in this project.
Another, showing the breakdown of CIC borrowing of CIC returnables:
And a third, this time with some detail for both non-CIC and CIC borrowers. Note that the number of non-CIC borrowers makes it difficult to represent them all in this format, hence the block of ‘others.’
Now, these are admittedly primitive pictures of how resources flow out of CIC libraries and into other places, but they do capture some important attributes that we are interested in exploring further. For instance, it is immediately apparent that there are some major ‘sources’ and ‘sinks’ for CIC returnables. And it’s clear that while the demand generated outside the CIC is significant (greater than the demand generated within the consortium), it is extremely diffuse, spread across a population of thousands of libraries. Both of these observations are important for understanding how existing library flows can be optimized. As we refine our analysis, we’ll be examining what factors are driving demand to particular libraries: proximity of lender, scarcity of alternative supply options, price incentives, efficiency of service (as measured by turn-around), etc. And we’ll be looking at new ways to use data visualization to explore, and share, interesting and important patterns in the organization of the library system.
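Two of the ideas above lend themselves to a quick sketch: folding the long tail of small borrowers into a single ‘others’ block, and spotting net ‘sources’ and ‘sinks.’ The Python below is illustrative only. It assumes the flows have already been counted into a dictionary keyed by (lender, borrower), and the function names, the `keep` threshold and the symbols are all hypothetical rather than anything used in our actual analysis.

```python
from collections import Counter


def collapse_long_tail(flows, keep=10, other_label="others"):
    """Keep the `keep` busiest borrowers; merge the rest into one block."""
    totals = Counter()
    for (_, borrower), n in flows.items():
        totals[borrower] += n
    top = {name for name, _ in totals.most_common(keep)}

    collapsed = Counter()
    for (lender, borrower), n in flows.items():
        target = borrower if borrower in top else other_label
        collapsed[(lender, target)] += n
    return dict(collapsed)


def net_flow(flows):
    """Items lent minus items borrowed, per library.

    Positive values suggest a net 'source'; negative values a net 'sink'.
    """
    net = Counter()
    for (lender, borrower), n in flows.items():
        net[lender] += n
        net[borrower] -= n
    return dict(net)


# Hypothetical counts, not real transaction data.
flows = {("AAA", "BBB"): 5, ("AAA", "CCC"): 1,
         ("AAA", "DDD"): 1, ("BBB", "AAA"): 2}
print(collapse_long_tail(flows, keep=2))
print(net_flow(flows))
```

Net flow is only one crude reading of ‘source’ and ‘sink’; it says nothing about the other factors listed above, such as proximity, scarcity, price or turn-around.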
Update: Thanks to Tony Hirst’s comment (below) and some subsequent Twitter exchanges with @timelyportfolio,
— klr (@timelyportfolio) July 13, 2013