As part of our joint research project with the CIC Center for Library Initiatives and the OSU Library, we’re examining inter-lending flows within and outside of the 13-member CIC consortium. We are using a subset of the OCLC WorldCat Resource Sharing (WCRS) transaction data archive for this analysis. Our current dataset comprises 1.33 million request transactions, representing nearly 900,000 individual titles loaned by CIC libraries over the past several years.
As Max Klein has noted elsewhere in this blog, the OCLC Research group is starting to experiment with new approaches to data visualization using the R statistical modeling environment and D3 JavaScript code library. (The WorldCat Live prototype is a nice example of how these are being put to use.) We are eager to integrate some of this experimental work into the ongoing CIC analysis.
Bruce Washburn, Brian Lavoie and I met recently to look through Mike Bostock’s D3 gallery, which shows how a wide range of visualization techniques can be implemented. We were looking for examples that were fit for purpose for this particular project. Since inter-lending is a library-centric approach to balancing supply and demand, Brian suggested that we focus on examples that are particularly expressive for modeling import/export flows. We settled on force-directed graphs and Sankey flow diagrams as good candidates for further exploration. And because we are especially interested in understanding the flow of library resources across geographies, we decided it was worth some additional work to enhance our WCRS transaction dataset with geo-codes, so that we can experiment with mapping flows across regions.
From his experiments with TopicWeb, Bruce has developed some facility with D3, and he is now doing some work with R. But before we run headlong into any new development work, I wanted to do some low-level experimentation to see if the data we have in hand, and the questions we are trying to answer, lend themselves to visualization in Sankey diagrams.
When Brian, Bruce and I met to discuss models, we spent a fair bit of time discussing this diagram of horse import/export activity in Europe. It wasn’t until this week that I realized it had been produced by the prolific blogger and Open Data advocate Tony Hirst (aka @psychemedia, whom I’ve followed on Twitter for a long while) as an experiment in formatting data for Sankey diagrams. His blog post on this topic is great; unfortunately, it’s way beyond my current skill to implement. But in the comments, I noted that Bruce McPherson had developed some VBA code that uses Excel as the data input to a D3 Sankey library. This was just what I needed for some quick experimentation with our current dataset.
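For readers who would rather not go through Excel, the same kind of input can be prepared in a few lines of code. The following is a minimal Python sketch, not the VBA approach described above. It assumes each transaction has already been reduced to a (lender, borrower, count) row, and it assumes the target is the nodes/links JSON layout used in Mike Bostock's D3 Sankey example, where each link refers to its nodes by position in the node list. The function name `to_sankey` and the symbols AAA, BBB and CCC are hypothetical placeholders, not real OCLC institution symbols or project data.

```python
import json
from collections import defaultdict


def to_sankey(rows):
    """Aggregate (lender, borrower, count) rows into a nodes/links structure.

    Assumption: the consumer wants {"nodes": [{"name": ...}], "links":
    [{"source": i, "target": j, "value": n}]} with index-based links.
    """
    # Sum repeated lender/borrower pairs into a single flow.
    totals = defaultdict(int)
    for lender, borrower, count in rows:
        totals[(lender, borrower)] += count

    names, index = [], {}

    def node_id(label):
        # Assign each distinct label the next free position in the node list.
        if label not in index:
            index[label] = len(names)
            names.append(label)
        return index[label]

    links = []
    for (lender, borrower), count in totals.items():
        # A library can both lend and borrow, so each role gets its own node.
        # This keeps the graph free of cycles (a choice made for this sketch).
        links.append({
            "source": node_id(f"{lender} (lender)"),
            "target": node_id(f"{borrower} (borrower)"),
            "value": count,
        })
    return {"nodes": [{"name": n} for n in names], "links": links}


# Placeholder rows for illustration only; these are not project data.
sample_rows = [("AAA", "BBB", 3), ("AAA", "BBB", 2), ("BBB", "AAA", 4), ("AAA", "CCC", 1)]
graph = to_sankey(sample_rows)
sankey_json = json.dumps(graph, indent=2)
```

Splitting each institution into a lender node and a borrower node is one way to show two-way traffic between the same pair of libraries in a strictly left-to-right diagram.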
Here are a few illustrative screenshots:
The three-letter symbols correspond to OCLC institution symbols for the 29 CIC collections we are examining in this project.
Another, showing the breakdown of CIC borrowing of CIC returnables:
And a third, this time with some detail for both Non-CIC and CIC borrowers — NB the number of non-CIC borrowers makes it difficult to represent them all in this format, hence the block of ‘others.’
Now, these are admittedly primitive pictures of how resources flow out of CIC libraries and into other places, but they do capture some important attributes that we are interested to explore further. For instance, it is immediately apparent that there are some major ‘sources’ and ‘sinks’ for CIC returnables. And it’s clear that while the demand generated outside the CIC is significant (greater than the demand generated within the consortium), it is extremely diffuse, spread across a population of thousands of libraries. Both of these observations are important for understanding how existing library flows can be optimized. As we refine our analysis, we’ll be examining what factors are driving demand to particular libraries: proximity of lender, scarcity of alternative supply options, price incentives, efficiency of service (as measured by turn-around), etc. And we’ll be looking at new ways to use data visualization to explore, and share, interesting and important patterns in the organization of the library system.
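The ‘others’ block in the third screenshot reflects a common way of handling a long tail of small borrowers: keep the largest ones as separate nodes and fold the rest into a single bucket. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the method used to produce the screenshots: the function name `collapse_tail`, the cutoff `keep`, and the sample symbols and counts are all hypothetical.

```python
from collections import Counter


def collapse_tail(requests_by_borrower, keep=10, label="Others"):
    """Keep the `keep` largest borrowers and sum the remainder under `label`."""
    # most_common() with no argument returns every entry, largest count first.
    ranked = Counter(requests_by_borrower).most_common()
    head = dict(ranked[:keep])
    tail_total = sum(count for _, count in ranked[keep:])
    if tail_total:
        head[label] = head.get(label, 0) + tail_total
    return head


# Placeholder counts for illustration only; these are not project data.
sample = {"AAA": 50, "BBB": 30, "CCC": 5, "DDD": 3, "EEE": 2}
collapsed = collapse_tail(sample, keep=2)
```

With these placeholder counts the two largest borrowers stay separate and the remaining three are summed into one ‘Others’ entry, so the total volume of requests is preserved even though the individual small borrowers are no longer visible.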
Update: Thanks to Tony Hirst’s comment (below) and some subsequent Twitter exchanges with @timelyportfolio, there is now a very nice tutorial on creating Sankey diagrams using rCharts and d3. Many thanks to klr and Tony for taking the initiative.
Tweet from klr (@timelyportfolio), July 13, 2013: “@psychemedia @ramnath_vaidya @ConstanceM sankey plugin is my next project similar to horizon”
Constance Malpas is Director of Strategic Programs at OCLC. She joined OCLC in 2006, first working with the Research Library Partnership and later as a Research Scientist and Strategic Intelligence Manager. Constance is the author/co-author of multiple OCLC Research publications on library collections and services, collaboration, and the evolving higher education landscape.
Hi:
I am looking to create a data visualization similar to the one you have at the top of this page (intra-CIC flows). Can you help me? I am not a technical person, but I can learn these things. I have my input in Excel, and it has two columns and about 280 rows. I am trying to map applications to functions. Any help is appreciated.
Hi:
I was browsing the web looking for some way to create a visualization similar to (in fact almost exactly the same kind as) the one you have at the top of this page (intra-CIC flows). I am not a technical person. Can you please show me how I can make one like that? My data is basically two columns: source (functions) and target (applications). Any help or advice will be greatly appreciated.
Tony,
I should have thought that the strong uptake for e.g. LibGuides would make that sort of link assessment generally interesting — I mean as a programmatic approach to integrating analytics into library support activities.
I scanned your blog (rather quickly I’ll admit) for a related post but didn’t see one. Could you point me in the right direction?
Any chance you have looked at half-life of OU course modules (if that is the term)? I have been wondering recently about what the OU has learned about re-usability of course materials based on the rate at which they have been recycled…presumably it varies a bit based on discipline (maths units will have greater longevity?). I guessed that OU would have enough historical data to take up an examination…or perhaps it was done long ago.
Constance
Constance
Some time ago I dabbled with a tool that started to explore traffic from OU course pages to library pages they linked to, my thinking being:
1) folk who wrote courses might be interested in which library links were followed
2) library folk may be interested to see what courses/link strategies sent traffic to Library pages.
No-one grokked why it might be interesting though, so I gave up on pursuing it further.
Hi Constance,
Interesting stuff – I don’t think I’ve seen the diagrams used for this before?
re: the pragmatics of generating Sankey diagrams from R, there is a new R library called rCharts (http://rcharts.io/) that makes it easy to use a variety of d3js-powered JavaScript libraries to generate interactive charts from R. I’m not sure whether anyone has wrapped/demoed the creation of a Sankey diagram from R yet using this approach, but if it would be useful I could maybe have a go at working out how to do such a thing?
Tony,
Thanks for the pointer to rCharts and especially for mobilizing interested colleagues to help experiment. I’ve updated the post with a link to @timelyportfolio‘s new tutorial.
More generally, we are interested in exploring new approaches for modeling flows across the library system. The CIC project provides an opportunity to do some of that, albeit on a relatively limited scale. I know a variety of people are interested in applying network analysis to the study of library collection and space usage — if you come across any examples, I’d love to see them. If I recall correctly UIUC — for ex. — did a study of flows of users between campus (departmental) libraries and is doing some study of circ data. Would be nice to see the usage analysis rolled up to a higher level. For our purposes, ILL data is a way into understanding the larger system dynamics.
Constance