Archive for the 'Research Note' Category

Researcher Identifiers

Monday, April 14th, 2014 by Karen

Researcher identifiers have been the focus of the OCLC Research “Registering Researchers in Authority Files” Task Group since October 2012 (see my blog post soon after we started). We recently issued our draft Registering Researchers in Authority Files report for community review and feedback.

One of the challenges in this work is that there are different stakeholders with an interest in researcher identifiers. The task group identified seven: researcher, funder, university administrator, journalist, librarian, identity management systems, and aggregators (which include publishers).  We developed use case scenarios, functional requirements and recommendations targeted at each of these stakeholders.

The task group members also represent different perspectives: ORCID and ISNI Board members, people representing the perspectives of publishers, a CRIS (Current Research Information System), VIAF, VIVO and librarians including those who create authority records and contribute them to the LC/NACO Authority File.  Task group members come from The Netherlands, the United Kingdom and the United States.

During the review period we hope to receive feedback from different stakeholders with different perspectives. We’ve already heard a basic question from a researcher: “What is an authority file?” Book authors may not even realize they have an “LCCN” identifier from an authority record a librarian created as part of cataloging their works, nor that they also have a VIAF identifier as a result. The report’s introduction attempts to differentiate authority records and identifiers. Task group member Micah Altman (MIT), in the presentation he and I did for the spring Coalition for Networked Information membership meeting (“Integrating Researcher Identifiers into University and Library Systems”), created the following table comparing the two:

Research Identifiers Compared to Name Authorities

 Please send your comments on our draft report to me at smithyok@oclc.org.  We plan to publish the final report in June, so feedback received by 30 April would be most timely.

Authority for Establishing Metadata Practice

Monday, April 7th, 2014 by Karen


A metadata flow diagram
That was the topic discussed recently by OCLC Research Library Partners metadata managers. Carlen Ruschoff (U. Maryland), Philip Schreur (Stanford) and Joan Swanekamp (Yale) had initiated the topic, observing that libraries are taking responsibility for more and more types of metadata (descriptive, preservation, technical, etc.) and its representation in various formats (MARC, MODS, RDF). Responsibility for establishing metadata practice can be spread across different divisions in the library. Practices developed in relative isolation may have unforeseen outcomes for discovery when their results appear together in awkward juxtapositions.

The discussion revolved around these themes:

Various kinds of splits create varying metadata needs. Splits identified included digital library vs. traditional; MARC vs. non-MARC; projects vs. ongoing operations. Joan Swanekamp noted that many of Yale’s early digitization projects involved special collections which started with their own metadata schemes geared towards specific audiences. But the metadata doesn’t merge well with the rest of the library’s metadata, and it’s been a huge amount of work to try to coordinate these different needs. There is a common belief in controlled vocabularies even when the purposes are different. The granularity of different digital projects makes it difficult to normalize the metadata. Coordination issues include using data elements in different ways, not using some basic elements, and lack of context. Repository managers try to mandate as little as possible to minimize the barriers to contributions. As a result, there’s a lot of user-generated metadata that would be difficult to integrate with catalog data.

Metadata requirements vary due to different systems, metadata standards, communities’ needs. Some digital assets are described using MODS (Metadata Object Description Schema) or VRA. Graphic arts departments need to find images based on subject headings, which may result in what seems to be redundant data. There’s some tension between specific area and general needs. Curators for specific communities such as music and divinity have a deeper sense of what their respective communities need rather than what’s needed in a centralized database. Subject headings that rely on keyword or locally devised schemes can clash with the LC subject headings used centrally.  These differences and inconsistencies have become more visible as libraries have implemented discovery layers that retrieve metadata from across all their resources.

Some sort of “metadata coordination group” is common.  Some libraries have created metadata coordination units (under various names), or are planning to. Such oversight teams provide a clearing house to talk about depth, quality and coverage of metadata. An alternative approach is to “embed” metadata specialists in other units that create metadata such as digital library projects, serving as consultants. After UCLA worked on ten different digital projects, it developed a checklist that could be used across projects: Guidelines for Descriptive Metadata for the UCLA Digital Library Program (2012). It takes time to understand different perspectives of metadata: what is important and relevant to curators’ respective professional standards.  It’s important to start the discussions about expectations and requirements at the beginning of a project.

We can leverage identifiers to link names across metadata silos. As names are a key element regardless of which metadata schema is used, we discussed the possibility of using one or more identifier systems to link them together. Some institutions encourage their researchers to use the Elsevier expert system. Some are experimenting with or considering using identifiers such as ORCID (Open Researcher and Contributor ID), ISNI (International Standard Name Identifier) or VIAF (Virtual International Authority File). VIAF is receiving an increasing number of LC/NACO Authority File records that include other identifiers in the 024 field.
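As a rough illustration of the idea, the sketch below groups name records from different metadata silos by the identifiers they carry. The record layout, field names and identifier values are hypothetical placeholders invented for this example; they are not real ORCID, ISNI or VIAF data, and this is not a description of any system discussed by the group.

```python
from collections import defaultdict

# Hypothetical records from three silos; every value is a made-up placeholder.
records = [
    {"silo": "catalog (MARC)", "name": "Doe, Jane, 1970-",
     "ids": {"viaf": "VIAF-0001"}},
    {"silo": "repository (MODS)", "name": "Jane Doe",
     "ids": {"viaf": "VIAF-0001", "orcid": "ORCID-0001"}},
    {"silo": "digital library (VRA)", "name": "J. Doe",
     "ids": {"orcid": "ORCID-0001"}},
    {"silo": "catalog (MARC)", "name": "Roe, Richard",
     "ids": {"viaf": "VIAF-0002"}},
]


def index_by_identifier(records):
    """Map each (scheme, value) identifier to the name strings that carry it."""
    index = defaultdict(list)
    for rec in records:
        for scheme, value in rec["ids"].items():
            index[(scheme, value)].append(rec["name"])
    return dict(index)


name_index = index_by_identifier(records)
for key, names in name_index.items():
    print(key, names)
```

In this toy data the repository record carries two identifiers, so it is the bridge that connects the catalog form of the name to the digital library form even though those two records share no identifier directly.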

Implications of BIBFRAME Authorities

Thursday, April 3rd, 2014 by Karen

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Philip Schreur of Stanford. We were fortunate that several staff from the Library of Congress involved with the Bibliographic Framework Initiative (aka BIBFRAME) participated.

Excerpts from On BIBFRAME Authority dated 15 August 2013 served as background, specifically the sections on the “lightweight abstraction layer” (2.1-2.3) and the “direct” approach (3). During the discussion, Kevin Ford of LC shared the link to the relatively recent BIBFRAME Authorities draft specification dated 7 March 2014, now out for public review: http://www.loc.gov/bibframe/docs/bibframe-authorities.html

The discussion revolved around these themes:

The role of identifiers for names vis-à-vis authority records. Ray Denenberg of LC noted that when the initiative first began, the framers searched unsuccessfully for an alternate name for “authorities” as it could be confused with replicating the LC/NACO or local authority files that follow a certain set of cataloging rules and are constantly updated and maintained. BIBFRAME is meant to operate in a linked data environment, giving everyone a lot of flexibility. The “BIBFRAME Authority” is defined as a class that can be used across files. It could be simply an identifier to an authoritative source, and people could link to multiple sources as needed. The identifier link could also be used to grab more information from the “real” authority record.

Concern about sharing authority work done in local “light abstraction layers.” It was posited that Program for Cooperative Cataloging libraries, and others, could share local authorities work and expose it as linked data. This is one of the objectives for the Stanford-Cornell-Harvard Linked Data for Libraries experiment. They plan to use a type of shared light abstraction model, where they may share URIs for names rather than each institution creating their own. Concerns remain about accessing, indexing and displaying shared local authorities across multiple institutions, and the risk of outages that could hamper access. Although libraries could develop a pared down approach to creating local authority data (which may not be much more than an identifier) and then have programs that pull in more information from other sources, some feared that data would only be created locally and not shared and libraries would not ingest richer data available from elsewhere.

Alternate approaches to authority work. Given the limited staff libraries have, fewer have the resources to contribute to the LC/NACO authority file as much as they have in the past. The lightweight model could serve as a place for identifiers and labels, and allow libraries to quickly create identifiers for local researchers prominent in the journal literature but not reflected in national authority files. Using identifiers instead of worrying about validating content—doing something quick locally that you can’t afford to do at a national level—is appealing. Alternatively, a library could bring in information from multiple authority sources—each serving a different community—noting the equivalents and providing an appropriate label.  BIBFRAME Authority supports both approaches. Other sources could include those favored by publishers rather than libraries, such as ORCID (Open Researcher and Contributor ID) or ISNI (International Standard Name Identifier), or by other communities such as those using EAC-CPF (Encoded Archival Context – Corporate bodies, Persons and Families). This interest overlaps the OCLC Research activity on Registering Researchers in Authority Files.

Concern about the future role of the LC/NACO Authority File. Some are concerned that if libraries choose to rely on identifiers to register their scholars or bring in information from other sources, fewer would contribute to the LC/NACO Authority File. Will we lose the great database catalogers have built cooperatively over the past few decades? Some would still prefer to have one place for all authority data and do all their authority work there. LC staff noted that a program could be run to ingest authority data done in these local (or consortial) abstraction layers into the LC/NACO Authority File.

Issues around ingesting authority data. We already have the technology to implement Web “triggers” to launch programs that pull in information from targeted sources and write the information to our own databases. OCLC Research recently held a TAI-CHI webinar demonstrating xEAC and RAMP (Remixing Archival Metadata Project), two tools that do just that. There are other challenges such as evaluating the trustworthiness of the sources, selecting which ones are most appropriate for your own context and reconciling multiple identifiers representing the same entity. Some are looking for third-party reconciliation services that would include links to other identifiers.
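One way to picture the reconciliation challenge is as a clustering problem: given pairwise assertions that two identifiers refer to the same entity, group all identifiers into clusters. The sketch below uses a simple union-find structure. It is only an illustration under stated assumptions: the identifier strings are invented placeholders, and it deliberately ignores the harder question raised above of whether each source's assertions can be trusted.

```python
def reconcile(equivalences):
    """Group identifiers into clusters from pairwise 'same entity' assertions."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical assertions gathered from different sources (placeholder values).
assertions = [
    ("viaf:A", "isni:B"),
    ("isni:B", "orcid:C"),
    ("viaf:D", "lccn:E"),
]
clusters = reconcile(assertions)
print(clusters)
```

Here "viaf:A" and "orcid:C" end up in the same cluster even though no source asserted that link directly, which is the kind of transitive linking a third-party reconciliation service might provide.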

Those interested in the continuing discussion of BIBFRAME may wish to subscribe to the BIBFRAME listserv.

New Scholars’ Contributions to VIAF: Syriac!

Tuesday, March 11th, 2014 by Karen

Syriac scripts added to VIAF cluster for Ephraem

We have just loaded into the Virtual International Authority File (VIAF) the second set of personal names from a scholarly resource, the Syriac Reference Portal hosted by Vanderbilt University.

Syriac is a dialect of Aramaic, developed in the kingdom of Mesopotamia in the first century A.D. It flourished in the Persian and Roman Empires, and Syriac texts comprise the third largest surviving corpus of literature from the Roman Empire, after Greek and Latin. The Syriac Reference Portal is a collaborative digital reference project funded by the National Endowment for the Humanities and the Andrew W. Mellon Foundation involving partners at Vanderbilt University, Princeton University and other affiliate institutions.

This addition represents the first time we see Syriac scripts (there are variants) as both the “preferred form” and under “alternate name forms” in a VIAF record. The Syriac Reference Portal also contributes additional Arabic and other scripts as alternate names, but selects a Syriac script form as a preferred form for people who wrote or were written about in Syriac.

The Syriac names join the Roman and Greek personal names we loaded from the Perseus Catalog last November and blogged about here as part of our Scholars’ Contributions to VIAF activity. Together they demonstrate how scholarly contributions can enrich existing VIAF clusters (generally comprising contributions from national libraries and other library agencies) by adding script forms of names that previously lacked them, as well as adding new names. Scholars benefit from using VIAF URIs as persistent identifiers for the names in their own databases, linked data applications and scholarly discourse to disambiguate names in multinational collaborations, and from using VIAF as a means to disseminate scholarly research on names beyond scholars’ own communities.

Adding these scholarly files demonstrates the benefits of tapping scholarly expertise to enhance and add to name authorities represented in VIAF. We look forward to more such enhancements from other scholars’ contributions.

Metadata for digital objects

Tuesday, November 26th, 2013 by Karen

That was the topic discussed recently by OCLC Research Library Partners metadata managers. It was initiated by Jonathan LeBreton of Temple, who noted the questions staff raised when describing voluminous image collections such as: Do we share the metadata even if it would swamp results? What context can be provided economically? What are others doing both in terms of data schemas and where the metadata is shared?

The discussion revolved around these themes:

Challenges in addressing the sheer volume of digital materials.  Managers are making decisions based on staffing, subject expertise, collection’s importance and funding. It was suggested that some metadata could be extracted from the technical metadata, such as dates and location. We discussed the possibility of crowd-sourcing metadata creation, although experience to date is that a few volunteers are responsible for most contributions, and the successful examples tend to be for transcription, editing OCR’d text, and categorizations. (The At a Glance: Sites that Support Social Metadata chart indicates the ones that enhance data either through improved description or subject access.) The context must matter to people for them to volunteer their efforts. (See the OCLC Report, Social Metadata for Libraries, Archives and Museums: Executive Summary.) With the anticipated increase of born-digital and other digitized materials, there’s a greater need for batch and bulk processing.

Grappling with born-digital materials. Libraries are receiving the digital equivalents of personal papers and using the Forensic Toolkit to “process” these digital collections. Preservation and rights management, in addition to description, are important components and no commercially available system yet addresses these needs. The Association of Research Libraries is working with the Society of American Archivists to customize its Digital Archives Specialist (DAS) Program to develop the requisite skills for managing born-digital materials among ARL library staff. OCLC Research has produced several reports in conjunction with its Demystifying Born Digital program of work.

Concerns about “siloization”, or proliferation of “boutique” collections, using different metadata schema. Metadata is being created in different native systems within an institution, metadata that is often not loaded into a central catalog or even accessible in the local discovery layer. User-created metadata in institutional repositories may be OAI harvested by OCLC and thus may appear in WorldCat even if not visible in the institution’s local discovery tool. Managers grapple with whether to spend resources on updating such metadata before it is exposed for harvesting.  Another challenge is deciding what to include in which discovery layer, and what should be silo’d.  The numerous repositories within an institution can result in complex metadata flows for discovery, as illustrated by UC San Diego’s Prezi diagram. Some institutions map their various metadata schema to MODS (Metadata Object Description Schema), but all non-MARC metadata is converted to MARC when loaded into WorldCat.

What are the “essential elements” to provide access across collections? We posited that librarians have been discussing “core” or “essential” metadata elements for decades, starting with Dublin Core and the Program for Cooperative Cataloging’s “BIBCO Standard Record”. Librarians have been entering metadata to suit the system it was designed for, but the data ultimately moves to another system. Library metadata is no longer confined to a single system: it may be exposed to search engines and viewed with lots of non-library metadata.

The Library of Congress’ Bibliographic Framework Initiative  portends a future where all metadata will be “non-MARC” and we will rely more on linked data URIs in place of metadata text strings.  How can we use the promise of that future to get to where we need to be?

First Scholars’ Contributions to VIAF: Greek!

Monday, November 25th, 2013 by Karen

Perseus logo in VIAF Cluster

Contributors to the Virtual International Authority File (VIAF) have generally been national libraries and other library agencies. We have just loaded into VIAF the first set of personal names from a scholarly resource, the Perseus Catalog hosted by Tufts University, an OCLC Research Library Partner. The Perseus Catalog aims to provide access to at least one online edition of every major Latin and Greek author from antiquity to 600 CE. Adding the Greek, Arabic and other script forms of names in the Perseus Catalog enriches existing VIAF clusters that previously lacked them.

This addition represents a milestone in our Scholars’ Contributions to VIAF activity. We anticipate mutual benefits from our collaboration with scholars. Scholars benefit from using VIAF URIs as persistent identifiers for the names in their own databases, linked data applications and scholarly discourse to disambiguate names in multinational collaborations, and from using VIAF as a means to disseminate scholarly research on names beyond scholars’ own communities. Both scholarly societies and libraries benefit from enriching VIAF with name authority data which would not otherwise be contributed by national libraries.

As noted in an earlier blog post, Irreconcilable differences? Name authority control & humanities scholarship,  OCLC Research discovered key issues important to scholars that didn’t mesh well with library practices represented in name authority files due to differences in intended audiences, disciplinary norms and metadata needs. However, if scholars do use the Library of Congress’ Metadata Authority Description schema, or MADS, as the Perseus Catalog does, we can add their files to VIAF much more easily.

Adding these scholarly files can demonstrate the benefits of tapping scholarly expertise to enhance and add to name authorities represented in VIAF. We have already seen the number of “alternate name forms” associated with VIAF clusters that include the Perseus Catalog’s contributions increase, with scripts not yet represented. We look forward to more such enhancements from other scholars’ contributions.

WorldCat shows dispersal of global resources

Tuesday, November 19th, 2013 by Karen

Number of institutions with WorldCat holdings for Arabic-language resources

A differentiating feature of WorldCat is that it includes more than two billion holdings of libraries from around the world. My colleague Roy Tennant recently generated statistics on the Arabic-language resources described in WorldCat records. I was struck by the dispersal of the holdings of those materials, as shown in the map above.

Big caveat: Many strong Arabic-language collections are under-represented or not represented at all in WorldCat. Even so, we can see at a glance that Arabic-language materials are collected by institutions in countries far away from the countries of origin, even where Arabic is not widely spoken. Scholarship is international. We could produce similar maps for other language materials.

My colleague Brian Lavoie’s report earlier this year, Not Scotch, but Rum: The Scope and Diffusion of the Scottish Presence in the Published Record, describes in detail his analysis of holdings for materials published in Scotland, by Scottish people, and about Scotland. It concludes, “Most holdings of materials in the Scottish national presence are by institutions outside Scotland, which reminds us that a national presence in the published record may be primarily manifested outside the home country’s borders.”  

Multilingual WorldCat represented by translations

Tuesday, November 12th, 2013 by Karen

Great works are translated—the cream of the world’s cultural and knowledge heritage is shared by being translated. And many of them are represented by bibliographic records in WorldCat.

A group of us working on Multilingual WorldCat projects have been focusing on datamining WorldCat for works and all translations associated with them, identifying the translator for each translation. We plan to generate “uniform title” and “expression” records (the translations) and contribute them to the Virtual International Authority File (VIAF).

We currently have roughly 15 million personal name “clusters” in VIAF, each grouping the records that represent the same person from among the 26 million personal name authority records contributed by 35 agencies. These are not just creators of works, but also people who have had works written about them and, sometimes, translators.

My colleague Jenny Toves has identified about 1 million persons in WorldCat who are associated with bibliographic records in more than one language, or roughly 7% of the people represented in VIAF.  The breakdown:

  • 624K names are associated with titles in only 2 languages
  • 283K names are associated with titles in 3 to 9 languages
  • 7K names are associated with titles in 10 or more languages

Persons with titles in multiple languages

My colleague JD Shipengrover created the accompanying graphic.

We expect to focus our analysis efforts on the “short head” of the names whose works have been translated the most, and rely on machine algorithms to handle the “long tail” of the names associated with titles in only two or three languages.
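The figures in the breakdown can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this post (the three rounded counts and the roughly 15 million VIAF personal name clusters); the variable names are mine. The three listed counts sum to 914K, which is consistent with “about 1 million” as a rounded figure.

```python
# Rounded figures quoted in the post.
viaf_person_clusters = 15_000_000
breakdown = {
    "2 languages": 624_000,
    "3 to 9 languages": 283_000,
    "10 or more languages": 7_000,
}

# Total persons covered by the three listed groups.
total = sum(breakdown.values())

# Share of all VIAF personal name clusters covered by the listed groups.
share_of_viaf = total / viaf_person_clusters

# Share of each group within the multilingual set.
shares = {label: count / total for label, count in breakdown.items()}

print(f"listed total: {total:,}")
print(f"share of VIAF clusters: {share_of_viaf:.1%}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

On these numbers the “short head” of names with titles in ten or more languages is under 1% of the multilingual set, while names with titles in only two languages make up about two thirds of it, which is why the long tail is left to machine algorithms.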

Does technical services still have a distinct role?

Monday, October 28th, 2013 by Karen

That was the topic discussed recently by OCLC Research Library Partners metadata managers from seven countries. It was initiated by Philip Schreur of Stanford (and recently Chair of the Program for Cooperative Cataloging), who noted that although “technical services” had traditionally been organized around the modules of a local system, changes in the library environment have resulted in some major restructuring. Libraries have increased their use of outsourcing and now batchload records from vendors or other sources, blurring the lines between library and IT, and vastly reducing the number of materials that need to be cataloged manually locally. This in turn has allowed staff to devote time to broader issues of discovery and data management, and make strategic alliances with new partners outside of technical services.  Meanwhile, “metadata creation” is needed for resources not always part of the local catalog, such as digital collections or materials in an Institutional Repository.

The discussion revolved around these themes:

More widespread use and need for metadata, far beyond the traditional “bibliographic” metadata created by technical services staff.  Metadata specialists (a new alternative for “catalogers”) now deal with metadata of all types, with decreased focus on print and more emphasis on digital.  Technical services staff aspire to provide intellectual access to all resources, beyond those represented in the local catalog. A common discovery tool has driven the movement to more active metadata integration from the beginning of projects to ensure that metadata is cohesive.

Changing service portfolios and workflows, with new or expanded expectations. Technical services staff have taken on tasks that used to be done elsewhere. Among these new tasks: authority control for the institutional repository; managing electronic resources and licenses; integration with special collections and archives; helping researchers organize their data; creating metadata for digital projects; producing reports, dataloading and installing system upgrades (which used to be done by systems staff). There is a challenge in balancing the workload between the influx of electronic and digital resources and print backlogs. Sensitivity to “organizational culture” in different units is more important than organizational structure.

New collaborations within the institution and with other organizations. Technical services staff increasingly work in cross-divisional teams, such as staff involved with digital projects, archives, data mining, IT and liaisons with faculty. Alison Felstead at Oxford referred to two new posts in the Oxford institutional repository that report to cataloging but are part of the systems staff. Libraries would like to work more closely with publishers to load metadata for e-resources into commonly used tools.

Need for new skill sets. Managers need to “build digital confidence” in their staff: provide training in what is required to adequately describe and provide access to digital and electronic resources, and allow periods for experimentation. Libraries compete with IT to recruit computer-savvy staff, and the pay scale in IT is much higher.

Several noted the need to evolve beyond “boutique-y” collection development and the need for a “metadata shepherd”. (Stanford recently posted a position for a “Metadata Strategist”.) In general, we are seeing an emerging trend towards more fluid structures that allow staff to adapt to new workflows, rather than structures organized around traditional functions.

Sliding scale: mapping local, group and system-wide library infrastructure

Sunday, July 28th, 2013 by Constance

Recently, we looked at how Sankey diagrams might be used to visualize the flow of library resources within and across inter-lending networks. It was a useful exercise, but it left me feeling that a critical dimension was lacking: a measure of the geographic distance between inter-lending partners. Understanding that a significant share of the inter-lending demand that is fulfilled by CIC libraries is generated outside the CIC group is important in its own right, but it doesn’t tell us much about the relative costs of serving ‘in-group’ and ‘out-of-group’ partners. If most of the non-CIC borrowers are located within close proximity of CIC lenders, the costs of fulfilling returnable requests (which must travel to and from the borrower) will be less than if the non-CIC borrowers are located farther away. I tried mapping the Sankey flows to ZIP codes, but unless one is already familiar with the codes, it is fairly difficult to visualize the distance covered.

The obvious solution was to plot the outbound and returning flows on a map. At first, I did this by dropping markers for the borrowing and lending partners on a map, using a simple Web application (BatchGeo) that uses the Google Maps API to generate map-based data visualizations. BatchGeo is a nice tool, but in this case the result wasn’t very pleasing — the density of same-sized location markers in some locations made it difficult to read and, more importantly, obscured patterns in the relative concentration and diffusion in different regions. This was particularly true in comparatively small states. It was a very noisy picture.  Even if one looks at a small fraction of the inter-lending partners, the result is an irritating blur.  Limiting the inter-lending population to the top 5% of borrowers resulted in this not especially informative picture:

Top 250 CIC borrowers by location

Highlighting the borrowers located in the ChiPitts megaregion made it only slightly more interesting:

Top 250 within and outside ChiPitts

By chance, as I was experimenting with these maps, Jim Michalko (my boss) stopped by my desk to chat about a recent article in the New York Times on the geography of economic mobility. He half-jokingly suggested that we overlay a map of North American library infrastructure on the map of economic mobility, to see if there was any correlation between the availability of library services and the likelihood that individuals can better their economic lot in life. Well, why not? I already had a geo-coded set of US libraries — all I needed to do was map those to pre-existing shape files to produce a county-level view of library infrastructure. I used a freely available ZIP code data table to map ZIP data to county-level boundaries. The mappings are not perfect, but I considered them good enough for my purpose — which was not to produce an exact map of all library locations, but simply to compare the relative density of regional library infrastructure. With this in hand, I could use a method outlined by Robert Mundigl in his Clearly and Simply blog to associate data values with colored fill gradients in choropleth maps, using Excel.

Here is the result:

County-level distribution of WorldCat libraries in the United States

It is not a complete map — I wasn’t able to map every OCLC library symbol in the United States to a valid ZIP-based county, and not every library in the US has an OCLC symbol. Still, with nearly 30 thousand libraries, it is more comprehensive  than a map produced earlier this year based on IMLS data for about 17 thousand public libraries.
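The ZIP-to-county roll-up behind a map like this can be sketched in a few lines. The map itself was built in Excel; the Python below is only an illustration of the same general idea, and everything in it is an assumption made for the example: the ZIP codes and county names are placeholders, and the class breakpoints are arbitrary rather than the ones used for the map.

```python
from collections import Counter

# Hypothetical ZIP-to-county lookup table (placeholder codes and names).
zip_to_county = {
    "00001": "County A",
    "00002": "County A",
    "00003": "County B",
}

# Hypothetical ZIP codes of geo-coded libraries; "99999" has no county match.
library_zips = ["00001", "00002", "00002", "00003", "99999"]


def count_by_county(library_zips, zip_to_county):
    """Count libraries per county and track ZIPs that could not be mapped."""
    counts = Counter()
    unmapped = 0
    for zip_code in library_zips:
        county = zip_to_county.get(zip_code)
        if county is None:
            unmapped += 1
        else:
            counts[county] += 1
    return counts, unmapped


def shade_class(count, breaks=(1, 3, 10)):
    """Return a fill class from 0 (lightest) to len(breaks) (darkest)."""
    return sum(count >= b for b in breaks)


counts, unmapped = count_by_county(library_zips, zip_to_county)
classes = {county: shade_class(n) for county, n in counts.items()}
print(counts, unmapped, classes)
```

Tracking the unmapped count separately makes the incompleteness noted above visible, so a reader can judge how much of the library population the county-level view actually covers.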

The first thing to be said about this map is that it does not suggest that there is any obvious correlation between the concentration of library resource (infrastructure) and economic mobility. Several of the areas that authors of the Equality of Opportunity study highlight as places where children of low-income families have a relatively greater likelihood of rising in the income distribution have comparatively limited library infrastructure. Admittedly, the geographic unit of measure in the two maps differs — I used counties (partly because they are readily available as shape files), while the researchers used commuting zones. It’s not obvious to me that if the library data were aligned to commuting zones, the picture would look much different: our data suggests that there is comparatively little library infrastructure in the upper Northeast zone of Nebraska, whether one relies on county or commuting zone boundaries — yet, this is an area where inter-generational income gains are relatively frequent. Conversely, in metro areas like Chicago where library infrastructure is comparatively dense, there is reportedly a pretty low level of inter-generational income gain.

Of course, to judge the strength of the US library system based on the geographic distribution of libraries alone is to overlook a vital — perhaps the most vital — attribute of the library enterprise. Libraries are in the business of increasing access to information by sharing resources that are distributed across broad networks of related institutions: public libraries, academic libraries, special libraries etc. Libraries are part of what is now fashionably termed the ‘sharing economy.’ To measure the integrity or vitality of the library system, one needs to take into account the efficiency of flows across the library network.  A successful library system is one that ensures that a child (or adult) in rural Nebraska has access to the same collections and services as a child (or adult) in Minneapolis or Seattle.

It would be interesting to investigate whether geographic areas that favor economic mobility are co-extensive with areas where library flows — the balance of supply and demand for library resources — are notably efficient. Perhaps some LIS PhD student will take up the challenge. My current objective is a lot more prosaic: modeling supply and demand within and outside of a given library consortium to inform decisions about local and shared stewardship of print collections. For this, I think the county-level choropleth is actually quite useful. It helps to show how demand is distributed at ‘above-the-institution’ scale, and this is important for understanding the role of logistics in optimizing the flow of library resources.

This is what a county-level heatmap of demand for CIC returnables looks like, reflecting cumulative inter-lending request activity over a period of about seven years:

Percent of CIC Returnable Borrowing by US Counties

What does this map tell us? A few important things are immediately discernible:

  • CIC libraries (which are mostly located in the Midwest) serve institutions across an enormous geographic range in the United States.
  • Regional demand is concentrated in a relatively small number of counties.
  • The relative volume of demand is quite low; counties with the greatest number of request transactions individually account for less than 7% of total demand.
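For readers who want to try a similar roll-up, here is a minimal sketch of how request transactions might be aggregated into the per-county percentages that a choropleth like this one shades. It is an illustration under stated assumptions, not the method actually used for the map: the function name, the record layout (borrower symbol plus county FIPS code) and the sample records are all hypothetical.

```python
from collections import Counter


def county_demand_share(transactions):
    """Return {county_fips: percent of all requests}.

    `transactions` is an iterable of (borrower_symbol, county_fips) pairs,
    one pair per request. This record layout is an assumption made for
    illustration only.
    """
    counts = Counter(fips for _, fips in transactions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fips: 100.0 * n / total for fips, n in counts.items()}


# Made-up sample records: two requests from one borrower in one county,
# and one request each from borrowers in two other counties.
sample = [
    ("LIB_A", "06037"),
    ("LIB_A", "06037"),
    ("LIB_B", "04019"),
    ("LIB_C", "39049"),
]

shares = county_demand_share(sample)
for fips, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{fips}: {pct:.1f}% of requests")
```

Each county's share is its request count divided by the total across all counties, so the shares sum to 100 and the largest value is directly comparable to the "less than 7%" figure noted above.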

Some of this information was equally visible in the institution location map that I had started with — but the county-based version is less noisy and enables me to roll up a great deal more data (about 1.3 million transactions and five thousand borrowers) in a single picture.  It also raises some additional questions:

  • Are the ‘hotspots’ of demand an artifact of aggregating demand over several years?  I.e. did all of the demand from Southern California occur in the last twelve months or is it a recurring pattern over many years?
  • Has the geographic range of demand changed over time?  Are CIC libraries serving a broader range of institutional partners today than they did five years ago?
  • Do individual CIC member libraries serve a comparable range of institutions?  Is ‘long-range lending’ associated with libraries that hold materials that are relatively scarce in the overall system, or are materials traveling farther than is necessary to meet demand?

It was easy enough to plot annual demand for an individual lender, so I produced a new series of maps looking at the county-level location of institutions that borrowed returnable items from the Ohio State University Library (symbol OSU) over a few years.  This time, I used the absolute number of loans as the input, so that even low-volume borrowers would be visible.  Here are the results for:

2008

Locations of borrowers from OSU in CY2008.

2010

Locations of borrowers from OSU in CY2010.

and 2012

Locations of borrowers from OSU in CY2012.

At first glance, they might appear to be the same map, yet there are minor variations from year to year. The consistency in regions of demand is interesting, since it suggests that there is some predictability in the sources of demand: year after year, institutions in Southern California (mostly Los Angeles County) and Southern Arizona (mostly Pima County) have turned to OSU as a supplier one hundred times or more.  Why does this matter?  A pattern of sustained demand might suggest that a subscription-based pricing model would benefit partners on both sides by providing predictability in budgeting, and would also provide OSU with documentation of the continuing value its holdings are producing for other institutions (and geographies).

The variations in demand are equally interesting.  For example, demand for OSU resources from institutions in the Pacific Northwest seems to have waned somewhat — could this be a result of improved intra-regional inter-lending arrangements and courier service within the Orbis-Cascade group?  Less visible in these pictures, but no less intriguing (I think) is the decreasing range of geographies served by OSU between 2008 and 2012, which amounts to a reduction of 12%.  This trend holds up over a longer period, not reflected in the maps above.  What might account for this change?  Is demand being deflected to other suppliers? Or is the increasing volume of requests generated by ‘in-network’ CIC partners displacing fulfillment for non-CIC institutions?
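A change in geographic range of this kind can be measured in more than one way. The sketch below assumes that "range" means the number of distinct borrower counties in a calendar year, which is one plausible reading rather than a confirmed definition. The function names, the record layout and the sample loans are hypothetical, and the sample is far too small to reproduce the 12% figure.

```python
from collections import defaultdict


def counties_served_by_year(loans):
    """Map each year to the set of distinct borrower counties.

    `loans` is an iterable of (year, county_fips) pairs, one per loan.
    This record layout is an assumption made for illustration only.
    """
    served = defaultdict(set)
    for year, fips in loans:
        served[year].add(fips)
    return dict(served)


def percent_change(before, after):
    """Percent change from `before` to `after` (negative means a reduction)."""
    return 100.0 * (after - before) / before


# Made-up sample: four distinct counties in 2008, three in 2012.
loans = [
    (2008, "06037"), (2008, "04019"), (2008, "53033"), (2008, "41051"),
    (2012, "06037"), (2012, "04019"), (2012, "06037"), (2012, "53033"),
]

served = counties_served_by_year(loans)
change = percent_change(len(served[2008]), len(served[2012]))
print(f"Counties served: {len(served[2008])} -> {len(served[2012])} ({change:+.1f}%)")
```

Counting distinct counties ignores loan volume, so a county that sent one request weighs the same as one that sent hundreds. That suits a question about breadth of service, and the absolute loan counts used in the maps above remain the better input for questions about intensity.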

There’s a lot more to explore in these complex inter-lending networks — and I suspect that visualizing flows between institutions and across geographies will become increasingly important in monitoring, analyzing and improving efficiency in the library system as a whole.  OCLC makes an increasingly wide range of data (including ILL policy and institution data) available for programmatic use by developers and others, and this will hopefully lead to more experimentation with visualization and library analytics.