That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Corey Harper of New York University and Stephen Hearn of the University of Minnesota. They had posited that in an environment more oriented toward search than toward browse indexing, new kinds of services will rely on non-bibliographic data, usage metrics, and data analysis techniques. Metrics can be about usage data—such as how frequently items have been borrowed, cited, downloaded or requested—or about bibliographic data—such as where, how and how often search terms appear in the bibliographic record. Some kinds of use data are best collected on a larger scale than most catalogs provide.
These usage metrics could be used to build a wide range of library services and activities. Among the possible services noted: collection management; identifying materials for offsite storage; deciding which subscriptions to maintain; comparing the citations in researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services; and measuring the impact of library usage on research or student success. What if libraries emulated Amazon with “People who accessed <this title> also accessed <these titles>” or “People in the same course as you are accessing <these titles>”?
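To make the Amazon-style idea concrete, here is a minimal sketch of a co-occurrence recommender built from circulation data. The loan records, identifiers, and field names are invented for illustration; a real service would run against anonymized transaction logs at a much larger scale.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical anonymized loan records: (patron_id, title_id)
loans = [
    ("p1", "t1"), ("p1", "t2"), ("p1", "t3"),
    ("p2", "t1"), ("p2", "t3"),
    ("p3", "t2"), ("p3", "t3"),
]

# Group titles by patron, then count how often pairs of titles
# were borrowed by the same patron.
titles_by_patron = defaultdict(set)
for patron, title in loans:
    titles_by_patron[patron].add(title)

co_counts = defaultdict(Counter)
for titles in titles_by_patron.values():
    for a, b in combinations(sorted(titles), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_accessed(title_id, n=5):
    """Titles most often borrowed by patrons who also borrowed title_id."""
    return [t for t, _ in co_counts[title_id].most_common(n)]

print(also_accessed("t1"))  # e.g. ['t3', 't2']
```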
Harvard Innovation Lab’s StackLife aggregates usage data for library titles, such as the number of check-outs (broken down by faculty, graduate students, and undergraduates, with faculty check-outs weighted differently), the number of ILL requests, and how often a title is placed on course reserve, and then assigns each title a “Stack Score.” A subject search then displays a heat map in which titles with higher scores appear in darker hues, as shown in the accompanying graphic, and can serve as a type of recommender service. The StackLife example inspired other suggestions for possible services, such as aggregating holdings and circulation data across multiple institutions—or even across countries—together with Amazon sales data, and weighting scores when the author is affiliated with the institution. A recent Pew study found that personal recommendations dominated as a source of book recommendations. Could libraries capture and aggregate faculty and student recommendations mentioned in blogs and tweets?
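StackLife’s actual formula is not described here, but the general shape of such a score, a weighted sum of check-outs (with faculty check-outs weighted differently), ILL requests, and course-reserve placements, bucketed for display as a heat map, might be sketched as follows. The weights and field names are assumptions, not StackLife’s real values.

```python
# Illustrative weights -- not StackLife's actual values.
WEIGHTS = {
    "faculty_checkouts": 3.0,    # faculty check-outs weighted more heavily
    "grad_checkouts": 2.0,
    "undergrad_checkouts": 1.0,
    "ill_requests": 2.5,
    "course_reserves": 4.0,
}

def stack_score(usage: dict) -> float:
    """Weighted sum of usage counts for one title."""
    return sum(WEIGHTS[k] * usage.get(k, 0) for k in WEIGHTS)

def heat_bucket(score: float, max_score: float, buckets: int = 5) -> int:
    """Map a score to a 0..buckets-1 bucket; higher buckets mean darker hues."""
    if max_score == 0:
        return 0
    return min(buckets - 1, int(buckets * score / max_score))

titles = {
    "Title A": {"faculty_checkouts": 4, "undergrad_checkouts": 20, "course_reserves": 2},
    "Title B": {"grad_checkouts": 3, "ill_requests": 1},
}
scores = {t: stack_score(u) for t, u in titles.items()}
top = max(scores.values())
for t, s in scores.items():
    print(t, s, "bucket", heat_bucket(s, top))
```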
The University of Minnesota conducted a study[i] to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention. The results suggested a strong correlation between using academic library services and resources—particularly database logins, book loans, electronic journal logins, and library workstation logins—and higher grade point averages.
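The Minnesota study used a fuller statistical model, but the basic association it describes, between counts of library interactions and grade point average, can be illustrated with a simple correlation calculation. The numbers below are made up.

```python
from statistics import correlation  # Python 3.10+

# Hypothetical per-student data: total library interactions
# (database logins + book loans + e-journal logins + workstation logins)
# alongside first-year GPA.
interactions = [2, 15, 7, 30, 0, 22, 11, 40]
gpa          = [2.4, 3.1, 2.9, 3.6, 2.2, 3.4, 3.0, 3.8]

r = correlation(interactions, gpa)  # Pearson's r
print(f"Pearson r = {r:.2f}")  # a positive r suggests association, not causation
```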
Some of the challenges raised in the focus group discussions included:
Difficulties in analyzing usage data: The variety of systems and databases libraries use presents challenges in both gathering and analyzing the data. A number of focus group members are interested in visualizing usage data, and at least a couple are using Tableau to do so. Libraries have, with some difficulty, harvested citations and measured which of the cited titles are available in their repositories, but it is even more difficult to demonstrate which resources would not have been available without the library. The variety of resources also means that the people who analyze the data are scattered across the campus in different functional areas. Interpreting Google Analytics to determine patterns of usage over time and the effect of curricular changes is particularly difficult.
Aggregating usage data across campus: Tools that allow selectors to choose titles to send to remote storage based on circulation data and classification range (to assess the impact on a particular area of the stacks) can be hampered when storage facilities use a different integrated library system. (A sketch of the selection logic itself appears after these challenges.)
Anonymizing data to protect privacy: Aggregating metrics across institutions may help anonymize data but hinders analysis of performance at an individual institution. Anonymizing data may also prevent breaking usage metrics down by demographics (e.g., professors vs. graduate students vs. undergraduates). Even when demographic data is captured as part of campus log-ins, libraries cannot know the demographics of people accessing their resources who are not affiliated with their institution. (A sketch of one pseudonymization approach also appears after these challenges.)
Difficulties in correlating library use with academic performance or impact: Some focus group members questioned whether it was even possible to correlate library use with academic performance. (“Are we busting our heads to collect something that doesn’t tell us anything?”) On the other hand, we can at least start making some decisions based on the data we do have, and perhaps libraries’ concern with being “scientific” is not warranted.
Data outside the library’s control: Much usage data lies outside the library’s control (for example, with Google Scholar and Elsevier). Only vendors have access to electronic database logs. Relevance ranking for electronic resources licensed from vendors is a “black box”.
Inconsistent metadata: Inconsistent metadata can dilute the reliability of usage statistics. Examples cited included the same author represented in multiple ways, varying metadata due to changes in cataloging rules over time, and different romanization schemes used for non-Latin script materials. The biggest issue is that most libraries’ metadata comes from external sources, so the library has no control over its quality. The low quality of metadata for e-resources from some vendors remains a common issue; misplaced identifiers for ebooks were cited as a serious problem. Focus group members have pointed vendors to the OCLC cross-industry white paper, Success Strategies for Electronic Content Discovery and Access, without much success. Threats to cancel a subscription unless the metadata improves prove empty when the library’s own selectors object. Libraries do some bulk editing of the metadata themselves, for example reconciling name forms with the LC name authority file (or outsourcing this work) and adding language and format codes to the fixed fields (a sketch of such bulk edits appears below). One sign of a “reliable vendor” is that it gets its metadata from OCLC. It is important for vendors to view metadata as “community property.”
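On the remote-storage challenge above: once circulation counts and call numbers have been brought together, the selection step itself is simple filtering, as in the sketch below; the hard part the focus group described is reconciling data from a storage facility running a different ILS. The field names and thresholds here are hypothetical.

```python
from datetime import date

# Hypothetical export merged from the catalog and circulation system.
items = [
    {"call_number": "QA76.73", "total_circs": 1,  "last_circ": date(2008, 5, 1)},
    {"call_number": "QA76.9",  "total_circs": 14, "last_circ": date(2014, 2, 3)},
    {"call_number": "PS3515",  "total_circs": 0,  "last_circ": None},
]

def storage_candidates(items, class_prefix, max_circs=2, not_since=date(2010, 1, 1)):
    """Titles in a classification range with little or no recent circulation."""
    return [
        i for i in items
        if i["call_number"].startswith(class_prefix)
        and i["total_circs"] <= max_circs
        and (i["last_circ"] is None or i["last_circ"] < not_since)
    ]

print(storage_candidates(items, "QA76"))  # only the low-use QA76 title
```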
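On anonymization: one common compromise is to replace patron identifiers with a keyed hash before aggregating, and to report usage only by broad demographic group while suppressing small cells. A minimal sketch, with made-up events and a hypothetical secret key:

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"rotate-me-regularly"  # hypothetical; kept outside the analytics store

def pseudonymize(patron_id: str) -> str:
    """Keyed hash so usage can be linked per patron without storing the raw ID."""
    return hmac.new(SECRET_KEY, patron_id.encode(), hashlib.sha256).hexdigest()[:12]

# Hypothetical raw events: (patron_id, demographic group)
events = [("u123", "undergrad"), ("u123", "undergrad"),
          ("f009", "faculty"), ("g554", "grad")]

# Count distinct (pseudonymous) patrons per demographic group.
by_group = Counter()
seen = set()
for patron_id, group in events:
    token = pseudonymize(patron_id)   # stored instead of the raw identifier
    if (token, group) not in seen:
        seen.add((token, group))
        by_group[group] += 1

MIN_CELL = 5  # suppress small cells that could identify individuals
report = {g: (n if n >= MIN_CELL else "<5") for g, n in by_group.items()}
print(report)
```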
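On bulk metadata editing: the clean-up steps mentioned above, reconciling name forms against an authority list and filling in fixed-field language codes, are often scripted against MARC files. A rough sketch using the pymarc library follows; the authority mapping is a stand-in for a real reconciliation step against the LC name authority file, the file names and language code are assumptions, and the exact pymarc calls may vary by version.

```python
from pymarc import MARCReader, MARCWriter

# Hypothetical mapping; in practice this comes from authority reconciliation.
AUTHORIZED_FORMS = {
    "Smith, J.": "Smith, John, 1950-",
}

with open("vendor_records.mrc", "rb") as fin, open("cleaned.mrc", "wb") as fout:
    writer = MARCWriter(fout)
    for record in MARCReader(fin):
        # Reconcile name headings in 100/700 fields against the authority list.
        for field in record.get_fields("100", "700"):
            for name in field.get_subfields("a"):
                if name in AUTHORIZED_FORMS:
                    field.delete_subfield("a")
                    field.add_subfield("a", AUTHORIZED_FORMS[name])
        # Fill in a missing language code in 008 positions 35-37 (assumed "eng").
        f008 = record["008"]
        if f008 and f008.data[35:38] in ("   ", "|||"):
            f008.data = f008.data[:35] + "eng" + f008.data[38:]
        writer.write(record)
    writer.close()
```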
[i] Krista M. Soria, Jan Fransen, and Shane Nackerud, “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention,” Journal of Academic Librarianship 40 (2014): 84–91. doi:10.1016/j.acalib.2013.12.002
Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.