Making connections: Research analytics at Virginia Tech

In an earlier post, we talked about the emergence of research analytics on campus, and potential roles for the academic library in this area. Recently, the OCLC Research Library Partnership (RLP) Research Support Interest Group convened a discussion with colleagues at Virginia Tech University Libraries, where research analytics are helping the university cultivate industry research partners, as well as understand the characteristics and impact of the university’s research output.

Connie Stovall, Assistant Director for Strategic Research and Industry Analyst, uses powerful data analysis and visualization tools to unlock potential synergies and partnerships between Virginia Tech researchers and their counterparts in industry and government. Connie’s goal is to compile and present actionable intelligence that keeps Virginia Tech researchers aware of developments in the research landscape external to the university, and vice versa. To do this, Connie gathers and analyzes data obtained from a variety of sources, ranging from patents to corporate reports to screen scrapes. She then summarizes and visualizes this data, and overlays it with data on Virginia Tech’s research activity, to identify relationships, trends, and key individuals.

An important tool that Connie uses to do this work is a data visualization package called IN-SPIRE. Developed by the Pacific Northwest National Laboratory, IN-SPIRE can create visualizations of large sets of unstructured, textual data – just the kind of data that Connie typically works with. In the example below from Connie’s presentation, we see clusters of dots, each representing a document. The orange dots represent public documents pertaining to a Virginia-based company interested in collaborating with Virginia Tech researchers; the grey dots represent scholarly outputs from Virginia Tech. Overlaying the two sets of dots on one another reveals a strong coincidence of documents in the area of “Vehicles, Drivers, Visual” – within this area, Virginia Tech has a particularly strong cluster of documents pertaining to “Energy, Power, and Vehicles”. This data opens a window into possible synergies between the company’s interests – which are in autonomous vehicles – and Virginia Tech’s research strengths.

*Image from Connie Stovall’s presentation “Research Analytics and Competitive Intelligence: Link+License+Launch”, OCLC RLP Research Support Interest Group discussion sessions, March 25-26, 2020*

We also heard from Connie’s colleague Rachel Miles, Research Impact Librarian at Virginia Tech. Like Connie’s, Rachel’s work involves using data analysis and visualization tools, but this time with a focus on understanding and communicating the impact of Virginia Tech’s research output. Analyses of interest include citation maps, altmetrics, and co-authorship networks, among others. For her work, Rachel uses a variety of data sets and tools; the example below is a term co-occurrence map produced using VOSviewer, a freely available data analysis tool for building bibliometric networks. Developed by the Centre for Science and Technology Studies at Leiden University, VOSviewer draws on data sets and APIs from a variety of sources (e.g., Scopus, Web of Science, Crossref, Wikidata) to construct detailed visualizations of networks of scholarly outputs.

*Image from Rachel Miles’s presentation “Visualizing Research”, OCLC RLP Research Support Interest Group discussion sessions, March 25-26, 2020*

In this example, Rachel has constructed a term co-occurrence map using publications from the Virginia Tech College of Veterinary Medicine’s Department of Large Animal Clinical Sciences. In this visualization, each node represents a term, such as “protein” or “foal”, that appears in the titles and abstracts of a cluster of publications. A line between two nodes indicates that the two terms co-occur in the same publication. Thick lines indicate pairs of terms that co-occur frequently in the same publications, and are therefore likely related. Finally, the nodes and lines are colored according to the average publication date of the documents involved. For example, the visualization indicates that some of the latest Virginia Tech research in large animal clinical sciences is concentrated around topics having to do with reproduction in cows (lower left of the figure, in yellow).
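To make the idea behind a term co-occurrence map concrete, here is a minimal Python sketch. It is not VOSviewer's actual method, which involves more sophisticated term extraction, normalization, and layout; it only illustrates the three quantities described above: which terms appear, how often pairs of terms share a publication (line thickness), and the average publication year for each term (color). The publication records, the stopword list, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records (publication year, title); not real Virginia Tech data.
PUBLICATIONS = [
    (2012, "Protein expression in the foal"),
    (2015, "Foal immunity and protein markers"),
    (2019, "Reproduction in the cow"),
    (2020, "Cow reproduction and protein supplementation"),
]

# A tiny illustrative stopword list; a real analysis would need a fuller one.
STOPWORDS = {"in", "the", "and"}


def extract_terms(title):
    """Return the set of lowercase, non-stopword words in a title."""
    return {word for word in title.lower().split() if word not in STOPWORDS}


def build_cooccurrence(publications):
    """Count term pairs per publication and average the publication year per term."""
    edge_weights = Counter()            # (term_a, term_b) -> number of shared publications
    years_by_term = defaultdict(list)   # term -> years of publications containing it
    for year, title in publications:
        terms = sorted(extract_terms(title))  # sorted so each pair has one canonical order
        for term in terms:
            years_by_term[term].append(year)
        for pair in combinations(terms, 2):
            edge_weights[pair] += 1
    avg_year = {term: sum(years) / len(years) for term, years in years_by_term.items()}
    return edge_weights, avg_year


edges, avg_year = build_cooccurrence(PUBLICATIONS)
```

In this toy data, "foal" and "protein" share two publications, so the line between them would be drawn thicker than the line between "cow" and "protein", which share only one. The average year for "cow" is later than for "foal", so in a map colored by date, "cow" would sit toward the more recent end of the color scale.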

In the discussion following the presentations, Connie and Rachel shared some important insights about undertaking research analytics work. One point they stressed was that about 80 percent of their work involves data cleaning: good data analysis requires good data. They also talked about the diverse nature of the data analysis requests they receive from around campus: no “one size fits all” data report meets all needs. And these requests come through a variety of channels: for example, a liaison librarian may hear of an analytical need on the part of an administrator at a department, school, or institute and then follow up with Rachel or Connie. Sometimes the connections are more indirect. Rachel noted that occasionally a Google search for “research impact” and “Virginia Tech” leads administrators and faculty to her door.

Although quantitative measures are useful for understanding the nature, evolution, strengths, and impact of a university’s research enterprise, it is important to remember that there is always more to the story. Rachel pointed out that we are not able to completely understand research impact from data analysis alone: instead, it must be complemented by talking to faculty and doing qualitative evaluation as well. For example, in developing a publication strategy, we must look beyond quantitative measures such as journal impact factors and try to understand faculty journal preferences in different fields: why do they have these preferences, and how do they advance their scholarly interests? Rachel also noted the relevance of Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure, as stakeholders optimize their behaviors with respect to the measure, and in doing so, run the risk of losing sight of broader, more important objectives.

Connie noted that an important aspect of research analytics is uncovering areas of excellence or strengths within the university’s sprawling landscape of research activities. Uncovering these strengths and finding new ways to talk about them helps promote the university’s brand and reputation. It is also helpful in positioning the university in inter-institutional comparisons or rankings. However, Rachel cautioned that while external evaluations like rankings are often inescapable (and flawed), it is important not to lose sight of the university’s unique mission or goals: identify what those are, and find ways to highlight them in compelling ways.

One theme that seemed to resonate throughout Connie and Rachel’s work is the importance of making connections. We saw this in a number of contexts. For example, Rachel described how she makes a point of meeting in person with consumers of her work – administrators, department heads, etc. – to understand what their goals and values are before beginning the analysis, and to ensure that they understand what the analysis says and, importantly, the caveats attached to its interpretation (see for example SCOPE, a process for responsible research evaluation). Connie talked about how her work helped identify networks of Virginia Tech professors whose work would be of interest to industry partners, and in doing so, helped make potentially valuable connections between Virginia Tech’s research enterprise and other research sectors. And Connie and Rachel even talked about the value of connecting with each other, leveraging their respective skills and strengths to help each other when they can.

Many thanks to Connie and Rachel for sharing their expertise with us! These discussions were part of the ongoing activities of the RLP Research Support Interest Group, in which RLP members share their knowledge and insights with colleagues at member institutions on topics in the growing and dynamic area of research support services. If you are affiliated with an RLP member institution and would like to join the Research Support Interest Group, please click here to sign up!

Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. He has worked on projects in many areas, such as digital preservation, cooperative print management, and data-mining of bibliographic resources. He was a co-founder of the working group that developed the PREMIS Data Dictionary for preservation metadata, and served as co-chair of a US National Science Foundation blue-ribbon task force on economically sustainable digital preservation. Brian’s academic background is in economics; he has a Ph.D. in agricultural economics. Brian’s current research interests include stewardship of the evolving scholarly record, analysis of collective collections, and the system-wide organization of library resources.