Introducing a RIM System Framework

This blog post is jointly authored by Rebecca Bryant from OCLC Research and Jan Fransen from the University of Minnesota Libraries, and is part of a series based upon the OCLC Research Information Management in the US reports.

The recent OCLC Research report series Research Information Management in the United States provides a first-of-its-kind documentation of research information management (RIM) practices at US research universities, offering a thorough examination of RIM practices, goals, stakeholders, and system components. This effort builds upon previous research conducted by OCLC Research, including the 2018 report Practices and Patterns in Research Information Management: Findings from a Global Survey, prepared in collaboration with euroCRIS.

The reports document the history, use cases, scope, stakeholders, and administrative leadership at five case study institutions:

Penn State University
Texas A&M University
Virginia Tech
UCLA
University of Miami.

A major contribution of these reports is the introduction of a RIM system framework. This model visualizes the functional and technical components of a RIM system through subdivision of RIM processes into three discrete segments:

Data sources
Data processing (including storage)
Data consumers.

RIM System Framework

The RIM System Framework is intentionally shaped like an hourglass, representing the funnel of information into a core RIM system data store and then out again in service of institutional business uses. The three discrete sections are color coded.

Data Sources

Data sources sit at the top of the funnel and refer to the information that must be collected from outside the RIM system, from both external and internal sources. This may include things like:

Person names and job title(s)
Publications
Patents
Grants and projects
Equipment
Institutional units and their hierarchical relationships
Instructional history
Statements of impact.

The framework identifies three types of data sources:

Publication databases and indexes are used as a data source about research outputs. These sources may be freely available (e.g., PubMed) or databases licensed through the institution’s library (e.g., Scopus or Web of Science).
Local data sources may include human resources, sponsored research, and student information systems. They hold information about employees and their job titles, grants awarded by external funding agencies, and instructional history, including courses taught.
Some information does not reside in existing databases; this local knowledge, such as statements of impact, will require manual entry. Organizational relationships and unit hierarchies, perhaps surprisingly, often fall into the local knowledge category, as data about institutional unit hierarchies can be elusive, incomplete, heterogeneous, mutable—and often completely unlinked to the people affiliated with these units.

Data Processing

The data processing section of the RIM System Framework documents how the information from the data sources is captured, transformed, and stored for later use. This constitutes the center of the model, including not only the main RIM data store in the middle but also the processes above it—the publication harvester, ETL processes, and metadata editor—that enable the transfer, cleaning, and enrichment of metadata into the RIM data store. Below the data store, it also includes the data transfer methods used to export the data in support of the various RIM use cases.

A publication harvester allows regular, automated updates of the publications authored by researchers in a RIM system, drawing content from one or more publication databases such as PubMed, Scopus, or others.
ETL stands for Extract, Transform, Load, and is a general term for computing processes that take data from one source, clean and crosswalk it as needed, and add or merge it into a target database.
The metadata editor is the interface that allows users to create, read, update, and delete information. This includes processes like the claiming/disclaiming of publications suggested by the publication harvester, importing publications from publication databases not captured by the publication harvester, and adding and maintaining data available only as “local knowledge.”
The data store is the main database where the aggregated data is maintained. It may be part of a licensed product or might be a bespoke database developed and maintained by the institution.
In order to use the data stored in a RIM system, there must be data transfer methods for extracting it. These typically take the form of APIs, but some RIM systems also allow data analysts to query the database directly using SQL.
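To make the ETL and data transfer steps described above more concrete, here is a minimal sketch in Python. It is an illustration only and not how any particular RIM product works: the record fields, the `publication` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the RIM data store. It extracts a few harvested records, cleans them (normalizing DOIs so duplicates merge), loads them into the data store, and then queries the store directly with SQL.

```python
import sqlite3

# Hypothetical harvested records; the field names are illustrative only.
HARVESTED = [
    {"doi": "https://doi.org/10.1000/ABC123", "title": "  A Study of RIM  ", "year": "2021"},
    {"doi": "10.1000/abc123", "title": "A Study of RIM", "year": "2021"},
    {"doi": "doi:10.1000/xyz789", "title": "Another Paper", "year": "2020"},
]


def normalize_doi(raw):
    """Lowercase a DOI and strip common prefixes so duplicates can be merged."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def transform(record):
    """Clean one source record and crosswalk it to the target schema."""
    return {
        "doi": normalize_doi(record["doi"]),
        "title": " ".join(record["title"].split()),  # collapse stray whitespace
        "year": int(record["year"]),
    }


def load(conn, records):
    """Add or merge records into the (assumed) publication table, keyed on DOI."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publication ("
        "doi TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO publication (doi, title, year) "
        "VALUES (:doi, :title, :year)",
        records,
    )
    conn.commit()


def run_etl(conn, source):
    """Extract from the source list, transform each record, load the result."""
    load(conn, [transform(record) for record in source])


def publications_by_year(conn):
    """Direct SQL access of the kind a data analyst might use."""
    return conn.execute(
        "SELECT year, COUNT(*) FROM publication GROUP BY year ORDER BY year"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, HARVESTED)
    print(publications_by_year(conn))
```

The first two sample records describe the same publication under differently formatted DOIs, so after the transform step they merge into a single row. A production system would also need to handle records without DOIs, which this sketch ignores.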

Data Consumers

Once the data has been collected and transformed, it can be used to support one or more of the six RIM use cases identified in the reports:

Faculty activity reporting
Public portals
Metadata reuse
Strategic reporting & decision support
Open access workflows
Compliance monitoring

More details about these use cases are offered in a previous blog post and in the reports themselves.

Using the RIM System Framework

The report authors developed this framework as an aid for comparing the functional and technical components documented at the five case study institutions. The model can help demonstrate the different decisions institutions have made, as well as the array of options available to RIM system implementers.

For example, here is the framework applied to Virginia Tech’s implementation of Symplectic Elements, which uses the Elements product for many of the system components, including metadata harvesting, the data store, and (soon) the public portal.

*RIM System Framework for Virginia Tech*

In comparison, Texas A&M likewise uses Elements but only as a metadata harvester, instead utilizing a MySQL database for the data store and the open source VIVO product for the Scholars@TAMU public portal front end.
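One way to apply the framework to your own comparison is to record each implementation as a small data structure and look at where the choices diverge. The sketch below is only an illustration: the `RimImplementation` class and its field names are our own invention, and the values simply restate the two examples described above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RimImplementation:
    """Hypothetical record of which product fills each framework component."""
    institution: str
    publication_harvester: str
    data_store: str
    public_portal: str


# Values restate the two examples in this post; nothing else is implied.
VIRGINIA_TECH = RimImplementation(
    institution="Virginia Tech",
    publication_harvester="Symplectic Elements",
    data_store="Symplectic Elements",
    public_portal="Symplectic Elements (planned)",
)

TEXAS_AM = RimImplementation(
    institution="Texas A&M",
    publication_harvester="Symplectic Elements",
    data_store="MySQL",
    public_portal="VIVO (Scholars@TAMU)",
)


def differing_components(a, b):
    """Return the components where two implementations made different choices."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if f.name != "institution" and getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    for component, choices in differing_components(VIRGINIA_TECH, TEXAS_AM).items():
        print(component, choices)
```

Run as written, this reports that the two institutions share a publication harvester but differ in their data store and public portal, which is the contrast drawn in the text.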

Use the RIM System Framework at YOUR Institution

We invite you to take a closer look at the model as introduced and applied in the RIM in the US report series and consider how this model may apply to your local system(s). We’d love to hear from you about how this works—please share in the comments below.

Rebecca Bryant, PhD, serves as Senior Program Officer at OCLC Research, where she leads investigations into research support topics such as research information management (RIM). Janet (Jan) Fransen is the Service Lead for Research Information Management Systems at University of Minnesota Libraries. In that role, she works across divisions and with campus partners to provide library systems and data that save researchers, students, and administrators time and highlight the societal and technological impacts of the University’s research. The most visible system in her portfolio is Experts@Minnesota.

Rebecca Bryant

Rebecca Bryant, PhD, previously worked as a university administrator and as community director at ORCID. Today she applies that experience in her role as Senior Program Officer with the OCLC Research Library Partnership, conducting research and developing programming to support 21st century libraries and their parent institutions.