Knowledge Management and Metadata

“The content of the scholarly record” by OCLC Research, from *The Evolving Scholarly Record* (doi: 10.25333/C3763V), CC BY 4.0

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Roxanne Missingham of Australian National University and Suzanne Pilsk of the Smithsonian Institution. How can libraries contribute to the broader research management agenda in our institutions and nations? Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment at every stage: from the planning our researchers do before and during the creation of data, through managing the data and disseminating the knowledge gained, to understanding the impact, engagement, and resulting reputation of our home institutions.

Libraries’ expertise in metadata standards, identifiers, linked data, data sharing systems, and other technical systems can be valuable throughout the research life cycle. OCLC Senior Research Scientist Ixchel M. Faniel highlighted this value in her November 2019 Next blog post “Let’s cook up some metadata consistency.” Increasing pressure on resources in our institutions suggests that the time is ripe to explore new technologies and new opportunities to link the data and publications in this complex world with minimal manual intervention.

Critical issues: The most common critical issue identified was that metadata specialists are rarely involved in the early stages of the research life cycle. Getting researchers to understand the importance of data management plans has been a challenge. Integrating workflows between repositories and discovery layers has been problematic, as has the lack of “metadata governance” across the institution. The variety of metadata schemas used across disciplines is a hurdle for institutions’ discovery layers. Aggregating the metadata of research data may be a sensitive issue for some researchers, who worry about being “spied on” by their administration and about having their productivity compared with others’ in promotion or tenure reviews. This sensitivity was reported more often by those in the U.S. than by those in countries that have national assessments determining funding for future research.

Integrating libraries into research workflows: Some metadata specialists work with their institution’s Scholarly Communications and Publishing Division, which also manages the institutional repository. These institutional repositories may hold only “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be offered to advise researchers on their data management plans, covering appropriate metadata standards and controlled vocabularies, a strategy for organizing their data effectively, and an approach that will facilitate reuse of the data years after the research is completed. Communication is key to helping researchers understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” in which researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the U.S. Patent and Trademark Office and the Microsoft Academic Graph. (For more details, watch the recording of the August 2019 OCLC Research Works in Progress Webinar, “Democratizing Access to Large Datasets through Shared Infrastructure.”)

Improving the metadata provided by publishers: Everyone wishes for better metadata from publishers and from aggregators such as Crossref, Scopus, and Web of Science, and particularly for metadata that includes identifiers, especially for authors. Authors who change their names or publish under different forms of their names may see their rankings suffer if their output is split, the “second name problem” that identifiers would solve. The British Library has been working with six major UK publishers to embed International Standard Name Identifiers (ISNIs), using machine processing to associate the publishers’ proprietary author identifiers with the corresponding ISNI and NACO identifiers and sending the results back to the publishers to include in their ONIX data; not all publishers, however, are able to ingest the data. Metadata that includes DOIs would also resolve the problem of having two or more different titles for the same work. In the United Kingdom, a Jisc initiative, Publications Router, is bringing together publishers and content providers to supply article metadata in formats suitable for ingest into repositories or research information management systems, helping institutions comply with the open access policies of research funding bodies. Another Jisc initiative, Plan M (“M” stands for “metadata”), seeks to streamline the metadata supply model among libraries, publishers, data suppliers, and infrastructure providers.
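To illustrate why DOIs help with the “different titles for the same work” problem, here is a minimal sketch in Python. It assumes records arrive as simple dictionaries with hypothetical `title` and `doi` fields, and that DOIs may come with or without a resolver prefix. Because DOI names are case-insensitive, the sketch strips the prefix and lowercases the DOI before grouping, so two records with different titles but the same DOI collapse into one work. The records and the `10.1234/...` DOI are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical harvested records: the same article from two sources under
# different titles, plus one record that has no DOI at all.
records = [
    {"source": "publisher", "title": "Metadata at Scale", "doi": "10.1234/ABC.001"},
    {"source": "aggregator", "title": "Metadata at scale: a case study",
     "doi": "https://doi.org/10.1234/abc.001"},
    {"source": "repository", "title": "Linking Research Outputs", "doi": None},
]

# Resolver prefixes that may precede the DOI name itself.
DOI_PREFIXES = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(doi):
    """Strip resolver prefixes and lowercase (DOI names are case-insensitive)."""
    if not doi:
        return None
    return DOI_PREFIXES.sub("", doi.strip()).lower()


def group_by_doi(records):
    """Group records that share a DOI; records without a DOI stay unmatched."""
    groups = defaultdict(list)
    unmatched = []
    for rec in records:
        key = normalize_doi(rec.get("doi"))
        if key:
            groups[key].append(rec)
        else:
            unmatched.append(rec)
    return dict(groups), unmatched


groups, unmatched = group_by_doi(records)
for doi, recs in groups.items():
    print(doi, "->", [r["title"] for r in recs])
```

Records without a DOI fall through to `unmatched`, which is exactly where manual or fuzzy title matching would still be needed.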

Potential for artificial intelligence or machine learning: Metadata managers hope that artificial intelligence (or at least machine learning) could reduce the manual effort currently needed to link names and concepts in research data. Algorithms might be used to match names based on related metadata or sources, relate topics to one another based on context, disambiguate names using other available metadata, and analyze datasets to identify possible biases in a collection. Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.
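As a rough illustration of “matching names based on related metadata,” the sketch below scores a pair of author records by blending name similarity with the overlap of their coauthors. This is a simple hand-weighted heuristic, not machine learning; a trained model would learn such weights from labeled matches. The record fields (`name`, `coauthors`), the sample names, and the 0.5 weight are all hypothetical assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Strip accents and punctuation, lowercase, and sort tokens so that
    'Müller, J.' and 'J. Muller' normalize to the same string."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in ascii_only)
    return " ".join(sorted(cleaned.lower().split()))


def jaccard(a, b):
    """Overlap of two sets of related metadata (here: normalized coauthor names)."""
    a = {normalize_name(x) for x in a}
    b = {normalize_name(x) for x in b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(rec1, rec2, name_weight=0.5):
    """Blend name similarity with coauthor overlap.
    The 0.5 default weight is an arbitrary assumption, not a tuned value."""
    name_sim = SequenceMatcher(
        None, normalize_name(rec1["name"]), normalize_name(rec2["name"])
    ).ratio()
    context_sim = jaccard(rec1["coauthors"], rec2["coauthors"])
    return name_weight * name_sim + (1 - name_weight) * context_sim


# Hypothetical author records from different sources.
a = {"name": "Müller, J.", "coauthors": ["Chen, L.", "Okafor, A."]}
b = {"name": "J. Muller", "coauthors": ["Chen, L.", "Okafor, A.", "Silva, R."]}
c = {"name": "Miller, J.", "coauthors": ["Tanaka, H."]}

print(round(match_score(a, b), 3))  # same person, likely: high score
print(round(match_score(a, c), 3))  # similar name, no shared context: lower score
```

The point of the design is that name similarity alone cannot separate “Müller” from “Miller”; the related metadata (shared coauthors) is what pushes the likely match above the unlikely one.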

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata, with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.