Researcher profiles and the evolving scholarly record

Skeleton Keys by Steven Depolo (Flickr Commons)

Via Twitter, I recently came upon an arXiv pre-print looking at the institutional distribution of highly-cited researchers, as identified by Thomson Reuters.  One of the issues the authors raise is the difficulty of disambiguating institutional affiliations, which can be complex (faculty appointment at several institutions, for example) or simply difficult to decode without some local context.  To simplify their analysis, the researchers collapsed some related institutions into single units.  All campuses of the University of California system, for example, are treated as the same organization.  As a result the ranked lists of highly-cited institutions favored large education systems (UC) that would not typically be compared to a single university (Stanford).  Even so, the analysis was quite revealing, inasmuch as it pointed up the fact that institutions that recruit top cited researchers to secondary, adjunct professorships gain dividends in international ranking schemes. The Times Higher Education picked up on this aspect of the story in a recent article.

In the couple of weeks since its original publication, the pre-print has gone through 6 versions.  It is quite nice how a venue like arXiv enables one to see how research publications evolve over time.  This is of course a key feature of networked scholarship – editorial work that used to be carried out in private is now much more visible, and disciplinary repositories expose us to a great deal more ‘upstream’ (and ‘downstream’) content than was visible in the past.  At the point when Isidro Aguillo referenced the publication on Twitter, it was in its first version, which is the one I read.  (I have just now quickly skimmed subsequent iterations, flip-book fashion, curious to see what had changed.)


Acknowledging that ISI/Web of Science citation metrics cover only part of the published literature, it is still interesting to consider what these kinds of lists can tell us about the interplay of individual and institutional research reputation. We have been looking recently at some changes in the ‘supply chain’ of scholarly publications [Evolving Scholarly Record framework] and what this may mean for library preservation efforts. In this context, the Thomson Reuters list (which is available for download) was especially interesting to me because it contains Thomson Reuters name identifiers, called ResearcherIDs. Scanning the list, it is immediately apparent that very few of the Web of Science highly-cited researchers are associated with a ResearcherID. Of more than 3000 highly-cited researchers, fewer than 225 (about 7%) on the list are linked to a ResearcherID. And because individual researchers can appear more than once on the list (under different disciplinary categories), the actual number of unique ResearcherIDs was even lower: 215 in total.
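The counting described above can be sketched roughly as follows. This is only an illustration of the arithmetic: the row layout (name, category, identifier) and the placeholder identifiers are my own assumptions, not the actual structure or contents of the Thomson Reuters download.

```python
def researcherid_coverage(rows):
    """Summarize identifier coverage for (name, category, researcher_id) rows.

    A researcher may appear once per disciplinary category, so the number of
    list entries with an ID can exceed the number of unique IDs.
    """
    total = 0
    with_id = 0
    unique_ids = set()
    for _name, _category, rid in rows:
        total += 1
        # Treat None, empty, and whitespace-only values as "no identifier".
        cleaned = (rid or "").strip().upper()
        if cleaned:
            with_id += 1
            unique_ids.add(cleaned)
    share = with_id / total if total else 0.0
    return {
        "entries": total,
        "with_id": with_id,
        "unique_ids": len(unique_ids),
        "share": share,
    }


# Hypothetical rows for illustration only; the IDs are made-up placeholders.
SAMPLE_ROWS = [
    ("Researcher A", "Geosciences", "X-0001-2000"),
    ("Researcher A", "Environment/Ecology", "X-0001-2000"),
    ("Researcher B", "Geosciences", None),
    ("Researcher C", "Clinical Medicine", "x-0002-2001 "),
    ("Researcher D", "Chemistry", ""),
]

if __name__ == "__main__":
    print(researcherid_coverage(SAMPLE_ROWS))
```

Normalizing case and whitespace before de-duplicating is a design choice of this sketch; it keeps the same identifier typed two slightly different ways from being counted twice.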

Why would this matter to library stewardship? Because name identifiers are an important component in selecting and appraising content for preservation purposes, as well as measuring research impact.  University libraries are concerned primarily with archiving and making discoverable the research products of their local constituency. It is not uncommon to find two individuals in the same discipline with the same name — even at the same institution. As an increasing array of researcher profiling and publication harvesting systems rely on name identifiers to disambiguate researcher identities, identifier assignment and registration services become more important.  And, because ResearcherID is integrated with ORCID (an important hub for researcher name identifiers), it can function as a kind of skeleton key, unlocking other name identifiers.  Here is an example of a ResearcherID that has been associated with a related ORCID for the same individual:

Daniele Del Rio ResearcherID

Having these two points of identification for Daniele Del Rio makes it easier to say with confidence what he has published, and makes it possible for preservation agents (libraries, archives, others) to make informed decisions about what to steward as part of the scholarly record.

One possible explanation for the apparently low uptake of ResearcherIDs among the highly-cited is that celebrity scientists see no advantage in registering their identity in the Thomson Reuters ‘network’ — after all, they are already highly visible according to traditional citation impact measures, or they wouldn’t be on the highly-cited list. Another explanation might be that the Thomson Reuters list only included ResearcherIDs for identities that had proven difficult to disambiguate in the process of producing a ranked list.  (Thomson Reuters does a nice job of documenting the methodology they used, but does not say anything specific about the role of ResearcherIDs.)

To investigate this, in a casual way, I manually searched the ResearcherID profiles site for the 159 highly-cited Geosciences authors to see if the ‘yield’ of ResearcherIDs was any higher than reported in the top-cited list. Why Geosciences? Because it is the disciplinary category with the highest proportion of ResearcherIDs in the highly-cited list and hence appeared to be a population with some motivation to register author identities in this particular space. Result? The proportion of highly-cited Geosciences authors with a ResearcherID rose from 18% to 41%. One can’t read too much into this, of course, since it is possible that some authors created ResearcherID profiles (and claimed a ResearcherID in the process) after the highly-cited list was produced. Still, the difference between the 7% ResearcherID ‘coverage’ in the highly-cited list and the 41% coverage for Geosciences seemed noteworthy. Does it signify a particular fealty to ISI or are climate scientists (in the main, the specialty that dominates the Geosciences list) motivated to engage with research profiling in other venues too?

In a fairly desultory way, I looked into this too.  This time, I looked at researcher profiles and identifiers for two different groups:  the same 159 Geosciences authors, and a sample of 190 highly cited researchers in all disciplines. The second sample was not truly random:  I focused on a set of 7 research institutions that were of particular interest to me.  (More on that another time.)  For each author, I attempted to identify a valid ResearcherID, ORCID, and Google Scholar profile.  These seemed to me to represent a decent range of research profiling spaces and services, including proprietary schemes (ResearcherIDs), a freely available name identifier provider that is designed to interoperate with other providers (ORCID), and a ‘webscale’ profiling service (Google Scholar) that does not overtly promote the value of name identifiers.

Here’s what I found.

Sample 1 – highly-cited Geosciences researchers (n=159)

  • Per cent of researchers lacking any research profile (of 3 sources examined):  33%
  • Per cent of researchers with Google Scholar profile: 45%
  • Per cent of researchers with ResearcherID: 41%
  • Per cent of researchers with ORCID: 25%

Sample 2 – highly-cited researchers in 21 disciplines at 7 US & UK research institutions (n=190)

  • Per cent of researchers lacking any research profile (of 3 sources examined):  48%
  • Per cent of researchers with Google Scholar profile: 42%
  • Per cent of researchers with ResearcherID: 18%
  • Per cent of researchers with ORCID: 13%
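For anyone wanting to repeat the tabulation, here is a minimal sketch of how figures like these could be derived from a per-researcher checklist. The `ProfileCheck` record and the four sample entries are hypothetical; my own tallies were done by hand, not with this code.

```python
from dataclasses import dataclass


@dataclass
class ProfileCheck:
    """One researcher and which of the three profile types were found."""
    name: str
    google_scholar: bool
    researcherid: bool
    orcid: bool


def summarize(checks):
    """Return whole-number percentages for each profile type and for 'none'."""
    n = len(checks)
    if n == 0:
        raise ValueError("at least one researcher is required")

    def pct(count):
        return round(100 * count / n)

    return {
        "no_profile": pct(sum(
            1 for c in checks
            if not (c.google_scholar or c.researcherid or c.orcid)
        )),
        "google_scholar": pct(sum(1 for c in checks if c.google_scholar)),
        "researcherid": pct(sum(1 for c in checks if c.researcherid)),
        "orcid": pct(sum(1 for c in checks if c.orcid)),
    }


# Hypothetical sample for illustration only.
SAMPLE_CHECKS = [
    ProfileCheck("Researcher A", True, True, False),
    ProfileCheck("Researcher B", True, False, True),
    ProfileCheck("Researcher C", False, False, False),
    ProfileCheck("Researcher D", False, False, False),
]

if __name__ == "__main__":
    print(summarize(SAMPLE_CHECKS))
```

Because one researcher can hold several profiles, the four percentages are not expected to sum to 100, which is also true of the figures listed above.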

Needless to say, a variety of caveats apply. I was reasonably scrupulous in my manual searches, used stemming and wild-cards judiciously, verified identities by checking publication details and institutional affiliation, etc. But I may have missed a few ORCID matches (I disregarded any that were impossible to validate), and some of the institutional affiliations reported by Thomson Reuters no longer seem to be valid. After discovering that at least one of the reported ResearcherIDs in the highly cited list erroneously attributed Eugene Garfield’s ID (!) to a prominent Chinese agricultural scientist, I went back and validated ResearcherIDs in the original list. A handful of researchers had multiple Google Scholar profiles; I counted only one. Etc.
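The wild-card side of that searching could be approximated in code along these lines. This is a loose sketch using Python's standard `fnmatch` and `unicodedata` modules, with invented names; it only narrows the candidate list, and the identity checks against publications and affiliations would still have to be done by a person.

```python
import fnmatch
import unicodedata


def normalize(name):
    """Lowercase, strip diacritics, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def wildcard_pattern(surname, given):
    """Build a 'surname, first-initial*' pattern from a reported name."""
    return f"{normalize(surname)}, {normalize(given)[:1]}*"


def candidates(surname, given, profile_names):
    """Return profile names that match the wildcard pattern after normalizing."""
    pattern = wildcard_pattern(surname, given)
    return [
        p for p in profile_names
        if fnmatch.fnmatchcase(normalize(p), pattern)
    ]


# Hypothetical names for illustration only.
PROFILE_NAMES = ["Muller, A. B.", "Müller, Anna", "Mullen, A.", "Muller, Bernd"]

if __name__ == "__main__":
    print(candidates("Müller", "Anna", PROFILE_NAMES))
```

Matching on surname plus first initial is deliberately generous: it will return namesakes, which is exactly why each candidate still needs checking by hand.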

Imperfect as this casual investigation is, it does raise some issues that may be worth more serious investigation. More than half of the researchers in both samples have at least one publicly available research profile, and Google Scholar is clearly in the lead. What accounts for the comparatively low uptake for ResearcherID or ORCID profiles? Is it simply that, having established a ‘ubiquitous’ Google profile, scholars feel they are sufficiently visible? For the highly-cited, this would seem to be a reasonable conclusion.  Indeed, a quick look at Google suggests that 33% or more of the researchers in sample 2 are sufficiently notable to be recognized in Google’s knowledge graph.  A simple string search on the form of the name reported by Thomson Reuters retrieves ‘knowledge cards’ for many of these individuals.

Candes kcard

Coincidentally, the three ‘related names’ displayed here all feature on the highly-cited list.

There are some interesting disciplinary variations too. On balance, Geosciences researchers (or their agents…) appear to have a greater interest in managing research profiles in multiple environments than do other researchers; this is borne out in the highly-cited list as a whole as well as the smaller sampling. In other disciplines, it seems there is little deliberate attention (among highly-cited authors) to managing online reputation. Among researchers in Clinical Medicine, the discipline with the greatest number of highly cited authors, more than 60% of authors lack any profile among the three sources examined. It is admittedly difficult to make sensible comparisons across disciplinary communities in the highly-cited sample, since they are of very different sizes. And by its very nature, a citation ranking system that relies primarily on scientific journals will not fully represent the scholarly communication practices (and reputation management preferences) of disciplines that publish in other venues. This is of course well known.

Isidro Aguillo and his colleagues have just published an interesting article in Scientometrics, examining the institutional and social web presence of highly-cited European researchers. It touches on some of the issues raised here, including some of the disciplinary differences in identity management. Their major finding is that highly-cited researchers in Europe have a relatively limited presence on the social web, excluding LinkedIn and Microsoft Academic Search. There is apparently only limited engagement with services like Mendeley, Google Scholar, or […]. Having read their analysis, I regret (a little 😉) that I didn’t include Mendeley and some other social research venues in my little investigation. But no matter – my primary interest was in researcher identification services and the degree to which interest in claiming and managing online profiles is evidenced by highly-cited researchers. And for that, I have at least a preliminary answer: among those who have achieved a high degree of visibility in traditional scientific publishing venues, there is evidently little motivation to go beyond the ‘registration’ services of major search engines (Google). In retrospect, I should obviously have included Microsoft Academic Search profiles, though, as they function as a kind of ‘auto-registration’ service, they are somewhat different in kind.

An upshot of this may be that libraries have an opportunity to insert themselves more systematically into researcher registration workflows.  An upcoming report by my colleague Karen Smith-Yoshimura will explore this topic in more detail.  See also this very nice profile of twenty Research Networking Systems — including ResearcherID, ORCID, and Google Scholar — compiled by Karen’s Registering Researchers team, which is now being updated.  There is also a helpful tabulation of related services in Wikipedia.

A final observation in what is already an overlong post… I have been surprised in my occasional perusal (…) of the ORCID registry to see individual profiles that lack any publication record. ORCID provides some statistical reporting that confirms this – only about 20% of ORCID identifiers are associated with any published work in the registry. Some may be individuals who are eager to have an identifier but don’t yet have any published works to associate with it. But others are researchers with established publication records who apparently cannot be bothered to manually enter bibliographic metadata (I would not fault them there…), leaving a gap that the automatic ingest of publications that ORCID supports (through its integration with WoS ResearcherID, Scopus etc.) cannot yet fill. It would be interesting to know more about the ‘workless’ ORCID profiles. It would also be interesting to know more about the >2M works that lack DOIs; presumably it is quite challenging to de-duplicate those entries, resulting in some overcounting of registered works.