Herbert's Adventures In Linking

The title of this post is my homage to another famous Belgian.

I have been posting from the 9th International Bielefeld Conference in Germany. In yesterday’s closing keynote, Herbert Van de Sompel gave a most unusual presentation. Preparing, on his return to the Los Alamos National Laboratory, for a six-month sabbatical, he used the occasion to review the work he and his various teams have done over the past 10 years or so, and bravely assessed the success or otherwise of the major initiatives in which he has been involved: SFX, OpenURL, OAI-PMH, OAI-ORE and MESUR (not for the acronymically faint-hearted). Incidentally, the 10-year boundary was as much accident as design. With the exception of one slide (pictured) showing his various project clusters, he had not prepared a new presentation, but instead paced around in front of a succession of old ones – some looking pretty dated – displayed in fabulous detail on the gigantic screen in the Bielefeld Convention Centre main hall. With a plea for more work on digital preservation, he said he had discovered that those of his PowerPoint presentations which were more than 10 years old were no longer readable.

The SFX development work, done at the University of Ghent, has resulted in some 1,700 SFX servers installed worldwide, which link – at a conservative estimate – to some 3 million items every day. Less successful, in his view, was the OpenURL NISO standard. It took three years to achieve, and – despite his ambitious intentions at the time – is still used almost exclusively for journal article linking. Reflecting on this, he remarked that the library community finds it hard to get its standards adopted outwith the library realm.

Herbert was also ambivalent about OAI-PMH. The systemic change predicted at the time of its development has not happened, and may never happen. He remarked that ‘Discovery today is defined by Google’, and in that context PMH did not do a good job because it is based on metadata. Ranking is based on who points at you (see my earlier post on the Webometrics ranking). ‘No one points at metadata records’. But it still provides a good means of synchronising XML-formatted metadata between databases.
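
For readers who have not seen the protocol at work, here is a rough sketch of the kind of metadata synchronisation PMH is still good at. It builds a ListRecords request and reads identifiers, datestamps and titles out of a response. The verb, the metadataPrefix and from parameters, and the XML namespaces come from the OAI-PMH specification; the repository address, the sample record and the function names are invented for illustration, and nothing here was shown in the presentation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only to show the shape of a request.
BASE_URL = "https://repository.example.org/oai"

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request, optionally limited to records changed since from_date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# An invented, heavily trimmed response, standing in for what a harvester would download.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2009-02-05</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return (identifier, datestamp, title) for each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        datestamp = rec.findtext("oai:header/oai:datestamp", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, datestamp, title))
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2009-01-01"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

Note that everything a harvester receives is a metadata record reached through a query string, which is exactly the kind of thing a link-following crawler has no reason to point at.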

He feels that we are moving on from a central concern with journal articles in any case. ‘What do we care about the literature any more? It’s all about the data (and let’s make sure that the data does not go the way of the literature!)’. He offered some reflections on institutional repositories in passing. They are not ends in themselves (though they often seem to be). There is a difference between their typical application in the US and in Europe. European libraries use them more for storing traditional academic papers – versions of the articles which appear in peer-reviewed journals. In the US, there is a tendency to use them for ‘all that other stuff’. They are relatively unpopulated because authors find it hard to care once they have had the paper accepted by their intended journal. But the other problem is workflow. Most repositories require deposit procedures which are outwith faculty workflows. Worse – content is being deposited by faculty all over the web – on YouTube’s SciTV, on blogs, in Flickr. They have no time left for less attractive hubs. We need a button with the simplicity and embeddedness of the SFX resolver button to be present in these environments before we will truly optimise harvesting of content into the repository. There is a challenge …

The ORE work learned lessons from PMH. PMH did not address web architecture primitives. That was why Google rejected the protocol. It did not fit with their URI-crawling world view. ORE therefore used the architecture of the web as the platform for interoperability.

As for the MESUR project, directed by his compatriot Johan Bollen, Herbert described it as ‘phenomenal’. MESUR took the view that citations as a measure of impact were appropriate for the paper-based world. But now we should assess network-based metrics (the best known of which is Google’s PageRank). A billion usage events were collected to test the hypothesis that network metric data contains valuable information on impact. The hypothesis, he believes, was proved correct. There is structure there, and the ability to derive usable metrics. Indeed, the correlations produced by MESUR led to the fairly radical conclusion that the citation analysis data we have been using for decades is an outlier when compared with network-based methods.
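
To make ‘network-based metric’ a little more concrete, here is a minimal sketch of the PageRank idea, computed by simple repeated averaging (power iteration) over a tiny made-up graph. It illustrates the general technique only. It is not MESUR’s method or Google’s implementation, and the graph, the function name and the parameter values are my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a graph given as {node: [nodes it points to]}."""
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the same small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its damped score equally to everything it points at.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its score over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Invented toy network: A, B and D all point at C, and C points back at A.
TOY_LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    scores = pagerank(TOY_LINKS)
    for node, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(node, round(score, 3))
```

The point of the toy graph is that A ends up scoring far higher than B or D although only one node points at it, because that node is the well-connected C. A simple count of incoming links would not show this, which is roughly the contrast being drawn between network-based metrics and citation counts.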

Overall then, more plus points than negatives. And not only was his audience not inclined to criticise, but he was urged to stay and complete his presentation even though it ran over his allotted time by about 20 minutes at the end of an intensive day. How many people in our profession could discuss their work with reference to so many iconic projects? He concluded with a simple message – which he had come to see clearly as he prepared this review: we do what we do in order to optimise the time of researchers. Some recent studies, such as the UK Research Information Network’s ‘Activities, costs and funding flows in scholarly communications’ (discussed earlier in the conference by Michael Jubb, Director of RIN), and the more recent JISC report, ‘Economic Implications of Alternative Scholarly Publishing Models: Exploring the costs and benefits’, express researcher time in cash terms. It amounts to billions of pounds each year.

How much money has been saved and so made available for further research by the projects developed and overseen by Herbert and his colleagues? There is optimisation to be proud of.

John MacColl

Comments on “Herbert’s Adventures In Linking”

Herbert Van de Sompel says:

February 6, 2009 at 3:46 pm

Thank you for a very nice summary of my presentation. There’s just one thing I would like to comment upon: I don’t think it is “worse” that researchers post their content all over the place. Rather, I see it as a fact of life; it is their workflow. What I suggested is that we need to work from that reality, and need to find solutions to allow seamlessly pushing their stuff (or surrogates thereof) from wherever they submit to whichever IR they feel affiliated to. I think that technically that is not a hard problem to solve.
