A national service for research data management in the UK?

Last Thursday I attended a meeting in London on the UK Research Data Service – a project which has until recently been a feasibility study, run jointly by RLUK and its IT services equivalent body, RUGIT, and funded by the Higher Education Funding Council for England with support from JISC. The consultancy was undertaken by Serco Consulting. Four universities participated as case studies – Bristol, Leicester, Leeds and Oxford (the latter two both RLG Partners – as is the LSE whose Library Director, Jean Sykes, has been the RLUK lead on this project). The day-long conference confirmed that the feasibility study had demonstrated the case that some sort of research data service is required for the UK, and described the next steps to be taken. Andy Powell covered the event in a live blog, and Chris Rusbridge, Director of the Digital Curation Centre, blogged about it in a number of entries. Alongside RLUK and RUGIT, also at the table are national bodies with complementary remits: the UK Data Archive at the University of Essex; the Science & Technology Facilities Council, a Research Council with a special brief to provide research facilities; the Research Information Network (which is not a network, but a research-conducting organisation); and the Digital Curation Centre.

There had been some anticipation of a grand announcement and the securing of many millions of pounds worth of funding to set up the UKRDS. This did not happen. In fact, although it is quite easy to make the case for a service which preserves and curates the data generated by research in the UK (or anywhere), it is much less easy to say how that work should be done, or even that a single new service should be set up to do it. And there is an interesting professional issue which surfaces in discussions on this topic in the UK, since the library community is now very familiar with national services funded and managed by JISC on behalf of the Funding Councils, and so automatically imagines services with national scale (a number of examples were described by Paul Hubbard and are given below). Creating such national services appeals also to administrators, and in a climate in which the UK is worried that it is beginning to lag behind other countries (the meeting heard from the Australian National Data Service and was reminded of the National Science Foundation’s DataNet initiative), there is an inclination to propose models which are well-defined as to responsibility and coverage, but are not necessarily tuned to the complexities of the problem. Thus the librarians’ model can seem inadequate to researchers who work directly with the data, and to some of those bodies (such as the Research Councils) who already fund data curation activity within domain-based models. Lorcan touched upon this issue recently, referring to a Chris Rusbridge post in the DCC blog, in the context of multi-scalar solutions. This meeting in some ways acted as a venue for a dialogue between these two camps – librarians and those who supported library support-like national services, on the one hand, and specialist researchers and their support services on the other.

Without a strong consensus on the need for the various potential stakeholders – JISC, the Funding Councils and the UK’s seven Research Councils – to concentrate all of their funding for this activity into one new service, what will emerge will be a pathfinder approach, undertaking data curation and preservation selectively to begin with, with an intention later to scale it up. Beyond the four English universities already on board, other universities (Edinburgh, Glasgow and Cardiff) have shown interest and will join if the Scottish and Welsh Funding Councils decide to participate. None of the various players will host research data centrally; that responsibility will be distributed within the institutional environment. But the UKDA, STFC, RIN and DCC will somehow ‘sandwich’ the distributed data between a supportive bottom layer and an overarching presentational top layer (we were reminded that code is much cheaper to move than data).

For the library community, some of the discussion had a familiar ring to it from the discussions around the population of institutional repositories. We have learnt – somewhat painfully – from the experience of ‘building’ institutional repositories that saying (as more than one speaker at this event did) that we need to change the culture is a little like saying we need peace on earth. Devising training programmes for junior researchers will not in itself modify long-held behaviours. There was interesting discussion on where investment should be made to effect real change: not on researcher behaviours, but on system drivers (eg cranking up the reputational value of data may be assisted by measures such as increasing the length of time for completion of PhDs, so that researchers will be both more inclined and more able to cite the data of others, and to make their own data citable). The Wellcome Trust requires data management plans as a condition of grant funding, and demands that these are peer reviewed.

One of the most interesting presentations of the day was made (via video-link) by Dr Ross Wilkinson, Executive Director of the Australian National Data Service, which has been operating for only a few months, and is funded by the Australian Government’s Department of Innovation, Industry, Science and Research. Their approach is to make data discoverable both Google-style (‘you come to us’), and Amazon-style (‘we come to you’, ie via recommendations). The ANDS service also uses the currency of the search-engine discoverable web page for each collection and project in the system, with a variety of linked options available from pages.

Whatever else it seeks to do, this service is based on the need to accommodate discovery from web search engines. Ross said it bluntly: ‘forget portals; get pages findable in Google’. Their approach also includes the development of persistent identifiers and a Collections Registry. It is a model which seemed to possess the clarity of vision and get-ahead mentality which the UK national service advocates were striving for.

But the specialist researcher group were not fully convinced. Malcolm Atkinson (Director of the National eScience Centre) had earlier reminded us that Google and Amazon think carefully about the computational needs of those whose data they store. The Australian model passes that problem back to the research domains themselves. There was also a theological disagreement between Kevin Schurer, Director of the UK Data Archive – who argued for quality over quantity in data deposits – and Ross Wilkinson, who argued for web-findable inclusivity at the expense of high quality. Indeed, it seemed to me that Ross was essentially applying to research data our Shifting Gears argument about digitising special collections (or that of the seminal Greene-Meissner paper on archives, More product, less process). It was the first time I have seen the arguments applied in that context: quality should be improved in response to demand, not in anticipation of it.

Towards the end of the day there was a bid to get the process moved on. Paul Hubbard, Head of Research Policy in HEFCE, maintained that the UK has an exceptional record in creating strong and efficient research information resources. He gave examples of various UK-wide services developed in recent years – the Research Support Libraries Programme which ran from 1999-2002, the system of designated (and specially funded) National Research Libraries, the British Library Document Supply Centre and its current manifestation as a suite of services, the UK Research Reserve, JISC Collections, JANET, the Digital Curation Centre and the Research Information Network. He could of course have pointed to some examples of national research support services which have struggled to make headway, or fallen by the wayside, and an analysis of what makes for success might be valuable for the UKRDS. Clearly, HEFCE is hoping that it will be supported and successful, and the use of these examples reveals the intention to establish it as a national, library support-like service. Malcolm Read, Executive Secretary of JISC, then spoke about possible funding models, and called for a new cadre of ‘data management professionals’ to be created, ‘in the same way that the library profession created “digital librarians”’. They would presumably have the role which Kevin Schurer had earlier pointed to in professionalising certain aspects of research data management (whereas too often it is still the case that the Principal Investigator in a research project is data generator, publisher and distributor). A voice from the non-library camp pointed out that some Research Councils (NERC, specifically) already employ professionals with data management skills, to which Malcolm Read replied that the skills of librarians in this field were nonetheless highly appropriate.

Progress was made, but clearly there are issues still to be resolved in how the UK will tackle this new service need in a way which satisfies all of the many stakeholders. The pathfinder phase seems a sensibly cautious way to start work while keeping on listening to these several voices at the same time. If it results in an ANDS-like service, with a more product, less process ethos, and pathways to the specialist domain needs of the many communities of researchers with value-added requirements, it could increase its chances of eventually taking its place in the group of successful UK national research support services which represent the aspiration of the organisers of this event.

