Advice from linked data implementers

Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/, under a CC-BY-SA license.

The 2018 International Linked Data Survey for Implementers (announced in this 17 April 2018 HangingTogether blog post) attracted responses from 81 institutions in 20 countries. Together they described 104 linked data projects or services that have been or are being implemented.

While I’m still analyzing the responses, I thought I’d share the advice respondents offered. The advice overlaps with what I reported from the 2014 survey responses four years ago. Among the comments on what respondents would do differently if they started their linked data project again, I saw more about the need to integrate linked data into existing services, remediate legacy metadata, and set realistic expectations, such as:

Linked data is now more mature; we would therefore have a wider frame of reference. We would seek to develop a system more integrated with our core services.
Embed it better in our existing infrastructure. Take a more holistic perspective and try to incorporate the project more into existing procedures.
Do more data enhancement at the time of conversion.
The clean up of data sets would have benefitted from wider organizational support.
The check and correction of the catalog’s data needed more staff dedicated to it.
The group of data on which the project would be carried out would be more consistent and we would try to give more visibility to the results.
Maybe we would try other reconciliation services, i.e., beyond OpenRefine.
I would set realistic expectations and make sure that people didn’t only focus on Google search results, but also understand how the data is used directly.
We are still figuring out the role of bibliographic Linked Data within our organization and its various processes. However, creating this service has been a major milestone and helped refocus the discussion by making it plain and clear what the possibilities (and limitations) are.

Advice for those considering a linked data project/service to consume linked data included:

Make sure the linked open data (LOD) provider(s) are committed to ensuring their data is persistent. A surprising number of established institutions still treat publishing LOD as a short-term experiment, and even if they are the authoritative source for that data, it doesn’t mean they are committed to making it available in the long term.
Make sure of the quality of the data group with which you intend to work and start with a small project that can be assumed by the technicians of the institution.
Do not focus on technical stuff, but on what you want to achieve. The technical stuff (including data model and ontology choices) should be tailored to your goals, not the other way around.
There are many more exemplars of good practice and more online information relating to consuming linked data. Our advice would be to read as widely as possible and consult with experts in the community.
Match use cases to what the data will provide. Be prepared to work with a number of data models.
Read W3C recommendations for Linked Data Best practices and recipes.
Have a very accurate idea of the information you want to provide to users and where to obtain it. Design interfaces that present linked data in a clear way. Analyze the available value vocabularies, the level of updating and the reliability of the institutions that maintain them. Select the most productive sources in terms of data and number of links with other LOD resources (e.g., VIAF, Wikidata, DBpedia…).
Never underestimate the amount of data cleanup that will be required.
It takes a lot more time than you think to disambiguate concepts; also, good to think about acceptable levels of quality prior to starting.
Check out if Wikidata covers your needs and contribute to it.
Find a very talented functional lead, who can define the scope and requirements before any work is begun; this ensures that there is a) a point to what is being done and b) that what is to be done is very clear for all project participants.
It will take time, and scale is required before you start to see benefits.
Talk to those that are already working so we can join projects and resources and ideas.
Clean and structure your data, disambiguate entities, use authority lists.

Advice for those considering a linked data project/service to publish linked data included:

Try to use existing ontologies when possible rather than creating your own. Create your own only to fill gaps.
It’s important to understand naming conventions and who is responsible for minting authoritative URIs. Projects often mint their own URIs for everything, when another resource may have already created some or all of the authoritative LOD which should be used instead.
Provide more publicity and visibility to the results obtained and how users can get a better experience with them in their searches.
First focus on what you want to achieve. Then see if linked data is the adequate means to achieve this.
Plan to measure the impact as early as you can, and ensure that you have some early users lined up to benefit from the work.
It’s not only an IT project. Communicate, communicate, communicate!
Check quality and granularity of legacy data. Ensure that attention is paid to the needs of users. Analyze the way to interrelate the data to help users to extract new navigation possibilities. Increase the presence of bibliographic data in Wikipedia and Wikidata.
Get institutional backing as high as possible. Find others and form a consortium before starting.
Understand the institutional goals of the project; develop both internal and external use cases; seek feedback from external users and listen to it; use existing predicate ontologies; analyze the legacy data to determine what should be converted to linked data—don’t just convert everything and don’t force the legacy data model into RDF (square hole/round peg).
Don’t let perfect prevent good. It is OK to iterate. Plan carefully, focusing always on what you need to accomplish. It’s easy to get sidetracked and go down black holes: keep refocusing on the primary needs and results required. Let those determine how you model your work. Have a clear sense of the long-term plan for integration support and ongoing maintenance.
Dedicate sufficient time and staff to your project, with preferably at least one person who can work on the project in a (close-to-full-time) capacity. To the extent possible, integrate work on linked data projects into your daily workflow instead of treating it as an “extra” on top of your other work.

Some advice struck me as relevant whether your project is consuming or publishing linked data. A number of implementers recommended learning from others as much as you can before starting your own project, and encouraged collaboration. Several urged others to “dive in!” or “just do it”. One noted,

“This is the future of data for libraries and the longer we wait the further behind we’re going to fall.”

Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, worked on topics related to creating and managing metadata, with a focus on large research libraries and multilingual requirements. Karen retired from OCLC in November 2020.