OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the third post in the series reporting the results.
Of the 72 linked data projects/services described in the survey that consume linked data, the two main reasons for doing so are to enhance their own data by consuming linked data from other sources (37) and to provide a richer experience for users (35). Other reasons, in descending order: more effective internal metadata management; to experiment with combining different types of data into a single triple store; having heard about linked data and wanting to try it out using linked data sources; a wish for greater accuracy and scope in search results; to improve Search Engine Optimization (SEO); and to meet a grant requirement.
The ways projects/services are using linked data sources (from most to least frequently cited):
- Enrich bibliographic metadata or descriptions
- Interlinking
- As a reference source and to harmonize data from multiple sources
- Automate authority control
- Enrich an application
- Dataset discovery
- Auto-suggest
The linked data sources that are used the most:
- id.loc.gov – 30
- DBpedia – 25
- GeoNames – 25
- VIAF – 24
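Each count above is out of the 72 consuming projects/services, and a project can use several sources. So each count can be read as a share of those projects, but the shares are not mutually exclusive and need not sum to 100%. Below is a small Python sketch of that arithmetic. The counts come from the survey; the variable and function names are illustrative only.

```python
# Counts of consuming projects per source, as reported in the survey.
# Names below (TOTAL_CONSUMING_PROJECTS, share_of_projects) are illustrative.
TOTAL_CONSUMING_PROJECTS = 72

top_sources = {
    "id.loc.gov": 30,
    "DBpedia": 25,
    "GeoNames": 25,
    "VIAF": 24,
}


def share_of_projects(count: int, total: int = TOTAL_CONSUMING_PROJECTS) -> float:
    """Return the percentage of consuming projects that use a source."""
    return round(100 * count / total, 1)


shares = {name: share_of_projects(n) for name, n in top_sources.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The four shares add up to well over 100% (30 + 25 + 25 + 24 = 104 uses across 72 projects). That means at least some projects draw on more than one of these sources.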
Here’s the alphabetical list of the sources used. An asterisk in the FAST, VIAF, WorldCat.org or WorldCat.org Works column means that service is among the projects using the source.
Source | # of Projects | FAST | VIAF | WorldCat.org | WorldCat.org Works |
British National Bibliography | 3 | * | |||
Canadian Subject Headings | 2 | ||||
DBpedia | 25 | * | |||
Dewey Decimal Classification | 5 | * | |||
DPLA | 4 | ||||
Europeana | 5 | ||||
FAST | 11 | * | * | ||
GeoNames | 25 | * | |||
Getty’s AAT | 10 | ||||
id.loc.gov | 30 | * | * | * | * |
ISNI | 5 | * | |||
ORCID | 5 | ||||
RDF Book Mashup | 1 | ||||
The European Library | 5 | ||||
VIAF | 24 | * | * | * | |
Wikidata | 7 | ||||
WorldCat.org | 12 | * | * | * | |
WorldCat.org Works | 6 | * | * | * | |
Other | 20 |
The other linked data sources consumed include:
- Bibliothèque nationale de France’s data.bnf.fr, an aggregation of its catalogs and the Gallica digital library.
- Deutsche Nationalbibliothek’s Linked Data Service
- GEMET, GEneral Multilingual Environmental Thesaurus
- Heritage Data’s SENESCHAL (Semantic ENrichment Enabling Sustainability of arCHAeological Links), a set of linked data vocabularies for cultural heritage
- HISCO, History of Work Information System
- Hispana, an aggregation of digital collections of archives, libraries and museums from Spanish digital repositories
- Lexvo for languages
- Logainm.ie, the Placenames Database of Ireland
- Nomisma.org, providing URIs for concepts unique to numismatics
- Pleiades Gazetteer of Ancient Places, a community-built gazetteer and graph of ancient places in the Greek and Roman world.
- Rådata nå!, the Norwegian name authority file, one of the first to be available as linked open data.
- United Nations’ Food and Agriculture Organization’s AGROVOC
Asked whether there were other data sources they wished were available as linked data but are not yet, respondents noted:
- More authority files or thesauri (requested by several) or multilingual subject vocabularies
- [U.S.] Federal agencies’ data
- Grant data
- Individual artworks and digital objects from archaeological or museum databases
- OpenStreetMap
- Researcher identifiers from smaller data stores
Barriers or challenges encountered in using linked data resources included:
- Size of RDF dumps; volatility of data formats of dumps; lack of availability of dumps; lack of authority control within the dumps; issues with the level of specificity when trying to match concepts.
- What is published to the Internet as Linked Data is not always reusable… Linked data without context is almost useless.
- Many services presented as Linked Data aren’t really Linked Data.
- It’s difficult to get other institutions to do their own harmonization between objects and concepts.
- Lots of handcrafting at the moment, and not many off-the-shelf tools that are useful for visualisation.
- Mapping of vocabulary requires a lot of manual work.
- Matching, disambiguating and aligning source data and the linked data resources.
- Not all resources that we would like to use as linked data are represented as URIs. Semantics that can represent library bibliographic data are not established yet.
- It always takes time to understand how the data are structured before using them.
- Disambiguation of terms across different languages is difficult.
- DBpedia resources are not stable; URIs and the structure of resource descriptions change.
- The creation of controlled vocabularies in SKOS seems less intuitive than we’d like.
- Service reliability has been a factor with some resources.
- Unstable endpoints, datasets not being updated.
[Originally posted 2014-09-01, updated 2014-09-04]
Coming next: Linked Data Survey results–Why and what institutions are publishing
Karen Smith-Yoshimura, senior program officer, topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements. Karen retired from OCLC November 2020.
4 Sept 2014 update: Statistics updated to reflect the responses from Research Libraries UK and The European Library.