Linked Data Survey results 3–Why and what institutions are consuming (Updated)

 

[Image: LOD Cloud Diagram as of September 2011, wikimedia.org]

OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the third post in the series reporting the results.

The survey described 72 linked data projects/services that consume linked data. Their two main reasons for doing so: to enhance their own data by consuming linked data from other sources (37) and to provide a richer experience for users (35). Other reasons, in descending order: to manage internal metadata more effectively; to experiment with combining different types of data into a single triple store; having heard about linked data, to try it out using linked data sources; to get greater accuracy and scope in search results; to improve Search Engine Optimization (SEO); and to meet a grant requirement.

The ways projects/services are using linked data sources (in order of the most frequently cited):

  • Enrich bibliographic metadata or descriptions
  • Interlinking
  • As a reference source
  • Harmonize data from multiple sources
  • Automate authority control
  • Enrich an application
  • Dataset discovery
  • Auto-suggest
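To make the first few uses concrete (enriching descriptions, interlinking, automating authority control), here is a minimal sketch in Python of attaching an authority URI to a local record by matching name labels. It is an illustration only, not any respondent's implementation: the `AUTHORITY` entries, the `example.org` URIs, and the `creator` and `creator_uri` field names are all assumptions made up for the example. A real project would load its labels from a source such as id.loc.gov or VIAF.

```python
import unicodedata

# Hypothetical authority entries: preferred label, variant labels, and a URI.
# The example.org URIs are placeholders, not identifiers from a real source.
AUTHORITY = [
    {"uri": "http://example.org/authority/n1",
     "preferred": "Twain, Mark, 1835-1910",
     "variants": ["Clemens, Samuel Langhorne, 1835-1910"]},
    {"uri": "http://example.org/authority/n2",
     "preferred": "Brontë, Charlotte, 1816-1855",
     "variants": ["Bell, Currer, 1816-1855"]},
]


def normalize(label):
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_marks.lower())
    return " ".join(cleaned.split())


def build_index(entries):
    """Map every normalized preferred or variant label to the set of URIs using it."""
    index = {}
    for entry in entries:
        for label in [entry["preferred"], *entry["variants"]]:
            index.setdefault(normalize(label), set()).add(entry["uri"])
    return index


def enrich(record, index):
    """Return a copy of the record with creator_uri set only on an unambiguous match."""
    enriched = dict(record)
    matches = index.get(normalize(record.get("creator", "")), set())
    # Zero or several candidate URIs are left unresolved (None) for manual review.
    enriched["creator_uri"] = next(iter(matches)) if len(matches) == 1 else None
    return enriched
```

Calling `enrich({"title": "Jane Eyre", "creator": "Bronte, Charlotte, 1816-1855"}, build_index(AUTHORITY))` attaches the second placeholder URI even though the local record lacks the diaeresis. Leaving ambiguous matches unresolved is a deliberate choice in this sketch, since a wrong link is harder to spot later than a missing one.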

The linked data sources that are used the most:

  • id.loc.gov – 30
  • DBpedia – 25
  • GeoNames – 25
  • VIAF – 24

Here’s the alphabetical list of the sources used; an asterisk marks each source whose uses include use by FAST, VIAF, WorldCat.org or WorldCat.org Works.

 Source # of Projects FAST VIAF WorldCat.org WorldCat.org Works
British National Bibliography  3       *
Canadian Subject Headings  2
DBpedia 25    *
Dewey Decimal Classification  5       *
DPLA  4
Europeana  5
FAST 11       *       *
GeoNames 25    *
Getty’s AAT 10
id.loc.gov 30    *    *       *       *
ISNI  5    *
ORCID  5
RDF Book Mashup  1
The European Library  5
VIAF 24    *       *       *
Wikidata 7
WorldCat.org 12    *       *       *
WorldCat.org Works 6    *       *       *
Other 20

The other linked data sources consumed include:

  • Bibliothèque nationale de France’s data.bnf.fr, an aggregation of its catalogs and the Gallica digital library.
  • Deutsche Nationalbibliothek’s Linked Data Service
  • GEMET, GEneral Multilingual Environmental Thesaurus
  • Heritage Data’s SENESCHAL (Semantic ENrichment Enabling Sustainability of arCHAeological Links), a set of linked data vocabularies for cultural heritage
  • HISCO (Historical International Standard Classification of Occupations), from the History of Work Information System
  • Hispana, an aggregation of digital collections of archives, libraries and museums from Spanish digital repositories
  • Lexvo for languages
  • Logainm.ie, place names database of Ireland
  • Nomisma.org, providing URIs for concepts unique to numismatics
  • Pleiades Gazetteer of Ancient Places, a community-built gazetteer and graph of ancient places in the Greek and Roman world.
  • Rådata nå!, Norwegian name authority file, one of the first to be available as linked open data.
  • United Nations Food and Agriculture Organization’s AGROVOC

Asked whether there were other data sources they wished were available as linked data but aren’t yet, respondents noted:

  • More authority files or thesauri (requested by several) or multilingual subject vocabulary
  • [U.S.] Federal agencies’ data
  • Grant data
  • Individual artworks and digital objects from archaeological or museum databases
  • OpenStreetMap
  • Researcher identifiers from smaller data stores

Barriers or challenges encountered in using linked data resources included:

  • Size of RDF dumps; volatility of data formats of dumps; lack of availability of dumps; lack of authority control within the dumps; issues with level of specificity in terms of trying to match concepts.
  • What is published to the Internet as Linked Data is not always reuseable…Linked data without context is almost useless.
  • Many services presented as Linked Data aren’t really Linked Data.
  • It’s difficult to get other institutions to do their own harmonization between objects and concepts.
  • Lots of handcrafting at the moment, not many off the shelf tools that are useful for visualisation.
  • Mapping of vocabulary requires a lot of manual work.
  • Matching, disambiguating and aligning source data and the linked data resources.
  • Not all resources that we would like to use as linked data are represented as URIs. Semantics that can represent library bibliographic data are not established yet.
  • It always requires time to understand how the data are structured before using them.
  • Disambiguation of terms across different languages is difficult.
  • DBpedia resources are not stable. URIs and structure for resource description would change.
  • The creation of controlled vocabularies in SKOS seems less intuitive than we’d like.
  • Service reliability has been a factor with some resources.
  • Unstable endpoints, datasets not being updated.
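One of the barriers above, the size of RDF dumps, can sometimes be eased by reading a dump line by line rather than loading it into a triple store first. The sketch below assumes the dump is in N-Triples, where each statement sits on its own line, and pulls out only SKOS preferred labels. The regular expression is a deliberate simplification rather than a full N-Triples parser: it skips blank nodes, datatyped literals, and IRI objects, and leaves escape sequences in literals untouched. The function names are made up for this example.

```python
import gzip
import re

# Simplified pattern for one N-Triples statement whose object is a plain or
# language-tagged literal. Anything else (blank nodes, IRI objects, datatyped
# literals, comments) simply fails to match and is skipped.
LITERAL_TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+))?\s*\.\s*$'
)

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def iter_labels(lines, predicate=SKOS_PREF_LABEL, lang=None):
    """Yield (subject, label, language tag) one line at a time.

    The label is returned as written in the dump; escapes are not decoded.
    """
    for line in lines:
        match = LITERAL_TRIPLE.match(line)
        if not match:
            continue
        subject, pred, literal, tag = match.groups()
        if pred != predicate:
            continue
        if lang is not None and tag != lang:
            continue
        yield subject, literal, tag


def open_dump(path):
    """Open a plain or gzip-compressed dump as a UTF-8 text stream."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Because `iter_labels` is a generator over any iterable of lines, memory use stays flat however large the file is. Typical use would be `with open_dump(path) as fh:` followed by a loop over `iter_labels(fh, lang="en")`. This does nothing for the other barriers respondents listed, such as changing dump formats or missing authority control.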

[Originally posted 2014-09-01, updated 2014-09-04]

Coming next: Linked Data Survey results-Why and what institutions are publishing

 


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.
