SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines
Yumusak, Semih and Dogdu, Erdogan and Kodaz, Halife and Kamilaris,
Andreas and Vandenbussche, Pierre-Yves
Abstract
Linked data endpoints are online query gateways to semantically
annotated linked data sources. In order to query these data sources,
the SPARQL query language is used as a standard. Although a linked data
endpoint (i.e. SPARQL endpoint) is a basic Web service, it provides a
platform for federated online querying and data linking methods. For
linked data consumers, SPARQL endpoint availability and discovery are
crucial for live querying and semantic information retrieval. Current
studies show that the availability of linked datasets is very low, while the
locations of linked data endpoints change frequently. There are linked
data repositories that collect and list the available linked data
endpoints or resources. It is observed that around half of the endpoints
listed in existing repositories are not accessible (temporarily or
permanently offline). These endpoint URLs are shared through repository
websites such as Datahub.io; however, they are weakly maintained and
revised only by their publishers. In this study, a novel metacrawling
method is proposed for discovering and monitoring linked data sources on
the Web. We implemented the method in a prototype system, named SPARQL
Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword"
discovery process for finding relevant keywords for the linked data
domain and specifically SPARQL endpoints. Then, the collected search
keywords are utilized to find linked data sources via popular search
engines (Google, Bing, Yahoo, Yandex). By using this method, most of the
currently listed SPARQL endpoints in existing endpoint repositories, as
well as a significant number of new SPARQL endpoints, have been
discovered. We analyze our findings in comparison to the Datahub collection
in detail.
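
To illustrate what checking a candidate URL for endpoint accessibility could look like, here is a minimal Python sketch. It is not the SpEnD implementation: the abstract does not say how candidates are verified. The probe query, the helper names (build_probe_url, looks_like_sparql_response, probe) and the fallback body check are all assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl
from urllib.request import Request, urlopen

# Assumption: a trivial ASK query is a cheap way to test for a SPARQL service.
PROBE_QUERY = "ASK { ?s ?p ?o }"

# Standard media types for SPARQL query results (JSON and XML formats).
SPARQL_RESULT_TYPES = (
    "application/sparql-results+json",
    "application/sparql-results+xml",
)


def build_probe_url(candidate_url, query=PROBE_QUERY):
    """Return candidate_url with the probe query set as its 'query' parameter."""
    parts = urlsplit(candidate_url)
    # Keep existing parameters (e.g. default-graph-uri), drop any old query.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "query"]
    params.append(("query", query))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


def looks_like_sparql_response(status, content_type, body):
    """Heuristically decide whether an HTTP response is an ASK query result."""
    if status != 200:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in SPARQL_RESULT_TYPES:
        return True
    # Fallback (assumption): some servers answer with a generic media type,
    # so look for the boolean member that ASK results carry.
    text = body.strip().lower()
    return '"boolean"' in text or "<boolean>" in text


def probe(candidate_url, timeout=10):
    """Send the probe query over HTTP GET; return True if the URL answers it."""
    request = Request(
        build_probe_url(candidate_url),
        headers={"Accept": ", ".join(SPARQL_RESULT_TYPES)},
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            body = response.read(4096).decode("utf-8", errors="replace")
            content_type = response.headers.get("Content-Type", "")
            return looks_like_sparql_response(response.status, content_type, body)
    except (OSError, ValueError):
        # Network errors, HTTP errors and timeouts all count as unavailable.
        return False
```

Calling probe on each URL returned by a search engine, and repeating the call periodically, would give a rough accessibility record per endpoint. A real monitor would likely also need rate limiting and support for endpoints that accept only POST.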