SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines

  • Author(s) YUMUŞAK, Semih
    DOĞDU, Erdoğan
    KODAZ, Halife
    KAMILARIS, Andreas
    VANDENBUSSCHE, Pierre-Yves
  • Publication Type Article
  • Publication Date 2017
  • Permanent URI https://hdl.handle.net/20.500.12498/1042

Linked data endpoints are online query gateways to semantically annotated linked data sources. The SPARQL query language is the standard for querying these data sources. Although a linked data endpoint (i.e., a SPARQL endpoint) is a basic Web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that the availability of linked datasets is very low, while the locations of linked data endpoints change frequently. There are linked data repositories that collect and list the available linked data endpoints or resources. It is observed that around half of the endpoints listed in existing repositories are not accessible (temporarily or permanently offline). These endpoint URLs are shared through repository websites such as Datahub.io; however, they are weakly maintained and revised only by their publishers. In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a “search keyword” discovery process for finding relevant keywords for the linked data domain and specifically for SPARQL endpoints. Then, the collected search keywords are used to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). Using this method, most of the SPARQL endpoints currently listed in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. We analyze our findings in detail in comparison to the Datahub collection.

Open Access
Detailed View
Title
(dc.title)
SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines
Publication Type
(dc.type)
Article
Author(s)
(dc.contributor.author)
YUMUŞAK, Semih
Author(s)
(dc.contributor.author)
DOĞDU, Erdoğan
Author(s)
(dc.contributor.author)
KODAZ, Halife
Author(s)
(dc.contributor.author)
KAMILARIS, Andreas
Author(s)
(dc.contributor.author)
VANDENBUSSCHE, Pierre-Yves
Citation Index
(dc.source.database)
WoS
Citation Index
(dc.source.database)
Scopus
Publication Date
(dc.date.issued)
2017
Record Entry Date
(dc.date.accessioned)
2019-07-10T08:17:41Z
Open Access Date
(dc.date.available)
2019-07-10T08:17:41Z
Publication Language
(dc.language.iso)
en
Permanent URI
(dc.identifier.uri)
https://hdl.handle.net/20.500.12498/1042
All resources on this site are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.