Conference paper, Year: 2012

Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking

Abstract

This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better handle web data of variable quality, the notion of a pseudo-key is defined. An RDF vocabulary for representing keys is also provided, and an algorithm to discover keys and pseudo-keys is described. Experimental results show that the algorithm's runtime remains reasonable even for a large dataset such as DBpedia. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) dataset interlinking.

Main file
atencia2012b.pdf (210.86 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00768412, version 1 (21-12-2012)

Cite

Manuel Atencia, Jérôme David, François Scharffe. Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking. EKAW: Knowledge Engineering and Knowledge Management, Oct 2012, Galway, Ireland. pp.144-153, ⟨10.1007/978-3-642-33876-2_14⟩. ⟨hal-00768412⟩