AutoRepair: an automatic repairing approach over multi-source data

Abstract

Truth discovery methods and rule-based data repairing methods are two classic lines of approaches for improving data quality in the database field. Truth discovery methods resolve conflicts among multiple sources describing the same entity by estimating the reliabilities of the different sources, while rule-based data repairing methods resolve inconsistencies among different entities using integrity constraints. However, both lines of methods suffer from unsatisfactory performance due to a lack of sufficient evidence. In this paper, we propose AutoRepair, a novel automatic multi-source data repairing approach that enriches the evidence by combining the advantages of truth discovery and data repairing. We use functional dependencies, one of the most common types of constraints, to detect violations, and use source reliability as evidence to discover and repair the errors among these violations. At the same time, the repaired results are used to estimate source reliability. Since source reliability is not known in advance, we model the process as an iterative framework to achieve better performance. Extensive experiments are conducted on both simulated and real-world datasets. The results clearly demonstrate the advantages of our approach, which outperforms both recent truth discovery methods and rule-based data repairing methods.
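The abstract does not give AutoRepair's actual algorithm, so the following is only a toy sketch of the general idea it describes: resolve multi-source conflicts by reliability-weighted voting, detect violations of a functional dependency (here a hypothetical `zip -> city`), repair each violating group using source-weighted evidence, then re-estimate source reliability from the repaired result and iterate until nothing changes. All function names, the example data, and the choice of "smoothed fraction of correct claims" as the reliability measure are assumptions for illustration, not the paper's model.

```python
from collections import defaultdict


def weighted_vote(candidates, weights):
    """Return the value with the largest total source weight (ties: smallest value)."""
    score = defaultdict(float)
    for source, value in candidates:
        score[value] += weights[source]
    return max(sorted(score), key=lambda v: score[v])


def resolve(claims, weights):
    """Truth-discovery step: pick one value per (entity, attribute)."""
    cands = defaultdict(list)
    for source, entity, attr, value in claims:
        cands[(entity, attr)].append((source, value))
    return {key: weighted_vote(c, weights) for key, c in cands.items()}


def repair_fd(table, claims, weights, lhs, rhs):
    """Repair violations of the FD lhs -> rhs: entities sharing an lhs value
    must share one rhs value, chosen by weighted vote over all their rhs claims."""
    groups = defaultdict(set)
    for (entity, attr), value in table.items():
        if attr == lhs:
            groups[value].add(entity)
    repaired = dict(table)
    for entities in groups.values():
        rhs_values = {table[(e, rhs)] for e in entities if (e, rhs) in table}
        if len(rhs_values) <= 1:
            continue  # this group satisfies the FD
        support = [(s, v) for s, e, a, v in claims if a == rhs and e in entities]
        fix = weighted_vote(support, weights)
        for e in entities:
            if (e, rhs) in table:
                repaired[(e, rhs)] = fix
    return repaired


def reliability(claims, table, smoothing=1.0):
    """Hypothetical reliability: smoothed fraction of a source's claims
    that agree with the repaired table."""
    hits, total = defaultdict(float), defaultdict(float)
    for source, entity, attr, value in claims:
        total[source] += 1
        hits[source] += table[(entity, attr)] == value
    return {s: (hits[s] + smoothing) / (total[s] + 2 * smoothing) for s in total}


def iterative_repair(claims, lhs, rhs, max_iter=20):
    """Alternate repair and reliability estimation until the table is stable."""
    weights = {s: 1.0 for s, *_ in claims}  # reliability unknown: start uniform
    table = None
    for _ in range(max_iter):
        new_table = repair_fd(resolve(claims, weights), claims, weights, lhs, rhs)
        if new_table == table:
            break
        table = new_table
        weights = reliability(claims, table)
    return table, weights


# Toy claims: (source, entity, attribute, value); s2 and s3 misreport e2's city.
claims = [
    ("s1", "e1", "zip", "10001"), ("s1", "e1", "city", "New York"),
    ("s2", "e1", "zip", "10001"), ("s2", "e1", "city", "New York"),
    ("s3", "e1", "zip", "10001"), ("s3", "e1", "city", "New York"),
    ("s1", "e2", "zip", "10001"), ("s1", "e2", "city", "New York"),
    ("s2", "e2", "zip", "10001"), ("s2", "e2", "city", "Newark"),
    ("s3", "e2", "zip", "10001"), ("s3", "e2", "city", "Newark"),
    ("s1", "e3", "zip", "60601"), ("s1", "e3", "city", "Chicago"),
    ("s2", "e3", "zip", "60601"), ("s2", "e3", "city", "Chicago"),
]

table, weights = iterative_repair(claims, "zip", "city")
print(table[("e2", "city")])  # → New York
print({s: round(w, 3) for s, w in weights.items()})  # → {'s1': 0.875, 's2': 0.75, 's3': 0.667}
```

In this example, per-entity voting alone would pick "Newark" for e2 (two sources against one), but e1 and e2 share zip 10001, so the FD check flags the group and the weighted evidence across both entities repairs e2 to "New York". The repaired table then lowers the estimated reliability of s2 and s3, which is the kind of feedback loop between repairing and reliability estimation the abstract describes.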

Source: http://link.springer.com/10.1007/s10115-018-1284-9

Date: August 28, 2019 at 09:23AM

Tag(s): Opendata – Books