Product Description
Efficient and Effective Duplicate Detection in Hierarchical Data
Abstract: Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, like XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements, but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Through experiments, we show that our algorithm is able to achieve high precision and recall scores in several data sets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency and of effectiveness.
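To make the idea in the abstract concrete, the following is a minimal sketch, not the XMLDup algorithm itself. It assumes that the similarity of two elements can be approximated by multiplying a string similarity of their own text with the best-match scores of their same-tag children, and that evaluation can stop early once the running product falls below a threshold. The function names (text_similarity, duplicate_score), the use of a plain product in place of the Bayesian network's conditional probabilities, and the threshold value are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a learned probability."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_score(e1, e2, threshold=0.5):
    """Return a score in [0, 1] that e1 and e2 describe the same object.

    The score is a product of factors: one for the elements' own text and
    one per child tag that both elements share. Every factor is at most 1,
    so the running product can only shrink. As soon as it drops below
    `threshold`, the comparison is abandoned and 0.0 is returned (pruning).
    """
    if e1.tag != e2.tag:
        return 0.0

    score = 1.0
    own1, own2 = (e1.text or "").strip(), (e2.text or "").strip()
    if own1 or own2:
        score *= text_similarity(own1, own2)
        if score < threshold:
            return 0.0  # pruned: own text alone already rules the pair out

    # Hypothetical simplification: only tags present on both sides count.
    shared_tags = sorted({c.tag for c in e1} & {c.tag for c in e2})
    for tag in shared_tags:
        # A child factor below `needed` would push the product under the
        # threshold, so the recursive calls may prune at that level too.
        needed = threshold / score if score > 0 else 0.0
        best = max(
            duplicate_score(c1, c2, needed)
            for c1 in e1.findall(tag)
            for c2 in e2.findall(tag)
        )
        score *= best
        if score < threshold:
            return 0.0  # pruned: remaining children are never evaluated
    return score


if __name__ == "__main__":
    a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
    b = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
    c = ET.fromstring("<movie><title>Casablanca</title><year>1942</year></movie>")
    print(duplicate_score(a, b))
    print(duplicate_score(a, c))
```

In this sketch, pruning never changes a score that ends up at or above the threshold; it only avoids finishing comparisons that cannot reach it. In the second comparison, for instance, the title mismatch ends the evaluation before the year elements are compared.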