Removing DUST using Multiple Alignment of Sequences
Final Year Project 2016
Abstract: A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate contents, a phenomenon known as DUST (Different URLs with Similar Text). Crawling, storing, and using such duplicated data wastes resources, produces low-quality rankings, and degrades the user experience. To deal with this problem, several methods have been proposed to detect and remove duplicate documents without fetching their contents. To do so, these methods learn normalization rules that transform all duplicate URLs into the same canonical form. A challenging aspect of this strategy is deriving a set of rules that are both general and precise. In this work, we present DUSTER, a new approach to deriving high-quality rules that takes advantage of a multiple sequence alignment strategy.
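To make the rule-based idea concrete, here is a minimal Python sketch that assumes hand-written rules rather than learned ones. `RULES`, `normalize`, `group_by_canonical_form`, `tokenize`, and `differing_segments` are hypothetical names, and the URLs are made up. The last function aligns only two URLs using the standard library's `difflib.SequenceMatcher`. This pairwise alignment is a simplification and not DUSTER's multiple sequence alignment, but it shows how aligned token differences can expose candidate rewrite rules.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical normalization rules, written as (regex, replacement) pairs.
# They are illustrative only, not rules produced by DUSTER.
RULES = [
    (re.compile(r"^https?://(?:www\.)?"), "http://"),  # unify scheme, drop "www."
    (re.compile(r"/index\.html?$"), "/"),              # drop a trailing index page
    (re.compile(r"\?sid=\w+$"), ""),                   # drop a trailing session id
]


def normalize(url):
    """Apply every rule in order to map a URL to its canonical form."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url


def group_by_canonical_form(urls):
    """Cluster URLs that normalize to the same string (candidate DUST)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


def tokenize(url):
    """Split a URL into alphanumeric runs and single punctuation characters."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url)


def differing_segments(url_a, url_b):
    """Pairwise-align the tokens of two URLs and return the parts that differ.

    Each non-equal segment is raw material for a candidate rewrite rule.
    """
    a, b = tokenize(url_a), tokenize(url_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    urls = [
        "https://www.example.com/news/index.html",
        "http://example.com/news/",
        "http://www.example.com/news/?sid=a1b2",
        "http://example.com/sports/index.htm",
    ]
    for canonical, members in group_by_canonical_form(urls).items():
        print(canonical, members)
    print(differing_segments("http://www.example.com/news/index.html",
                             "http://example.com/news/"))
```

With these sample URLs, the first three collapse to `http://example.com/news/` and the last one to `http://example.com/sports/`. Aligning the two news URLs reports the deleted segments `www.` and `index.html`, which match the first two hand-written rules. A learning approach would generalize such segments across many aligned URLs instead of hard-coding them.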