SmartCrawler: A Two-stage Crawler for Efficiently Harvesting Deep-Web Interfaces
Abstract— As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework, namely SmartCrawler, for efficiently harvesting deep-web interfaces. In the first stage, SmartCrawler performs site-based searching for center pages with the help of search engines, avoiding visiting a large number of pages. To achieve more accurate results for a focused crawl, SmartCrawler ranks websites to prioritize highly relevant ones for a given topic.
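The site-ranking idea in the first stage can be illustrated with a minimal sketch. It is not the ranking method used by SmartCrawler, which the abstract does not specify; it assumes a deliberately simple relevance score (the fraction of topic terms found in a site's text) and uses a priority queue so that the most relevant sites are visited first. The function names `topic_relevance` and `rank_sites`, the example URLs, and the topic terms are all hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def topic_relevance(site_text: str, topic_terms: Iterable[str]) -> float:
    """Assumed toy score: fraction of topic terms that occur in the site's text."""
    words = set(site_text.lower().split())
    terms = [t.lower() for t in topic_terms]
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in words) / len(terms)


def rank_sites(sites: Dict[str, str], topic_terms: List[str]) -> List[Tuple[str, float]]:
    """Return (url, score) pairs ordered from most to least relevant."""
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    heap = [(-topic_relevance(text, topic_terms), url) for url, text in sites.items()]
    heapq.heapify(heap)
    ranked = []
    while heap:
        neg_score, url = heapq.heappop(heap)
        ranked.append((url, -neg_score))
    return ranked


# Hypothetical candidate sites, each with a short text summary of its home page.
candidate_sites = {
    "http://books.example.com": "search books by author title isbn",
    "http://news.example.com": "daily news and weather",
    "http://library.example.com": "library catalog search by title",
}
ranking = rank_sites(candidate_sites, ["search", "title", "author"])
for url, score in ranking:
    print(f"{score:.2f}  {url}")
```

A crawler built along these lines would pop sites from the queue in this order, so that crawl effort goes first to the sites most likely to contain searchable forms for the given topic.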