Product Description
SVM-based Web Content Mining with Leaf Classification Unit from DOM-tree
Abstract-To analyze a news article dataset, we first extract the important information, such as the title, the date, and the paragraphs of the body. At the same time, we remove unnecessary information such as images, captions, footers, advertisements, navigation, and recommended news. The problem is that news article formats change over time and vary by news source and even by section within a source. It is therefore important for a model to generalize when predicting unseen article formats. Our experiments confirmed that a machine learning based model predicts new data better than a rule-based model. We also suggest that noise within the body can potentially be removed, because we define the classification unit as the leaf node itself. General machine learning based models, by contrast, cannot remove this noise: they take the classification unit to be an intermediate node consisting of a set of leaf nodes, so they cannot classify an individual leaf node.
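To make the idea of a leaf-level classification unit concrete, here is a minimal sketch that walks an HTML document and emits one record per text leaf, together with a few simple features. It is not the authors' implementation: the names `LeafCollector` and `leaf_features`, the feature set, and the sample markup are all assumptions made for illustration, and no SVM is trained here. The resulting feature dictionaries could, for example, be vectorized and passed to an SVM implementation.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class LeafCollector(HTMLParser):
    """Collects (tag path, text) pairs for every non-empty text leaf.

    Hypothetical helper: each text leaf is treated as one classification unit.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.leaves = []  # list of (path, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.leaves.append(("/".join(self.stack), text))


def leaf_features(path, text):
    """Assumed, illustrative features for a single leaf node."""
    tags = path.split("/") if path else []
    return {
        "depth": len(tags),
        "parent_tag": tags[-1] if tags else "",
        "n_words": len(text.split()),
        "in_link": "a" in tags,
    }


# Made-up sample page: navigation, a headline, and a body with an inline link.
html = """
<html><body>
<div class="nav"><a href="/">Home</a></div>
<h1>Example headline</h1>
<div class="body"><p>First paragraph of the story.</p>
<p>Second paragraph. <a href="/x">Related news</a></p></div>
</body></html>
"""

parser = LeafCollector()
parser.feed(html)
parser.close()
leaves = parser.leaves
features = [leaf_features(path, text) for path, text in leaves]
```

In this sample, "Second paragraph." and "Related news" sit under the same `p` element but become separate leaves. A classifier working at the leaf level could therefore keep the first and discard the second, which a classifier that labels the whole `p` or `div` node cannot do.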