Product Description
Set Matching Measures for External Cluster Validity
Abstract— Comparing two clustering results of a data set is a challenging task in cluster analysis. Many external validity measures have been proposed in the literature. A good measure should be invariant to the changes of data size, cluster size and number of clusters. We give an overview of existing set matching indexes and analyze their properties. Set matching measures are based on matching clusters from two clusterings. We analyze the measures in three parts: 1. cluster similarity 2. matching 3. overall measurement. Correction for chance is also investigated and we prove that normalized mutual information and variation of information are intrinsically corrected. We propose a new scheme of experiments based on synthetic data for evaluation of an external validity index. Accordingly, popular external indexes are evaluated and compared when applied to clusterings of different data size, cluster size and number of clusters. The experiments show that set matching measures are clearly better than the other tested. Based on the analytical comparisons, we introduce a new index called Pair Sets Index (PSI). A huge number of clustering techniques have been developed in different application fields. Different algorithms or even one algorithm with different parameters can result in different partitions for the same data set. < final year projects >
Including Packages
Our Specialization
Support Service
Statistical Report
satisfied customers
3,589Freelance projects
983sales on Site
11,021developers
175+