Analysis of feature selection stability on high dimension and small sample data

Abstract—Gene selection is a crucial step when building a classifier from microarray or metagenomic data. Because the number of observations is small, gene selection tends to be unstable: two gene subsets obtained from different datasets that address the same classification problem often do not overlap significantly. Although this is a crucial problem, few studies have addressed selection stability. In this paper, we first present some stability quantification methods, then study how these measures, as well as the resulting classification performance, vary with several parameters (dimensionality, sample size, feature distribution, selection threshold) on both artificial and real data. Feature selection was performed with a t-test and classification with linear discriminant analysis. We point out a strong empirical correlation between the dimensionality/sample size ratio and selection instability.
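To make the notion of selection stability concrete, the following minimal sketch ranks features by a two-sample t-statistic, keeps the top k, and compares the subsets selected on several independently simulated datasets. It rests on assumptions not stated in the abstract: the mean pairwise Jaccard index stands in for the stability measures (which the abstract does not name), the t-statistic is the Welch variant, the data are synthetic Gaussian features, and all function names and parameter values are hypothetical.

```python
import random
import statistics
from itertools import combinations


def welch_t(a, b):
    """Welch t-statistic for two samples given as lists of floats."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0


def select_top_k(X0, X1, k):
    """Indices of the k features with the largest |t| between two classes."""
    d = len(X0[0])
    scores = []
    for j in range(d):
        col0 = [row[j] for row in X0]
        col1 = [row[j] for row in X1]
        scores.append(abs(welch_t(col0, col1)))
    ranked = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Assumed stability score: average Jaccard index over all subset pairs."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def simulate_dataset(rng, n_per_class, d, n_informative, shift):
    """Synthetic two-class data; only the first n_informative features differ."""
    X0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_per_class)]
    X1 = [
        [rng.gauss(shift if j < n_informative else 0.0, 1.0) for j in range(d)]
        for _ in range(n_per_class)
    ]
    return X0, X1


def selection_stability(n_per_class, d, k=10, n_informative=10,
                        shift=1.0, n_datasets=5, seed=0):
    """Select k features on each of n_datasets simulated datasets and score overlap."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_datasets):
        X0, X1 = simulate_dataset(rng, n_per_class, d, n_informative, shift)
        subsets.append(select_top_k(X0, X1, k))
    return mean_pairwise_jaccard(subsets)


if __name__ == "__main__":
    # Illustrative values only: fixed sample size, growing dimensionality.
    for d in (20, 200):
        score = selection_stability(n_per_class=10, d=d)
        print(f"d/n = {d / 20:.0f}: mean pairwise Jaccard = {score:.2f}")
```

A score of 1 means every dataset yields the same subset, and a score near 0 means the subsets barely overlap. Varying `d` and `n_per_class` in this sketch is one way to explore the dimensionality/sample size effect the abstract describes, though it does not reproduce the paper's experiments.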