Background: Feature selection techniques use a search-criteria driven approach for ranked feature subset selection. Often, selecting an optimal subset of ranked features using the existing methods is intractable for high dimensional gene data classification problems. Methods: In this paper, an approach based on the individual ability of the features to discriminate between different classes is proposed. The area of overlap measure between feature to feature inter-class and intra-class distance distributions is used to measure the discriminatory ability of each feature. Features with area of overlap below a specified threshold is selected to form the subset. Results: The reported method achieves higher classification accuracies with fewer numbers of features for high-dimensional micro-array gene classification problems. Experiments done on CLL-SUB-111, SMK-CAN-187, GLI-85, GLA-BRA-180 and TOX-171 databases resulted in an accuracy of 74.9±2.6, 71.2±1.7, 88.3±2.9, 68.4±5.1, and 69.6±4.4, with the corresponding selected number of features being 1, 1, 3, 37, and 89 respectively. Conclusions: The area of overlap between the inter-class and intra-class distances is demonstrated as a useful technique for selection of most discriminative ranked features. Improved classification accuracy is obtained by relevant selection of most discriminative features using the proposed method.
ASJC Scopus subject areas
- Computer Science(all)