TY - JOUR
T1 - Ranked selection of nearest discriminating features
AU - James, Alex Pappachen
AU - Dimitrijev, Sima
N1 - Publisher Copyright: © 2012, James and Dimitrijev; licensee Springer.
PY - 2012/12/1
Y1 - 2012/12/1
AB - Background: Feature selection techniques use a search-criteria-driven approach for ranked feature subset selection. Selecting an optimal subset of ranked features with existing methods is often intractable for high-dimensional gene data classification problems. Methods: This paper proposes an approach based on the individual ability of each feature to discriminate between classes. The area of overlap between the feature-to-feature inter-class and intra-class distance distributions is used to measure the discriminatory ability of each feature. Features whose area of overlap falls below a specified threshold are selected to form the subset. Results: The reported method achieves higher classification accuracies with fewer features for high-dimensional microarray gene classification problems. Experiments on the CLL-SUB-111, SMK-CAN-187, GLI-85, GLA-BRA-180, and TOX-171 databases resulted in accuracies of 74.9±2.6, 71.2±1.7, 88.3±2.9, 68.4±5.1, and 69.6±4.4, with 1, 1, 3, 37, and 89 features selected, respectively. Conclusions: The area of overlap between the inter-class and intra-class distances is demonstrated to be a useful measure for selecting the most discriminative ranked features. Selecting the most discriminative features with the proposed method improves classification accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84888006369&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84888006369&partnerID=8YFLogxK
U2 - 10.1186/2192-1962-2-12
DO - 10.1186/2192-1962-2-12
M3 - Article
AN - SCOPUS:84888006369
VL - 2
SP - 1
EP - 14
JO - Human-centric Computing and Information Sciences
JF - Human-centric Computing and Information Sciences
SN - 2192-1962
IS - 1
M1 - 12
ER -