An Efficient Method to Estimate the Optimum Regularization Parameter in RLDA

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Motivation: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has classically been proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in the ridge estimate of the sample covariance matrix.
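
To make the ridge substitution concrete, the following is a minimal two-class RLDA sketch in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the ridge estimate is assumed to take the common form S + gamma * I (the paper may parameterize it differently), class priors are assumed equal, and the names `fit_rlda`, `predict`, and `gamma` are hypothetical.

```python
import numpy as np

def fit_rlda(X0, X1, gamma):
    """Fit a two-class linear discriminant using a ridge-regularized pooled covariance."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled sample covariance; it is singular whenever p > n0 + n1 - 2.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    p = S.shape[0]
    # Assumed ridge form: S + gamma * I, which is invertible for any gamma > 0.
    w = np.linalg.solve(S + gamma * np.eye(p), m0 - m1)
    # Intercept places the decision boundary at the midpoint of the class means
    # (equal priors assumed).
    b = -0.5 * w @ (m0 + m1)
    return w, b

def predict(w, b, X):
    """Return label 0 where the discriminant is positive, otherwise label 1."""
    return np.where(X @ w + b > 0, 0, 1)

# Example with more variables (p = 50) than samples (20 per class).
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(20, 50))
X1 = rng.normal(1.0, 1.0, size=(20, 50))
w, b = fit_rlda(X0, X1, gamma=1.0)
labels = predict(w, b, np.vstack([X0, X1]))
```

With `gamma = 0` the linear solve would fail or be numerically unstable in this setting, which is the ill-conditioning the abstract refers to; how to choose `gamma` well is the question the paper addresses.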

Results: We propose a range-search technique for efficient estimation of the optimum regularization parameter. Using an extensive set of simulations based on synthetic and gene expression microarray data, we demonstrate the robustness of the proposed technique to Gaussianity, an assumption used in developing the core estimator. We compare the performance of the technique in terms of accuracy and efficiency to classical techniques for estimating the regularization parameter. In terms of accuracy, the results indicate that the proposed method vastly improves on similar techniques that use the classical plug-in estimator. In that respect, it is better than or comparable to cross-validation-based search strategies while, depending on the sample size and dimensionality, being tens to hundreds of times faster to compute.
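
For context, the cross-validation baseline mentioned above can be sketched as a grid search over candidate regularization values. This is not the proposed range-search method, whose details are not given in this record; it is a hypothetical illustration of the baseline, assuming the ridge form S + gamma * I, plain (unstratified) k-fold splits, and the illustrative names `rlda_error` and `cv_select_gamma`.

```python
import numpy as np

def rlda_error(Xtr, ytr, Xte, yte, gamma):
    """Train RLDA on (Xtr, ytr) and return the misclassification rate on (Xte, yte)."""
    X0, X1 = Xtr[ytr == 0], Xtr[ytr == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(Xtr) - 2)
    # Assumed ridge form: S + gamma * I.
    w = np.linalg.solve(S + gamma * np.eye(S.shape[0]), m0 - m1)
    scores = (Xte - 0.5 * (m0 + m1)) @ w
    pred = np.where(scores > 0, 0, 1)
    return np.mean(pred != yte)

def cv_select_gamma(X, y, gammas, k=5, seed=0):
    """Pick the gamma with the lowest k-fold cross-validated error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_err = []
    for g in gammas:
        errs = []
        for i in range(k):
            te = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            # One p-by-p linear solve per (gamma, fold) pair.
            errs.append(rlda_error(X[tr], y[tr], X[te], y[te], g))
        cv_err.append(np.mean(errs))
    cv_err = np.array(cv_err)
    return gammas[int(np.argmin(cv_err))], cv_err
```

The search performs `k * len(gammas)` covariance solves, each on a p-by-p system, which gives a rough sense of why a cross-validation search becomes costly as the dimensionality and the grid grow.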

Contact: amin.zollanvari@nu.edu.kz

Original language: English
Number of pages: 7
Journal: Bioinformatics
DOI: 10.1093/bioinformatics/btw506
Publication status: Published - 2016

Fingerprint

Regularization Parameter
Discriminant Analysis
Covariance matrix
Sample Size
Sample Covariance Matrix
Learning
Estimate
Synthetic Genes
Biomarkers
Ridge
Microarrays
Gene expression
Plug-in Estimator
Throughput
Efficiency
Statistical Learning
Efficient Estimation
Search Strategy

ASJC Scopus subject areas

  • Computer Science(all)
  • Medicine (miscellaneous)

Cite this

@article{3dcd5bb3c3e84b1caf829e904213a817,
title = "An Efficient Method to Estimate the Optimum Regularization Parameter in RLDA",
abstract = "Motivation: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has classically been proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in the ridge estimate of the sample covariance matrix. Results: We propose a range-search technique for efficient estimation of the optimum regularization parameter. Using an extensive set of simulations based on synthetic and gene expression microarray data, we demonstrate the robustness of the proposed technique to Gaussianity, an assumption used in developing the core estimator. We compare the performance of the technique in terms of accuracy and efficiency to classical techniques for estimating the regularization parameter. In terms of accuracy, the results indicate that the proposed method vastly improves on similar techniques that use the classical plug-in estimator. In that respect, it is better than or comparable to cross-validation-based search strategies while, depending on the sample size and dimensionality, being tens to hundreds of times faster to compute. Contact: amin.zollanvari@nu.edu.kz",
author = "Daniyar Bakir and {James Pappachen}, Alex and Amin Zollanvari",
year = "2016",
doi = "10.1093/bioinformatics/btw506",
language = "English",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",

}

TY - JOUR

T1 - An Efficient Method to Estimate the Optimum Regularization Parameter in RLDA

AU - Bakir, Daniyar

AU - James Pappachen, Alex

AU - Zollanvari, Amin

PY - 2016

Y1 - 2016

AB - Motivation: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has classically been proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in the ridge estimate of the sample covariance matrix. Results: We propose a range-search technique for efficient estimation of the optimum regularization parameter. Using an extensive set of simulations based on synthetic and gene expression microarray data, we demonstrate the robustness of the proposed technique to Gaussianity, an assumption used in developing the core estimator. We compare the performance of the technique in terms of accuracy and efficiency to classical techniques for estimating the regularization parameter. In terms of accuracy, the results indicate that the proposed method vastly improves on similar techniques that use the classical plug-in estimator. In that respect, it is better than or comparable to cross-validation-based search strategies while, depending on the sample size and dimensionality, being tens to hundreds of times faster to compute. Contact: amin.zollanvari@nu.edu.kz

U2 - 10.1093/bioinformatics/btw506

DO - 10.1093/bioinformatics/btw506

M3 - Article

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

ER -