Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification

Mohammad Shahrokh Esfahani, Jason Knight, Amin Zollanvari, Byung Jun Yoon, Edward R. Dougherty

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers which use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.
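For readers unfamiliar with the two uncertainty classes named in the abstract, their standard textbook forms from the robust-statistics literature can be sketched as follows. The symbols here are assumptions for illustration and may not match the paper's own notation: $\pi_0$ is a nominal distribution, $\mathcal{Q}$ is a set of admissible contaminating distributions, $\{A_i\}$ is a partition of the sample space, and $p_i$ are prescribed probability masses.

```latex
% epsilon-contamination class: mixtures of a nominal distribution \pi_0
% with an arbitrary contaminating distribution q drawn from \mathcal{Q}
\Gamma_{\varepsilon}
  = \bigl\{\, \pi : \pi = (1-\varepsilon)\,\pi_0 + \varepsilon\, q,\;
      q \in \mathcal{Q} \,\bigr\},
  \qquad 0 \le \varepsilon \le 1

% p-point class: distributions that assign prescribed masses p_i
% to the cells A_1, ..., A_m of a fixed partition of the sample space
\Gamma_{p}
  = \bigl\{\, \pi : \pi(A_i) = p_i,\; i = 1, \dots, m \,\bigr\},
  \qquad \sum_{i=1}^{m} p_i = 1
```

In both cases the prior knowledge fixes a set of candidate distributions rather than a single one, which is the sense in which the abstract speaks of an uncertainty class.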

Original language: English
Pages (from-to): 2783-2797
Number of pages: 15
Journal: Pattern Recognition
Volume: 46
Issue number: 10
DOIs: 10.1016/j.patcog.2013.02.017
Publication status: Published - Oct 2013
Externally published: Yes

Fingerprint

Maximum likelihood
Classifiers
Uncertainty
Labels
Contamination
Genes
Throughput

Keywords

  • Biological-pathway knowledge
  • Prior knowledge
  • Regularized maximum-likelihood
  • Steady-state classifier
  • Uncertainty class

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing

Cite this

Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification. / Shahrokh Esfahani, Mohammad; Knight, Jason; Zollanvari, Amin; Yoon, Byung Jun; Dougherty, Edward R.

In: Pattern Recognition, Vol. 46, No. 10, 10.2013, p. 2783-2797.

@article{c928a06be3204c4392a0c532dde909f3,
title = "Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification",
abstract = "Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers which use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.",
keywords = "Biological-pathway knowledge, Prior knowledge, Regularized maximum-likelihood, Steady-state classifier, Uncertainty class",
author = "{Shahrokh Esfahani}, Mohammad and Jason Knight and Amin Zollanvari and Yoon, {Byung Jun} and Dougherty, {Edward R.}",
year = "2013",
month = "10",
doi = "10.1016/j.patcog.2013.02.017",
language = "English",
volume = "46",
pages = "2783--2797",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier",
number = "10"
}

TY - JOUR

T1 - Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification

AU - Shahrokh Esfahani, Mohammad

AU - Knight, Jason

AU - Zollanvari, Amin

AU - Yoon, Byung Jun

AU - Dougherty, Edward R.

PY - 2013/10

Y1 - 2013/10

N2 - Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers which use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.

AB - Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers which use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.

KW - Biological-pathway knowledge

KW - Prior knowledge

KW - Regularized maximum-likelihood

KW - Steady-state classifier

KW - Uncertainty class

UR - http://www.scopus.com/inward/record.url?scp=84878011724&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878011724&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2013.02.017

DO - 10.1016/j.patcog.2013.02.017

M3 - Article

VL - 46

SP - 2783

EP - 2797

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

IS - 10

ER -