Effect of mixing probabilities on the bias of cross-validation under separate sampling

Amin Zollanvari, Ulisses M. Braga-Neto, Edward R. Dougherty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cross-validation is commonly used to estimate the overall error rate of a designed classifier in a small-sample expression study. The true error of the classifier is a function of the prior probabilities of the classes. With random sampling these can be estimated consistently in terms of the class sample sizes, but when sampling is separate, meaning these sample sizes are determined prior to sampling, there are no reasonable estimates from the data and the prior probabilities must be 'estimated' outside the experiment. We have conducted a set of simulations to study the bias of cross-validation as a function of these 'estimates'. The results show that a poor choice for estimating these probabilities can significantly increase the bias of cross-validation as an estimator of the true error.
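The dependence described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental setup: the one-dimensional Gaussian model, the nearest-mean classifier, the leave-one-out estimator, and all names and parameter values (`cv_bias`, `c_true`, `c_assumed`, `delta`, `sigma1`) are assumptions chosen for illustration. It draws fixed class sample sizes (separate sampling), weights the class-specific cross-validation error rates by an assumed prior, and compares the result with the true error computed under the true prior.

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def nearest_mean_rule(x0, x1):
    """Threshold and orientation of a 1-D nearest-mean classifier."""
    m0, m1 = x0.mean(), x1.mean()
    return 0.5 * (m0 + m1), m1 > m0

def predict(x, t, one_above):
    """Label 1 on the class-1 side of the threshold, else 0."""
    return int((x > t) == one_above)

def true_class_errors(t, one_above, delta, sigma1):
    """Exact class-conditional errors; class 0 ~ N(0,1), class 1 ~ N(delta, sigma1^2)."""
    p0_above = 1.0 - phi(t)
    p1_above = 1.0 - phi((t - delta) / sigma1)
    if one_above:
        return p0_above, 1.0 - p1_above
    return 1.0 - p0_above, p1_above

def loo_class_errors(x0, x1):
    """Leave-one-out error rate computed separately for each class."""
    err0 = 0
    for i in range(len(x0)):
        t, above = nearest_mean_rule(np.delete(x0, i), x1)
        err0 += predict(x0[i], t, above) != 0
    err1 = 0
    for j in range(len(x1)):
        t, above = nearest_mean_rule(x0, np.delete(x1, j))
        err1 += predict(x1[j], t, above) != 1
    return err0 / len(x0), err1 / len(x1)

def cv_bias(c_true, c_assumed, n0=15, n1=15, delta=1.5, sigma1=2.0,
            reps=400, seed=0):
    """Monte Carlo estimate of E[CV estimate - true error].

    c_true is the actual prior of class 0; c_assumed is the value supplied
    from outside the experiment. n0 and n1 are fixed in advance, so the
    sample carries no information about c_true.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(reps):
        x0 = rng.normal(0.0, 1.0, n0)
        x1 = rng.normal(delta, sigma1, n1)
        t, above = nearest_mean_rule(x0, x1)
        e0, e1 = true_class_errors(t, above, delta, sigma1)
        true_err = c_true * e0 + (1.0 - c_true) * e1
        e0_cv, e1_cv = loo_class_errors(x0, x1)
        estimate = c_assumed * e0_cv + (1.0 - c_assumed) * e1_cv
        diffs.append(estimate - true_err)
    return float(np.mean(diffs))

if __name__ == "__main__":
    for c_hat in (0.2, 0.5, 0.8):
        print(c_hat, round(cv_bias(c_true=0.8, c_assumed=c_hat), 4))
```

In this toy model the two classes have different variances, so their class-conditional error rates differ and the weighted estimate shifts as `c_assumed` moves away from `c_true`. Setting `c_assumed = n0 / (n0 + n1)` corresponds to pooling all held-out errors, which is what ordinary cross-validation does implicitly.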

Original language: English
Title of host publication: Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
Pages: 98-99
Number of pages: 2
DOIs: https://doi.org/10.1109/GENSIPS.2013.6735947
Publication status: Published - 2013
Externally published: Yes
Event: 2013 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2013 - Houston, TX, United States
Duration: Nov 17 2013 to Nov 19 2013

Fingerprint

Sampling
Sample Size
Classifiers
Experiments

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Signal Processing
  • Biomedical Engineering

Cite this

Zollanvari, A., Braga-Neto, U. M., & Dougherty, E. R. (2013). Effect of mixing probabilities on the bias of cross-validation under separate sampling. In Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics (pp. 98-99). [6735947] https://doi.org/10.1109/GENSIPS.2013.6735947

@inproceedings{868e06ec5ac8473c9008ba36f850d8ee,
title = "Effect of mixing probabilities on the bias of cross-validation under separate sampling",
author = "Amin Zollanvari and Braga-Neto, {Ulisses M.} and Dougherty, {Edward R.}",
year = "2013",
doi = "10.1109/GENSIPS.2013.6735947",
language = "English",
isbn = "9781479934621",
pages = "98--99",
booktitle = "Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics",

}

TY - GEN

T1 - Effect of mixing probabilities on the bias of cross-validation under separate sampling

AU - Zollanvari, Amin

AU - Braga-Neto, Ulisses M.

AU - Dougherty, Edward R.

PY - 2013

Y1 - 2013

UR - http://www.scopus.com/inward/record.url?scp=84897694038&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897694038&partnerID=8YFLogxK

U2 - 10.1109/GENSIPS.2013.6735947

DO - 10.1109/GENSIPS.2013.6735947

M3 - Conference contribution

SN - 9781479934621

SP - 98

EP - 99

BT - Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics

ER -