Effect of mixing probabilities on the bias of cross-validation under separate sampling

Amin Zollanvari, Ulisses M. Braga-Neto, Edward R. Dougherty

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Cross-validation is commonly used to estimate the overall error rate of a designed classifier in a small-sample expression study. The true error of the classifier is a function of the prior probabilities of the classes. With random sampling these can be estimated consistently in terms of the class sample sizes, but when sampling is separate, meaning these sample sizes are determined prior to sampling, there are no reasonable estimates from the data and the prior probabilities must be 'estimated' outside the experiment. We have conducted a set of simulations to study the bias of cross-validation as a function of these 'estimates'. The results show that a poor choice for estimating these probabilities can significantly increase the bias of cross-validation as an estimator of the true error.
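
As a minimal sketch of why the choice of prior 'estimates' matters, the overall error rate can be written as a mixture of the class-conditional error rates weighted by the prior probabilities. The function name, the error rates, and the prior values below are hypothetical and chosen only for illustration; they are not taken from the paper's simulations.

```python
def overall_error(c0, err0, err1):
    """Mix class-conditional error rates using a prior c0 for class 0.

    c0   : assumed prior probability of class 0 (class 1 gets 1 - c0)
    err0 : error rate on class 0
    err1 : error rate on class 1
    """
    if not 0.0 <= c0 <= 1.0:
        raise ValueError("c0 must lie in [0, 1]")
    return c0 * err0 + (1.0 - c0) * err1


# Hypothetical class-conditional error rates and true prior (illustration only).
err0, err1 = 0.10, 0.30
true_c0 = 0.8
reference = overall_error(true_c0, err0, err1)

# Under separate sampling the prior cannot be estimated from the class sample
# sizes, so it is supplied from outside; a poor guess shifts the overall estimate.
for assumed_c0 in (0.5, 0.8):
    estimate = overall_error(assumed_c0, err0, err1)
    print(f"assumed c0={assumed_c0:.1f}  overall={estimate:.3f}  "
          f"shift={estimate - reference:+.3f}")
```

In this toy setting the class-conditional rates are held fixed, so the shift comes entirely from the assumed prior. In practice the cross-validation estimates of the class-conditional rates carry their own error as well.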

Original language: English
Title of host publication: 2013 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2013 - Proceedings
Pages: 98-99
Number of pages: 2
Publication status: Published - Dec 1 2013
Externally published: Yes
Event: 2013 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2013 - Houston, TX, United States
Duration: Nov 17 2013 to Nov 19 2013

Publication series

Name: Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
ISSN (Print): 2150-3001
ISSN (Electronic): 2150-301X

Other

Other: 2013 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2013
Country: United States
City: Houston, TX
Period: 11/17/13 to 11/19/13

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Signal Processing
  • Biomedical Engineering
