Is the reduction of dimensionality to a small number of features always necessary in constructing predictive models for analysis of complex diseases or behaviours?

Amin Zollanvari, Nancy L. Saccone, Laura J. Bierut, Marco F. Ramoni, Gil Alterovitz

Research output: Contribution to journalArticle

Abstract

Gene expression and genome wide association data have provided researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models capable of phenotypic classification in this area, significant reduction of dimensionality through stringent filtering and/or feature selection is often deemed imperative. Here, this work challenges this presumption through both theoretical and empirical analysis. This work demonstrates that by a proper compromise between structure of the selected model and the number of features, one is able to achieve better performance even in large dimensionality. The inclusion of many genes/variants in the classification rules can help shed new light on the analysis of complex traitstraits that are typically determined by many causal variants with small effect size.

Original languageEnglish
Pages (from-to)3573-3576
Number of pages4
JournalConference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference
Volume2011
Publication statusPublished - 2011
Externally publishedYes

Fingerprint

Genes
Gene expression
Feature extraction
Research Personnel
Association reactions
Genome
Gene Expression

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Biomedical Engineering
  • Health Informatics

Cite this

@article{87796c84a98a47518725b1f72a591bba,
title = "Is the reduction of dimensionality to a small number of features always necessary in constructing predictive models for analysis of complex diseases or behaviours?",
abstract = "Gene expression and genome wide association data have provided researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models capable of phenotypic classification in this area, significant reduction of dimensionality through stringent filtering and/or feature selection is often deemed imperative. Here, this work challenges this presumption through both theoretical and empirical analysis. This work demonstrates that by a proper compromise between structure of the selected model and the number of features, one is able to achieve better performance even in large dimensionality. The inclusion of many genes/variants in the classification rules can help shed new light on the analysis of complex traitstraits that are typically determined by many causal variants with small effect size.",
author = "Amin Zollanvari and Saccone, {Nancy L.} and Bierut, {Laura J.} and Ramoni, {Marco F.} and Gil Alterovitz",
year = "2011",
language = "English",
volume = "2011",
pages = "3573--3576",
journal = "Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference",
issn = "1557-170X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Is the reduction of dimensionality to a small number of features always necessary in constructing predictive models for analysis of complex diseases or behaviours?

AU - Zollanvari, Amin

AU - Saccone, Nancy L.

AU - Bierut, Laura J.

AU - Ramoni, Marco F.

AU - Alterovitz, Gil

PY - 2011

Y1 - 2011

N2 - Gene expression and genome wide association data have provided researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models capable of phenotypic classification in this area, significant reduction of dimensionality through stringent filtering and/or feature selection is often deemed imperative. Here, this work challenges this presumption through both theoretical and empirical analysis. This work demonstrates that by a proper compromise between structure of the selected model and the number of features, one is able to achieve better performance even in large dimensionality. The inclusion of many genes/variants in the classification rules can help shed new light on the analysis of complex traitstraits that are typically determined by many causal variants with small effect size.

AB - Gene expression and genome wide association data have provided researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models capable of phenotypic classification in this area, significant reduction of dimensionality through stringent filtering and/or feature selection is often deemed imperative. Here, this work challenges this presumption through both theoretical and empirical analysis. This work demonstrates that by a proper compromise between structure of the selected model and the number of features, one is able to achieve better performance even in large dimensionality. The inclusion of many genes/variants in the classification rules can help shed new light on the analysis of complex traitstraits that are typically determined by many causal variants with small effect size.

UR - http://www.scopus.com/inward/record.url?scp=84862619024&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862619024&partnerID=8YFLogxK

M3 - Article

VL - 2011

SP - 3573

EP - 3576

JO - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

JF - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

SN - 1557-170X

ER -