High-dimensional statistical learning

Roots, justifications, and potential machineries

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of classical methods does not live up to their asymptotic premise, in which the sample size grows unboundedly while the dimensionality of the observations stays fixed. Much work has been done on developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed partly to a lack of knowledge and partly to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal of this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.
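To make the failure mode concrete, the following is a minimal sketch (illustrative only, not code from the article): with more variables than samples, the sample covariance matrix is rank-deficient and cannot be inverted, which breaks classical procedures such as linear discriminant analysis; a ridge/shrinkage correction of the kind the article surveys restores invertibility. The shrinkage weight and the identity target below are illustrative assumptions, not the article's prescription.

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                     # fewer samples than variables: the p > n regime
X = rng.standard_normal((n, p))

# Classical route: with n < p, the p x p sample covariance matrix has rank at
# most n - 1, so it is singular and non-invertible.
S = np.cov(X, rowvar=False)
print("rank(S):", np.linalg.matrix_rank(S), "of", p)    # about 49 of 200

# Ridge/shrinkage route: blend S with a scaled identity target. For any
# shrinkage weight gamma in (0, 1] the result is positive definite, hence
# invertible. (gamma = 0.1 is a fixed illustrative value; data-driven choices
# such as Ledoit-Wolf-type estimators exist but are not shown here.)
gamma = 0.1
target = (np.trace(S) / p) * np.eye(p)
S_ridge = (1.0 - gamma) * S + gamma * target
S_inv = np.linalg.inv(S_ridge)                           # now well defined
print("min eigenvalue of S_ridge:", np.linalg.eigvalsh(S_ridge).min())

The same pattern, regularizing an ill-conditioned classical estimate toward a simple target, is what the ridge estimation and shrinkage keywords below refer to.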

Original language: English
Pages (from-to): 109-121
Number of pages: 13
Journal: Cancer Informatics
Volume: 15
DOI: 10.4137/CIN.S30804
Publication status: Published - Apr 12 2016

Keywords

  • Curse of dimensionality
  • Double asymptotics
  • G-analysis
  • High-dimensional analysis
  • Kolmogorov asymptotics
  • Random matrix theory
  • Ridge estimation
  • Shrinkage
  • Sparsity

ASJC Scopus subject areas

  • Cancer Research
  • Oncology

Cite this

High-dimensional statistical learning: Roots, justifications, and potential machineries. / Zollanvari, Amin.

In: Cancer Informatics, Vol. 15, 12.04.2016, p. 109-121.

Research output: Contribution to journal › Article

@article{df19780d099643d68b43c5f6a2bc9d9b,
title = "High-dimensional statistical learning: Roots, justifications, and potential machineries",
abstract = "High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and pres-ent the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.",
keywords = "Curse of dimensionality, Double asymptotics, G-analysis, High-dimensional analysis, Kolmogorov asymptotics, Random matrix theory, Ridge estimation, Shrinkage, Sparsity",
author = "Amin Zollanvari",
year = "2016",
month = "4",
day = "12",
doi = "10.4137/CIN.S30804",
language = "English",
volume = "15",
pages = "109--121",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd.",

}

TY - JOUR

T1 - High-dimensional statistical learning

T2 - Roots, justifications, and potential machineries

AU - Zollanvari, Amin

PY - 2016/4/12

Y1 - 2016/4/12

N2 - High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of classical methods does not live up to their asymptotic premise, in which the sample size grows unboundedly while the dimensionality of the observations stays fixed. Much work has been done on developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed partly to a lack of knowledge and partly to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal of this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.

AB - High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of classical methods does not live up to their asymptotic premise, in which the sample size grows unboundedly while the dimensionality of the observations stays fixed. Much work has been done on developing mathematical–statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed partly to a lack of knowledge and partly to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal of this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.

KW - Curse of dimensionality

KW - Double asymptotics

KW - G-analysis

KW - High-dimensional analysis

KW - Kolmogorov asymptotics

KW - Random matrix theory

KW - Ridge estimation

KW - Shrinkage

KW - Sparsity

UR - http://www.scopus.com/inward/record.url?scp=84963525734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963525734&partnerID=8YFLogxK

U2 - 10.4137/CIN.S30804

DO - 10.4137/CIN.S30804

M3 - Article

VL - 15

SP - 109

EP - 121

JO - Cancer Informatics

JF - Cancer Informatics

SN - 1176-9351

ER -