Prediction of supertype-specific HLA class I binding peptides using support vector machines

Guang Lan Zhang, Ivana Bozic, Chee Keong Kwoh, J. Thomas August, Vladimir Brusic

Research output: Contribution to journal › Article

32 Citations (Scopus)

Abstract

Experimental approaches for identifying T-cell epitopes are time-consuming, costly, and not applicable to large-scale screening. Computer modeling methods can help minimize the number of experiments required, enable systematic scanning for candidate major histocompatibility complex (MHC) binding peptides, and thus speed up vaccine development. We developed a prediction system based on a novel data representation of peptide/MHC interaction and support vector machines (SVM) for predicting peptides that promiscuously bind to multiple Human Leukocyte Antigen (HLA, human MHC) alleles belonging to an HLA supertype. Ten-fold cross-validation showed that the overall performance of the SVM models improved on our previously published methods based on hidden Markov models (HMM) and artificial neural networks (ANN), a result also confirmed by blind testing. At a specificity of 0.90, the sensitivity values of the SVM models were 0.90 and 0.92 for the HLA-A2 and -A3 datasets, respectively. The average area under the receiver operating characteristic curve (AROC) of the SVM models in blind testing was 0.89 and 0.92 for the HLA-A2 and -A3 datasets, respectively. In validation using a full overlapping study of 9-mer peptides from human papillomavirus type 16 E6 and E7 proteins, the AROC values of the HLA-A2 and -A3 SVM models were 0.94 and 0.95, respectively. In addition, a large-scale experimental dataset was used to validate the HLA-A2 and -A3 SVM models. The SVM prediction models were integrated into a web-based computational system, MULTIPRED1, accessible at antigen.i2r.a-star.edu.sg/multipred1/.
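
The evaluation quantities named in the abstract (overlapping 9-mer scanning, sensitivity at a fixed specificity, and area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the score convention (higher score means predicted binder), and the toy scores and labels below are all assumptions made for illustration, and none of the numbers come from the paper.

```python
from typing import List, Sequence, Tuple


def overlapping_kmers(protein: str, k: int = 9) -> List[str]:
    """Return every overlapping k-mer window of a protein sequence."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def split_by_label(scores: Sequence[float],
                   labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split scores into binders (label 1) and non-binders (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    This is the fraction of binder/non-binder pairs in which the binder
    scores higher, with ties counted as one half.
    """
    pos, neg = split_by_label(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores: Sequence[float],
                               labels: Sequence[int],
                               target_spec: float = 0.90) -> Tuple[float, float]:
    """Return (threshold, sensitivity) at the lowest observed threshold
    whose specificity reaches target_spec.

    A peptide is called a binder when score >= threshold. Specificity only
    grows as the threshold rises, so the first qualifying threshold gives
    the highest sensitivity available at that specificity.
    """
    pos, neg = split_by_label(scores, labels)
    for t in sorted(set(scores)):
        spec = sum(n < t for n in neg) / len(neg)
        if spec >= target_spec:
            sens = sum(p >= t for p in pos) / len(pos)
            return t, sens
    # No observed score reaches the target: call nothing a binder.
    return float("inf"), 0.0


if __name__ == "__main__":
    # Toy data (hypothetical, not from the paper): 5 binders, 10 non-binders.
    toy_scores = [0.95, 0.90, 0.80, 0.60, 0.50,
                  0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.85]
    toy_labels = [1] * 5 + [0] * 10
    print(overlapping_kmers("ABCDEFGHIJK"))
    print(auc(toy_scores, toy_labels))
    print(sensitivity_at_specificity(toy_scores, toy_labels, 0.90))
```

In a full overlapping study, each window returned by `overlapping_kmers` would be scored by the trained model, and the two metrics would then be computed against the experimentally determined binding labels.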

Original language: English
Pages (from-to): 143-154
Number of pages: 12
Journal: Journal of Immunological Methods
Volume: 320
Issue number: 1-2
DOIs: 10.1016/j.jim.2006.12.011
Publication status: Published - Mar 30 2007
Externally published: Yes

Fingerprint

HLA-A3 Antigen
HLA-A2 Antigen
Peptides
Major Histocompatibility Complex
Papillomavirus E7 Proteins
T-Lymphocyte Epitopes
Neural Networks (Computer)
HLA Antigens
Support Vector Machine
Vaccines
Alleles
Antigens
Sensitivity and Specificity
Datasets

Keywords

  • Human Leukocyte Antigen supertype
  • Promiscuous binding peptide
  • Support vector machines
  • T-cell epitope

ASJC Scopus subject areas

  • Biotechnology
  • Immunology

Cite this

Prediction of supertype-specific HLA class I binding peptides using support vector machines. / Zhang, Guang Lan; Bozic, Ivana; Kwoh, Chee Keong; August, J. Thomas; Brusic, Vladimir.

In: Journal of Immunological Methods, Vol. 320, No. 1-2, 30.03.2007, p. 143-154.

Zhang, Guang Lan ; Bozic, Ivana ; Kwoh, Chee Keong ; August, J. Thomas ; Brusic, Vladimir. / Prediction of supertype-specific HLA class I binding peptides using support vector machines. In: Journal of Immunological Methods. 2007 ; Vol. 320, No. 1-2. pp. 143-154.
@article{034242884cc440939f5ba61f028ed2dd,
title = "Prediction of supertype-specific HLA class I binding peptides using support vector machines",
keywords = "Human Leukocyte Antigen supertype, Promiscuous binding peptide, Support vector machines, T-cell epitope",
author = "Zhang, {Guang Lan} and Ivana Bozic and Kwoh, {Chee Keong} and August, {J. Thomas} and Vladimir Brusic",
year = "2007",
month = "3",
day = "30",
doi = "10.1016/j.jim.2006.12.011",
language = "English",
volume = "320",
pages = "143--154",
journal = "Journal of Immunological Methods",
issn = "0022-1759",
publisher = "Elsevier",
number = "1-2",

}

TY - JOUR

T1 - Prediction of supertype-specific HLA class I binding peptides using support vector machines

AU - Zhang, Guang Lan

AU - Bozic, Ivana

AU - Kwoh, Chee Keong

AU - August, J. Thomas

AU - Brusic, Vladimir

PY - 2007/3/30

Y1 - 2007/3/30

KW - Human Leukocyte Antigen supertype

KW - Promiscuous binding peptide

KW - Support vector machines

KW - T-cell epitope

UR - http://www.scopus.com/inward/record.url?scp=33847710294&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847710294&partnerID=8YFLogxK

U2 - 10.1016/j.jim.2006.12.011

DO - 10.1016/j.jim.2006.12.011

M3 - Article

C2 - 17303158

AN - SCOPUS:33847710294

VL - 320

SP - 143

EP - 154

JO - Journal of Immunological Methods

JF - Journal of Immunological Methods

SN - 0022-1759

IS - 1-2

ER -