Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells - a benchmarking study

Jing Sun, Guang Lan Zhang, Siyang Li, Alexander R. Ivanov, David Fenyo, Frederique Lisacek, Shashi K. Murthy, Barry L. Karger, Vladimir Brusic

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Background: Proteomics research is enabled by high-throughput technologies, but our ability to identify the expressed proteome is limited in small samples. The coverage and consistency of proteome expression are critical problems in proteomics. Here, we propose pathway analysis and the combination of microproteomics and transcriptomics analyses to improve mass-spectrometry protein identification from small samples. Results: Multiple proteomics runs using the MCF-7 cell line detected 4,957 expressed proteins. About 80% of expressed proteins were present in the MCF-7 transcript data; highly expressed transcripts are more likely to have expressed proteins. Approximately 1,000 proteins were detected in each run of the small-sample proteomics. These proteins were mapped to gene symbols and compared with gene sets representing canonical pathways; more than 4,000 genes were extracted from the enriched gene sets. The identified canonical pathways largely overlapped between individual runs. Of the identified pathways, 182 were shared among three individual small-sample runs. Conclusions: Current technologies enable us to directly detect 10% of the expressed proteome from a small sample comprising as few as 50 cells. We used knowledge-based approaches to elucidate the missing proteome, which can be verified by targeted proteomics. This knowledge-based approach includes pathway analysis and the combination of gene expression and protein expression data for target prioritization. Genes present in both the enriched gene sets (canonical pathways collection) and the small-sample proteomics data correspond to approximately 50% of the expressed proteome in the larger-sample proteomics data. In addition, 90% of targets from canonical pathways were estimated to be expressed. The comparison of proteomics and transcriptomics data suggests that highly expressed transcripts have a high probability of protein expression. However, approximately 10% of expressed proteins could not be matched with the expressed transcripts.
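
The pathway step described in the abstract (map detected proteins to gene symbols, find enriched canonical gene sets, then treat undetected members of those sets as candidate targets) can be illustrated with a minimal sketch. The abstract does not state which enrichment statistic or threshold was used, so the one-sided hypergeometric test, the `alpha` cutoff, and all function and gene names below are assumptions chosen for illustration, not the authors' method.

```python
from math import comb


def hypergeom_pvalue(universe_size, set_size, sample_size, overlap):
    """P(X >= overlap) when sample_size genes are drawn without replacement
    from a universe that contains set_size pathway genes."""
    max_k = min(set_size, sample_size)
    total = comb(universe_size, sample_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, sample_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def enriched_pathways(detected, gene_sets, universe_size, alpha=0.05):
    """Return {pathway name: p-value} for gene sets over-represented among
    the detected genes. alpha is an assumed, uncorrected threshold."""
    detected = set(detected)
    hits = {}
    for name, genes in gene_sets.items():
        overlap = len(detected & set(genes))
        if overlap == 0:
            continue
        p = hypergeom_pvalue(universe_size, len(genes), len(detected), overlap)
        if p < alpha:
            hits[name] = p
    return hits


def candidate_targets(detected, gene_sets, enriched):
    """Genes that belong to an enriched set but were not detected directly."""
    members = set().union(*(gene_sets[name] for name in enriched))
    return members - set(detected)


def shared_pathways(runs):
    """Pathway names that are enriched in every run."""
    return set.intersection(*(set(r) for r in runs))
```

Under these assumptions, `candidate_targets` corresponds to the "missing proteome" that the abstract proposes to verify by targeted proteomics, and `shared_pathways` to the pathways reported as shared between runs. A real analysis would also need multiple-testing correction and a defined background gene universe.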

Original language: English
Article number: S1
Journal: BMC Genomics
Volume: 15
DOIs: 10.1186/1471-2164-15-S9-S1
Publication status: Published - Dec 8 2014

Fingerprint

Benchmarking
Firearms
Proteomics
Cell Count
Proteome
Proteins
Genes
Technology
MCF-7 Cells
Sample Size
Mass Spectrometry
Gene Expression
Cell Line

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells - a benchmarking study. / Sun, Jing; Zhang, Guang Lan; Li, Siyang; Ivanov, Alexander R.; Fenyo, David; Lisacek, Frederique; Murthy, Shashi K.; Karger, Barry L.; Brusic, Vladimir.

In: BMC Genomics, Vol. 15, S1, 08.12.2014.

Sun, Jing ; Zhang, Guang Lan ; Li, Siyang ; Ivanov, Alexander R. ; Fenyo, David ; Lisacek, Frederique ; Murthy, Shashi K. ; Karger, Barry L. ; Brusic, Vladimir. / Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells - a benchmarking study. In: BMC Genomics. 2014 ; Vol. 15.
@article{2085579db0654fb2ba2e2c009f733130,
title = "Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells - a benchmarking study",
abstract = "Background: Proteomics research is enabled by high-throughput technologies, but our ability to identify the expressed proteome is limited in small samples. The coverage and consistency of proteome expression are critical problems in proteomics. Here, we propose pathway analysis and the combination of microproteomics and transcriptomics analyses to improve mass-spectrometry protein identification from small samples. Results: Multiple proteomics runs using the MCF-7 cell line detected 4,957 expressed proteins. About 80{\%} of expressed proteins were present in the MCF-7 transcript data; highly expressed transcripts are more likely to have expressed proteins. Approximately 1,000 proteins were detected in each run of the small-sample proteomics. These proteins were mapped to gene symbols and compared with gene sets representing canonical pathways; more than 4,000 genes were extracted from the enriched gene sets. The identified canonical pathways largely overlapped between individual runs. Of the identified pathways, 182 were shared among three individual small-sample runs. Conclusions: Current technologies enable us to directly detect 10{\%} of the expressed proteome from a small sample comprising as few as 50 cells. We used knowledge-based approaches to elucidate the missing proteome, which can be verified by targeted proteomics. This knowledge-based approach includes pathway analysis and the combination of gene expression and protein expression data for target prioritization. Genes present in both the enriched gene sets (canonical pathways collection) and the small-sample proteomics data correspond to approximately 50{\%} of the expressed proteome in the larger-sample proteomics data. In addition, 90{\%} of targets from canonical pathways were estimated to be expressed. The comparison of proteomics and transcriptomics data suggests that highly expressed transcripts have a high probability of protein expression. However, approximately 10{\%} of expressed proteins could not be matched with the expressed transcripts.",
author = "Jing Sun and Zhang, {Guang Lan} and Siyang Li and Ivanov, {Alexander R.} and David Fenyo and Frederique Lisacek and Murthy, {Shashi K.} and Karger, {Barry L.} and Vladimir Brusic",
year = "2014",
month = "12",
day = "8",
doi = "10.1186/1471-2164-15-S9-S1",
language = "English",
volume = "15",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells - a benchmarking study

AU - Sun, Jing

AU - Zhang, Guang Lan

AU - Li, Siyang

AU - Ivanov, Alexander R.

AU - Fenyo, David

AU - Lisacek, Frederique

AU - Murthy, Shashi K.

AU - Karger, Barry L.

AU - Brusic, Vladimir

PY - 2014/12/8

Y1 - 2014/12/8

AB - Background: Proteomics research is enabled by high-throughput technologies, but our ability to identify the expressed proteome is limited in small samples. The coverage and consistency of proteome expression are critical problems in proteomics. Here, we propose pathway analysis and the combination of microproteomics and transcriptomics analyses to improve mass-spectrometry protein identification from small samples. Results: Multiple proteomics runs using the MCF-7 cell line detected 4,957 expressed proteins. About 80% of expressed proteins were present in the MCF-7 transcript data; highly expressed transcripts are more likely to have expressed proteins. Approximately 1,000 proteins were detected in each run of the small-sample proteomics. These proteins were mapped to gene symbols and compared with gene sets representing canonical pathways; more than 4,000 genes were extracted from the enriched gene sets. The identified canonical pathways largely overlapped between individual runs. Of the identified pathways, 182 were shared among three individual small-sample runs. Conclusions: Current technologies enable us to directly detect 10% of the expressed proteome from a small sample comprising as few as 50 cells. We used knowledge-based approaches to elucidate the missing proteome, which can be verified by targeted proteomics. This knowledge-based approach includes pathway analysis and the combination of gene expression and protein expression data for target prioritization. Genes present in both the enriched gene sets (canonical pathways collection) and the small-sample proteomics data correspond to approximately 50% of the expressed proteome in the larger-sample proteomics data. In addition, 90% of targets from canonical pathways were estimated to be expressed. The comparison of proteomics and transcriptomics data suggests that highly expressed transcripts have a high probability of protein expression. However, approximately 10% of expressed proteins could not be matched with the expressed transcripts.

UR - http://www.scopus.com/inward/record.url?scp=84939524933&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84939524933&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-15-S9-S1

DO - 10.1186/1471-2164-15-S9-S1

M3 - Article

C2 - 25521637

AN - SCOPUS:84939524933

VL - 15

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - S1

ER -