Textmining in support of knowledge discovery for vaccine development

Christian Schönbach, Takeshi Nagashima, Akihiko Konagaya

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focusses on molecular functions of HIV-infected T-cells and HIV epitope information, using textmining and interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through textmining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.

Original language: English
Pages (from-to): 488-495
Number of pages: 8
Journal: Methods
Volume: 34
Issue number: 4
DOI: 10.1016/j.ymeth.2004.06.009
Publication status: Published - Dec 2004
Externally published: Yes

Fingerprint

Data mining
Epitopes
Vaccines
MEDLINE
Natural Language Processing
HIV
Databases
Host-Pathogen Interactions
Processing
Genes
Gene Ontology
T-Lymphocyte Epitopes
Information Storage and Retrieval
Automatic Data Processing
T-cells
Molecular interactions
Pathogens
Gene expression
Microorganisms
Maintenance

Keywords

  • Disease association
  • Epitope
  • Gene ontology
  • HIV infection
  • MeSH
  • Molecular interaction
  • T-cell
  • Text information retrieval
  • Textmining
  • Vaccine development

ASJC Scopus subject areas

  • Molecular Biology

Cite this

Textmining in support of knowledge discovery for vaccine development. / Schönbach, Christian; Nagashima, Takeshi; Konagaya, Akihiko.

In: Methods, Vol. 34, No. 4, 12.2004, p. 488-495.

Schönbach, Christian ; Nagashima, Takeshi ; Konagaya, Akihiko. / Textmining in support of knowledge discovery for vaccine development. In: Methods. 2004 ; Vol. 34, No. 4. pp. 488-495.
@article{6ec04b8cf09d4136a426d3b789aa0812,
title = "Textmining in support of knowledge discovery for vaccine development",
abstract = "Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focusses on molecular functions of HIV-infected T-cells and HIV epitope information, using textmining and interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through textmining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.",
keywords = "Disease association, Epitope, Gene ontology, HIV infection, MeSH, Molecular interaction, T-cell, Text information retrieval, Textmining, Vaccine development",
author = "Christian Sch{\"o}nbach and Takeshi Nagashima and Akihiko Konagaya",
year = "2004",
month = "12",
doi = "10.1016/j.ymeth.2004.06.009",
language = "English",
volume = "34",
pages = "488--495",
journal = "Methods",
issn = "1046-2023",
publisher = "Academic Press Inc.",
number = "4",

}

TY - JOUR
T1 - Textmining in support of knowledge discovery for vaccine development
AU - Schönbach, Christian
AU - Nagashima, Takeshi
AU - Konagaya, Akihiko
PY - 2004/12
Y1 - 2004/12
AB - Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focusses on molecular functions of HIV-infected T-cells and HIV epitope information, using textmining and interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through textmining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.
KW - Disease association
KW - Epitope
KW - Gene ontology
KW - HIV infection
KW - MeSH
KW - Molecular interaction
KW - T-cell
KW - Text information retrieval
KW - Textmining
KW - Vaccine development
UR - http://www.scopus.com/inward/record.url?scp=7944220370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=7944220370&partnerID=8YFLogxK
U2 - 10.1016/j.ymeth.2004.06.009
DO - 10.1016/j.ymeth.2004.06.009
M3 - Article
VL - 34
SP - 488
EP - 495
JO - Methods
JF - Methods
SN - 1046-2023
IS - 4
ER -