Extraction by example

Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources

Olivo Miotto, Tin Wee Tan, Vladimir Brusic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Biological research requires information from multiple data sources that use a variety of database-specific formats. Manual gathering of information is time-consuming and error-prone, making automated data aggregation a compelling option for large studies. We describe a method for extracting information from diverse sources that involves structural rules specified by example. We developed a system for aggregation of biological knowledge (ABK) and used it to conduct an epidemiological study of dengue virus (DENV) sequences. Additional information on geographical origin and isolation date is critical for understanding evolutionary relationships, but this data is inconsistently structured in database entries. Using three public databases, we found that structural rules can be used successfully even when applied to inconsistently structured data that is distributed across multiple fields. High reusability, combined with the ability to integrate analysis tools, makes this method suitable for a wide variety of large-scale studies involving viral sequences.
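The abstract does not spell out what a structural rule specified by example looks like. As a rough illustration only, the sketch below assumes a rule is a field path plus the text immediately surrounding the value in a single example record, essentially a simple left/right-delimiter wrapper. This is not necessarily the rule language used in ABK. `Rule`, `induce_rule`, `apply_rule`, `extract` and the nested-dict record layout are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical structural rule: where a value lives and what surrounds it."""
    path: tuple[str, ...]  # keys leading to a text field in a nested record
    left: str              # text just before the value in the example ("" = start)
    right: str             # one character just after the value ("" = end of field)


def iter_fields(record: dict, prefix: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], str]]:
    """Yield (path, text) for every string leaf of a nested dict."""
    for key, value in record.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from iter_fields(value, path)
        elif isinstance(value, str):
            yield path, value


def induce_rule(example: dict, target: str, context: int = 10) -> Optional[Rule]:
    """Induce a rule from one example record and the value a user marked in it."""
    for path, text in iter_fields(example):
        pos = text.find(target)
        if pos >= 0:
            # Naive choice: up to `context` characters of left context, leading spaces dropped.
            left = text[max(0, pos - context):pos].lstrip()
            end = pos + len(target)
            return Rule(path, left, text[end:end + 1])
    return None


def apply_rule(rule: Rule, record: dict) -> Optional[str]:
    """Follow the rule's path, then cut the value out between its delimiters."""
    node: Any = record
    for key in rule.path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if not isinstance(node, str):
        return None
    start = node.find(rule.left)
    if start < 0:
        return None
    start += len(rule.left)
    end = node.find(rule.right, start) if rule.right else -1
    if end < 0:
        end = len(node)
    return node[start:end].strip()


def extract(rules: list[Rule], record: dict) -> Optional[str]:
    """Try several rules in order; the first non-empty match wins."""
    for rule in rules:
        value = apply_rule(rule, record)
        if value:
            return value
    return None


if __name__ == "__main__":
    # Hypothetical records; not taken from any real database entry.
    example = {"accession": "X1", "source": {"note": "strain A1; country: Thailand; host: human"}}
    rule = induce_rule(example, "Thailand")
    print(rule)  # Rule(path=('source', 'note'), left='country: ', right=';')
    print(apply_rule(rule, {"source": {"note": "country: Viet Nam; host: mosquito"}}))  # Viet Nam
```

The `extract` helper tries a list of rules in turn. This loosely mirrors the situation the abstract describes, where the same item (for example, country of origin) may sit in different fields depending on the source database.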

Original language: English
Title of host publication: Lecture Notes in Computer Science
Editors: M. Gallagher, J. Hogan, F. Maire
Pages: 398-405
Number of pages: 8
Volume: 3578
Publication status: Published - 2005
Externally published: Yes
Event: 6th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2005 - Brisbane, Australia
Duration: Jul 6 2005 to Jul 8 2005

Other

Other: 6th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2005
Country: Australia
City: Brisbane
Period: 7/6/05 to 7/8/05

ASJC Scopus subject areas

  • Computer Science (miscellaneous)

Cite this

Miotto, O., Tan, T. W., & Brusic, V. (2005). Extraction by example: Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources. In M. Gallagher, J. Hogan, & F. Maire (Eds.), Lecture Notes in Computer Science (Vol. 3578, pp. 398-405)

Extraction by example : Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources. / Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir.

Lecture Notes in Computer Science. ed. / M. Gallagher; J. Hogan; F. Maire. Vol. 3578 2005. p. 398-405.

Miotto, O, Tan, TW & Brusic, V 2005, Extraction by example: Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources. in M Gallagher, J Hogan & F Maire (eds), Lecture Notes in Computer Science. vol. 3578, pp. 398-405, 6th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2005, Brisbane, Australia, 7/6/05.

Miotto O, Tan TW, Brusic V. Extraction by example: Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources. In Gallagher M, Hogan J, Maire F, editors, Lecture Notes in Computer Science. Vol. 3578. 2005. p. 398-405

Miotto, Olivo ; Tan, Tin Wee ; Brusic, Vladimir. / Extraction by example : Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources. Lecture Notes in Computer Science. editor / M. Gallagher ; J. Hogan ; F. Maire. Vol. 3578 2005. pp. 398-405

@inproceedings{88d6ad5884f640b28a020ba3f7f70c2f,
title = "Extraction by example: Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources",
author = "Olivo Miotto and Tan, {Tin Wee} and Vladimir Brusic",
year = "2005",
language = "English",
volume = "3578",
pages = "398--405",
editor = "M. Gallagher and J. Hogan and F. Maire",
booktitle = "Lecture Notes in Computer Science",

}

TY - GEN

T1 - Extraction by example

T2 - Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources

AU - Miotto, Olivo

AU - Tan, Tin Wee

AU - Brusic, Vladimir

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=26444603744&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=26444603744&partnerID=8YFLogxK

M3 - Conference contribution

VL - 3578

SP - 398

EP - 405

BT - Lecture Notes in Computer Science

A2 - Gallagher, M.

A2 - Hogan, J.

A2 - Maire, F.

ER -