Incremental maintenance of biological databases using association rule mining

Kai Tak Lam, Judice L Y Koh, Bharadwaj Veeravalli, Vladimir Brusic

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Biological research frequently requires specialist databases to support in-depth analysis of specific subjects. With the rapid growth of biological sequences in public domain data sources, it is difficult to keep these databases current with the sources. Simple queries formulated to retrieve relevant sequences typically return a large number of false matches and thus demand manual filtration. In this paper, we propose a novel methodology that can support automatic incremental updating of specialist databases. Complex queries for incremental updating of relevant sequences are learned using Association Rule Mining (ARM), resulting in a significant reduction in false positive matches. This is the first time ARM has been used to formulate descriptive queries for the incremental maintenance of specialised biological databases. We have implemented and tested our methodology on two real-world databases. Our experiments show that the methodology achieves an F-score of up to 80% in detecting new sequences for these two databases.
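The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword sets, thresholds, and the function names `mine_rules`, `matches`, and `f_score` are all hypothetical, and the rule search is a brute-force enumeration rather than any specific ARM algorithm. It assumes each sequence record is reduced to a set of annotation keywords, mines keyword sets whose presence predicts relevance (by support and confidence), and compares the learned query with a simple one-keyword query using the F-score.

```python
from itertools import combinations

def mine_rules(records, labels, min_support=0.3, min_confidence=0.8, max_len=2):
    """Return keyword sets K such that the rule K -> relevant passes both thresholds."""
    n = len(records)
    vocabulary = sorted(set().union(*records))
    rules = []
    for size in range(1, max_len + 1):
        for itemset in combinations(vocabulary, size):
            items = frozenset(itemset)
            # Labels of every record that contains all keywords in the itemset.
            covered = [lab for rec, lab in zip(records, labels) if items <= rec]
            if not covered:
                continue
            support = sum(covered) / n                # matching AND relevant, over all records
            confidence = sum(covered) / len(covered)  # relevant among matching records
            if support >= min_support and confidence >= min_confidence:
                rules.append((items, support, confidence))
    return rules

def matches(record, rules):
    """A record is retrieved if it satisfies at least one mined keyword set."""
    return any(items <= record for items, _, _ in rules)

def f_score(predicted, actual):
    """Harmonic mean of precision and recall for boolean predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy, made-up training data: keyword sets and whether a curator kept the record.
train_records = [
    {"toxin", "venom", "scorpion"},
    {"toxin", "venom", "snake"},
    {"toxin", "bacteria"},
    {"venom", "toxin", "spider"},
    {"kinase", "human"},
    {"toxin", "plant"},
]
train_labels = [True, True, False, True, False, False]
rules = mine_rules(train_records, train_labels)

# Toy, made-up "new" records arriving from the public source.
new_records = [
    {"toxin", "venom", "wasp"},
    {"toxin", "fungus"},
    {"venom", "toxin", "snail"},
    {"toxin", "bacteria"},
    {"receptor", "mouse"},
]
new_labels = [True, False, True, False, False]

simple_hits = ["toxin" in rec for rec in new_records]       # naive one-keyword query
learned_hits = [matches(rec, rules) for rec in new_records]  # query built from mined rules

print("simple query F-score: ", f_score(simple_hits, new_labels))
print("learned query F-score:", f_score(learned_hits, new_labels))
```

On this toy data the one-keyword query retrieves every relevant record but also two irrelevant ones, while the mined rules exclude those false positives. The numbers say nothing about the results reported in the paper.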

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 140-150
Number of pages: 11
Volume: 4146 LNBI
ISBN (Print): 3540374469, 9783540374466
Publication status: Published - 2006
Externally published: Yes
Event: International Workshop on Pattern Recognition in Bioinformatics, PRIB 2006 - Hong Kong, China
Duration: Aug 20 2006 → Aug 20 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4146 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Workshop on Pattern Recognition in Bioinformatics, PRIB 2006
Country: China
City: Hong Kong
Period: 8/20/06 → 8/20/06

Fingerprint

Association Rule Mining
Association rules
Maintenance
Query
Updating
Methodology
False Positive
Filtration
Experiment
Experiments

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Lam, K. T., Koh, J. L. Y., Veeravalli, B., & Brusic, V. (2006). Incremental maintenance of biological databases using association rule mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4146 LNBI, pp. 140-150). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4146 LNBI). Springer Verlag.
