An AutoML Approach for Predicting Risk of Progression to Active Tuberculosis based on Its Association with Host Genetic Variations

Wanying Dou, Yihang Liu, Zehai Liu, Dauren Yerezhepov, Ulan Kozhamkulov, Ainur Akilzhanova, Omar Dib, Chee Kai Chan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Tuberculosis (TB) is a worldwide health challenge. Mycobacterium tuberculosis (M.tb) can evade the host immune system, which can lead to TB infection. Household contacts (HHCs) of TB cases have a higher risk of infection, so novel predictive techniques are needed to identify groups at high risk of TB. Susceptibility to TB is associated with host genetic variations. This work uses the TPOT AutoML tool to map host genetic variations to TB infection status mathematically, applying machine learning to predict the risk of progression to active TB from the associated host genetic variation. Among the three adopted configurations ("TPOT Default", "TPOT sparse", and "TPOT N"), "TPOT Default" and "TPOT sparse" produced the same best performance, both reaching a training CV score of 0.816 and a testing accuracy of 0.625. The gene variants identified with this approach made distinct contributions to TB infection, as represented by the feature importance of the classifier; the feature importance of the random forest classifier pipeline found by "TPOT sparse" was adopted. The top ten contributing genes were then submitted to Enrichr for gene pathway enrichment analysis. The identified enriched pathways have been shown to be key to TB infection.
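As a rough illustration of two quantities the abstract relies on, the sketch below computes a testing accuracy (the fraction of correct predictions) and selects the ten highest-ranked features from a table of importances, which is the kind of list one might submit to an enrichment tool. It is a minimal standard-library sketch, not the authors' pipeline: the function names `accuracy` and `top_k_features`, the `VARIANT_xx` identifiers, and all numeric values are hypothetical placeholders and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def top_k_features(importances: Dict[str, float], k: int = 10) -> List[str]:
    """Return the k feature names with the largest importance, highest first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical placeholder importances (NOT results from the paper).
# In practice these would come from a fitted classifier, for example the
# per-feature importances of a random forest.
example_importances = {
    "VARIANT_01": 0.02, "VARIANT_02": 0.11, "VARIANT_03": 0.07,
    "VARIANT_04": 0.01, "VARIANT_05": 0.09, "VARIANT_06": 0.15,
    "VARIANT_07": 0.03, "VARIANT_08": 0.05, "VARIANT_09": 0.12,
    "VARIANT_10": 0.04, "VARIANT_11": 0.08, "VARIANT_12": 0.06,
}

top_ten = top_k_features(example_importances, k=10)
```

Here `top_ten` holds the ten placeholder identifiers ordered from most to least important; mapping variants to gene symbols before enrichment analysis is a separate step that this sketch does not cover.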

Original language: English
Title of host publication: ICBBS 2021 - Proceedings of 2021 10th International Conference on Bioinformatics and Biomedical Science
Publisher: Association for Computing Machinery
Pages: 82-88
Number of pages: 7
ISBN (Electronic): 9781450384308
Publication status: Published - Oct 29 2021
Event: 10th International Conference on Bioinformatics and Biomedical Science, ICBBS 2021 - Virtual, Online, China
Duration: Oct 29 2021 → Oct 31 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 10th International Conference on Bioinformatics and Biomedical Science, ICBBS 2021
Country/Territory: China
City: Virtual, Online
Period: 10/29/21 → 10/31/21

Keywords

  • Genetic Variation
  • Machine Learning
  • Tuberculosis

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
