Effective diagnosis of heart disease imposed by incomplete data based on fuzzy random forest

Elzhan Zeinulla, Karina Bekbayeva, Adnan Yazici

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

This study presents data preprocessing and imputation techniques for building a model from medical sensor data. We aim to create a framework for diagnosing heart disease from incomplete and dirty data, which is common in the medical domain: medical datasets are often small and imbalanced and contain many missing, false or inaccurate values. In this study, we combine the synthetic minority oversampling technique (SMOTE) with Tomek links to increase the size of the dataset and eliminate its class imbalance. We performed a number of experiments and measurements on the Cleveland dataset and compared various prediction models with recent algorithms from the literature. To process the additional data from Budapest, Zurich and Basel, we apply semi-supervised pseudo-labelling: a model trained on the labelled data predicts labels for the unlabelled records, and these pseudo-labelled records are then combined with the labelled data. The same algorithm used for the Cleveland dataset was then applied to the entire dataset. A Fuzzy Random Forest was implemented as the main classifier. The final accuracy of the proposed approach is 93.4%, with specificity and sensitivity of 96.92% and 89.99%, respectively, which is superior to previous models reported in the literature.
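The balancing step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (the abstract does not say which library or parameters were used); the function names `smote`, `tomek_links` and `smote_tomek`, the neighbour count `k=5`, binary labels, and oversampling the minority class up to the size of the majority class are all illustrative assumptions. SMOTE creates each synthetic record by interpolating between a minority sample and one of its k nearest minority-class neighbours; a Tomek link is a pair of opposite-class records that are each other's nearest neighbour, and removing such pairs cleans up the class boundary.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def smote(minority, n_synthetic, k=5, rng=None):
    """SMOTE sketch: each synthetic record lies on the segment between a
    random minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def tomek_links(X, y):
    """Index pairs (i, j), i < j, with different labels where each record is
    the other's nearest neighbour in the whole dataset (brute force)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, nn[i]) for i in range(len(X))
            if i < nn[i] and nn[nn[i]] == i and y[i] != y[nn[i]]]


def smote_tomek(X, y, minority_label, k=5, seed=0):
    """Hypothetical pipeline: oversample the minority class to parity with
    the majority class (binary labels assumed), then drop both members of
    every Tomek link."""
    rng = random.Random(seed)
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    n_majority = len(X) - len(minority)
    synthetic = smote(minority, max(0, n_majority - len(minority)), k, rng)
    X_res = list(X) + synthetic
    y_res = list(y) + [minority_label] * len(synthetic)
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Toy usage: five majority (0) and two minority (1) records.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.25, 0.0],
     [1.0, 1.0], [1.1, 0.9]]
y = [0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_tomek(X, y, minority_label=1)
```

This variant removes both members of every Tomek link; some implementations remove only the majority-class member. The brute-force neighbour search is quadratic in the number of records, which is acceptable for small clinical datasets but would need a spatial index for large ones.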

Original language: English
Title of host publication: 2020 IEEE International Conference on Fuzzy Systems, FUZZ 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169323
Publication status: Published - Jul 2020
Event: 2020 IEEE International Conference on Fuzzy Systems, FUZZ 2020 - Glasgow, United Kingdom
Duration: Jul 19 2020 to Jul 24 2020

Publication series

Name: IEEE International Conference on Fuzzy Systems
Volume: 2020-July
ISSN (Print): 1098-7584

Conference

Conference: 2020 IEEE International Conference on Fuzzy Systems, FUZZ 2020
Country/Territory: United Kingdom
City: Glasgow
Period: 7/19/20 to 7/24/20

Keywords

  • Data Preparation
  • Fuzzy Random Forest
  • Heart Disease
  • Multiple Imputation by Chained Equations (MICE)
  • Pseudo-labelling
  • Semi-Supervised Learning
  • SMOTE
  • Tomek
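The Pseudo-labelling and Semi-Supervised Learning keywords refer to the step the abstract uses to bring in the Budapest, Zurich and Basel data. That step can be sketched in the same hedged way. Because a Fuzzy Random Forest is too large for a short example, this sketch swaps in a simple nearest-centroid classifier; `fit_centroids`, `predict` and `pseudo_label` are hypothetical names, and a single labelling round without a confidence threshold is an assumption (many pseudo-labelling schemes keep only high-confidence predictions). `math.dist` requires Python 3.8 or later.

```python
import math


def fit_centroids(X, y):
    """Stand-in classifier (not a Fuzzy Random Forest): the mean feature
    vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda lbl: math.dist(x, centroids[lbl]))


def pseudo_label(X_lab, y_lab, X_unlab):
    """One round of pseudo-labelling: fit on the labelled records, predict
    labels for the unlabelled ones, then refit on the combined data."""
    model = fit_centroids(X_lab, y_lab)
    pseudo = [predict(model, x) for x in X_unlab]
    combined_X = list(X_lab) + list(X_unlab)
    combined_y = list(y_lab) + pseudo
    return fit_centroids(combined_X, combined_y), pseudo


# Toy usage: four labelled records and two unlabelled ones.
X_lab = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.1, 0.2], [1.1, 0.9]]
model, pseudo = pseudo_label(X_lab, y_lab, X_unlab)
```

In the pipeline the abstract describes, the combined labelled and pseudo-labelled data would then go through the same preprocessing and classification used for the Cleveland dataset.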

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Artificial Intelligence
  • Applied Mathematics
