A Comparison of Oversampling Methods for Constructing a Prognostic Model in the Patient with Heart Failure

Young Tak Kim, Dong Kyu Kim, Hakseung Kim, Dong Joo Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Heart failure (HF) is the terminal stage of all heart diseases and the leading cause of mortality. A reliable prognostic model for predicting mortality in patients with HF can support better decisions in clinical practice. Many attempts have been made to increase the reliability of prognostic models using electronic health records (EHRs), but it is still not known which oversampling method works best on imbalanced and limited EHR datasets. This study compared three well-known oversampling methods, namely the synthetic minority oversampling technique (SMOTE), borderline-SMOTE, and adaptive synthetic (ADASYN) sampling, for constructing prognostic models for HF patients. All 299 patients had left ventricular systolic dysfunction and belonged to New York Heart Association class III or IV (survived = 203, deceased = 96). Follow-up time ranged from 4 to 285 days, with an average of 130 days. The three oversampling methods were compared using random forest models built to predict mortality in patients with HF. The baseline model without oversampling achieved an F-score of 0.55. Each oversampling method improved the F-score by 0.05 or more over the baseline. SMOTE showed the highest prognostic capacity (F-score = 0.63), followed by ADASYN (F-score = 0.62) and borderline-SMOTE (F-score = 0.60). Across all three oversampling methods, ejection fraction, serum creatinine, and age were consistently ranked as highly important features. These results suggest that SMOTE is the most suitable algorithm for oversampling EHR data to predict mortality in HF patients.
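To make the oversampling step concrete, the following is a minimal pure-Python sketch of the core SMOTE idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is an illustration only, not the authors' implementation; the abstract does not say which software was used. The function name `smote`, the parameter `k=5`, and the two-feature toy data are assumptions made for this example. Only the class sizes (203 survived, 96 deceased) come from the abstract.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    For each synthetic point: pick a minority sample at random, pick one of
    its k nearest minority neighbours, and take a random point on the
    segment joining the two.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy stand-in data (NOT the study's patient records): two features per sample.
data_rng = random.Random(42)
survived = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(203)]
deceased = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(96)]

# Add enough synthetic minority samples to match the majority class size.
new_points = smote(deceased, n_new=len(survived) - len(deceased))
balanced_minority = deceased + new_points
print(len(survived), len(balanced_minority))  # prints "203 203"
```

With the class sizes from the abstract, balancing requires 203 - 96 = 107 synthetic samples. Borderline-SMOTE and ADASYN use the same interpolation step but choose base samples differently: borderline-SMOTE draws only from minority samples near the class boundary, and ADASYN generates more samples around minority points that have more majority-class neighbours. In a real evaluation, oversampling would be applied to the training split only, so that no synthetic points leak into the test data.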

Original language: English
Title of host publication: ICTC 2020 - 11th International Conference on ICT Convergence
Subtitle of host publication: Data, Network, and AI in the Age of Untact
Publisher: IEEE Computer Society
Pages: 379-383
Number of pages: 5
ISBN (Electronic): 9781728167589
Publication status: Published - 2020 Oct 21
Event: 11th International Conference on Information and Communication Technology Convergence, ICTC 2020 - Jeju Island, Korea, Republic of
Duration: 2020 Oct 21 to 2020 Oct 23

Publication series

Name: International Conference on ICT Convergence
Volume: 2020-October
ISSN (Print): 2162-1233
ISSN (Electronic): 2162-1241

Conference

Conference: 11th International Conference on Information and Communication Technology Convergence, ICTC 2020
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 20/10/21 to 20/10/23

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Electronic Health Record
  • Heart Failure
  • Oversampling
  • Prognostic Model

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
