TY - JOUR
T1 - Uncertainty-Gated Stochastic Sequential Model for EHR Mortality Prediction
AU - Jun, Eunji
AU - Mulyadi, Ahmad Wisnu
AU - Choi, Jaehun
AU - Suk, Heung Il
N1 - Funding Information:
Manuscript received February 18, 2020; revised May 20, 2020; accepted August 6, 2020. Date of publication August 25, 2020; date of current version September 1, 2021. This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea Government (MSIT) under Grant 2017-0-00053 (A Technology Development of Artificial Intelligence Doctors for Cardiovascular Disease) and in part by the Department of Artificial Intelligence, Korea University, under Grant 2019-0-00079. (Corresponding author: Heung-Il Suk.) Eunji Jun and Ahmad Wisnu Mulyadi are with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, South Korea (e-mail: ejjun92@korea.ac.kr; wisnumulyadi@korea.ac.kr).
Publisher Copyright:
© 2012 IEEE.
PY - 2021/9
Y1 - 2021/9
N2 - Electronic health records (EHRs) are characterized as nonstationary, heterogeneous, noisy, and sparse data; therefore, it is challenging to learn the regularities or patterns inherent within them. In particular, sparseness caused mostly by many missing values has attracted the attention of researchers who have attempted to find a better use of all available samples for determining the solution of a primary target task through defining a secondary imputation problem. Methodologically, existing methods, either deterministic or stochastic, have applied different assumptions to impute missing values. However, once the missing values are imputed, most existing methods do not consider the fidelity or confidence of the imputed values in the modeling of downstream tasks. Undoubtedly, an erroneous or improper imputation of missing variables can cause difficulties in the modeling as well as a degraded performance. In this study, we present a novel variational recurrent network that: 1) estimates the distribution of missing variables (e.g., the mean and variance) allowing to represent uncertainty in the imputed values; 2) updates hidden states by explicitly applying fidelity based on a variance of the imputed values during a recurrence (i.e., uncertainty propagation over time); and 3) predicts the possibility of in-hospital mortality. It is noteworthy that our model can conduct these procedures in a single stream and learn all network parameters jointly in an end-to-end manner. We validated the effectiveness of our method using the public data sets of MIMIC-III and PhysioNet challenge 2012 by comparing with and outperforming other state-of-the-art methods for mortality prediction considered in our experiments. In addition, we identified the behavior of the model that well represented the uncertainties for the imputed estimates, which showed a high correlation between the uncertainties and mean absolute error (MAE) scores for imputation.
KW - Bioinformatics
KW - deep generative model
KW - deep learning (DL)
KW - electronic health records (EHRs)
KW - missing value imputation
KW - mortality prediction
KW - time series modeling
KW - uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85091316861&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2020.3016670
DO - 10.1109/TNNLS.2020.3016670
M3 - Article
C2 - 32841128
AN - SCOPUS:85091316861
SN - 2162-237X
VL - 32
SP - 4052
EP - 4062
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 9
M1 - 9177349
ER -