TY - JOUR
T1 - Specific patterns and potential risk factors to predict 3-year risk of death among non-cancer patients with advanced chronic kidney disease by machine learning
AU - Chang, Tzu Hao
AU - Chen, Yu Da
AU - Lu, Henry Horng Shing
AU - Wu, Jenny L.
AU - Mak, Katelyn
AU - Yu, Cheng Sheng
N1 - Publisher Copyright:
© 2024 Lippincott Williams and Wilkins. All rights reserved.
PY - 2024/2/16
Y1 - 2024/2/16
N2 - Chronic kidney disease (CKD) is a major public health concern. But there are limited machine learning studies on non-cancer patients with advanced CKD, and the results of machine learning studies on cancer patients with CKD may not apply directly on non-cancer patients. We aimed to conduct a comprehensive investigation of risk factors for a 3-year risk of death among non-cancer advanced CKD patients with an estimated glomerular filtration rate < 60.0 mL/min/1.73m2by several machine learning algorithms. In this retrospective cohort study, we collected data from in-hospital and emergency care patients from 2 hospitals in Taiwan from 2009 to 2019, including their international classification of disease at admission and laboratory data from the hospital's electronic medical records (EMRs). Several machine learning algorithms were used to analyze the potential impact and degree of influence of each factor on mortality and survival. Data from 2 hospitals in northern Taiwan were collected with 6565 enrolled patients. After data cleaning, 26 risk factors and approximately 3887 advanced CKD patients from Shuang Ho Hospital were used as the training set. The validation set contained 2299 patients from Taipei Medical University Hospital. Predictive variables, such as albumin, PT-INR, and age, were the top 3 significant risk factors with paramount influence on mortality prediction. In the receiver operating characteristic curve, the random forest had the highest values for accuracy above 0.80. MLP, and Adaboost had better performance on sensitivity and F1-score compared to other methods. Additionally, SVM with linear kernel function had the highest specificity of 0.9983, while its sensitivity and F1-score were poor. Logistic regression had the best performance, with an area under the curve of 0.8527. Evaluating Taiwanese advanced CKD patients' EMRs could provide physicians with a good approximation of the patients' 3-year risk of death by machine learning algorithms.
AB - Chronic kidney disease (CKD) is a major public health concern. But there are limited machine learning studies on non-cancer patients with advanced CKD, and the results of machine learning studies on cancer patients with CKD may not apply directly on non-cancer patients. We aimed to conduct a comprehensive investigation of risk factors for a 3-year risk of death among non-cancer advanced CKD patients with an estimated glomerular filtration rate < 60.0 mL/min/1.73m2by several machine learning algorithms. In this retrospective cohort study, we collected data from in-hospital and emergency care patients from 2 hospitals in Taiwan from 2009 to 2019, including their international classification of disease at admission and laboratory data from the hospital's electronic medical records (EMRs). Several machine learning algorithms were used to analyze the potential impact and degree of influence of each factor on mortality and survival. Data from 2 hospitals in northern Taiwan were collected with 6565 enrolled patients. After data cleaning, 26 risk factors and approximately 3887 advanced CKD patients from Shuang Ho Hospital were used as the training set. The validation set contained 2299 patients from Taipei Medical University Hospital. Predictive variables, such as albumin, PT-INR, and age, were the top 3 significant risk factors with paramount influence on mortality prediction. In the receiver operating characteristic curve, the random forest had the highest values for accuracy above 0.80. MLP, and Adaboost had better performance on sensitivity and F1-score compared to other methods. Additionally, SVM with linear kernel function had the highest specificity of 0.9983, while its sensitivity and F1-score were poor. Logistic regression had the best performance, with an area under the curve of 0.8527. Evaluating Taiwanese advanced CKD patients' EMRs could provide physicians with a good approximation of the patients' 3-year risk of death by machine learning algorithms.
KW - data analysis
KW - ensemble learning
KW - healthcare
KW - machine learning
KW - medical informatics
KW - multi-center data
KW - non-cancer-related chronic kidney disease
UR - http://www.scopus.com/inward/record.url?scp=85185315968&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185315968&partnerID=8YFLogxK
U2 - 10.1097/MD.0000000000037112
DO - 10.1097/MD.0000000000037112
M3 - Article
C2 - 38363886
AN - SCOPUS:85185315968
SN - 0025-7974
VL - 103
SP - E37112
JO - Medicine (United States)
JF - Medicine (United States)
IS - 7
ER -