TY - JOUR
T1 - Artificial intelligence based personalized predictive survival among colorectal cancer patients
AU - Susič, David
AU - Syed-Abdul, Shabbir
AU - Dovgan, Erik
AU - Jonnagaddala, Jitendra
AU - Gradišek, Anton
N1 - Funding Information:
The authors acknowledge the financial support from the Slovenian Research Agency, research core funding No. P2-0209 and PR-10495. SS-A was funded by the Ministry of Science and Technology, Taiwan (110-2923-E-038-001-MY3); Taipei Medical University, Taiwan (109–3800–020–400). Jitendra Jonnagaddala is funded by the Australian National Health and Medical Research Council (No. GNT1192469). Jitendra Jonnagaddala also acknowledges the funding support received through the Research Technology Services at UNSW Sydney, Google Cloud Research (award# GCP19980904) and NVIDIA Academic Hardware grant programs. We would also like to thank the SREDH Consortium's (www.sredhconsortium.org, accessed on 15 November 2022) Translational Cancer Bioinformatics working group for access to the MCO CRC dataset.
Publisher Copyright:
© 2023 The Author(s)
PY - 2023/4
Y1 - 2023/4
AB - Background and Objective: Colorectal cancer is a major health concern. It is now the third most common cancer and the fourth leading cause of cancer mortality worldwide. The aim of this study was to evaluate the performance of machine learning algorithms for predicting the survival of colorectal cancer patients 1 to 5 years after diagnosis, and to identify the most important predictor variables. Methods: A sample of 1236 patients diagnosed with colorectal cancer, described by 118 predictor variables, was used. The outcome of interest was a binary variable indicating whether or not the patient survived the given number of years. Twenty predictor variables were selected based on their mutual information score with the outcome. We implemented 11 machine learning algorithms and evaluated their performance using 5×2-fold cross-validation with stratified folds and paired Student's t-tests. We compared the results with the Kaplan-Meier estimator and Cox proportional hazards regression. Results: Using the 20 most important predictor variables for each survival year, the logistic regression algorithm achieved an area under the receiver operating characteristic curve of 0.850 (SD 0.014, 95% CI 0.840-0.860) for 1-year survival prediction and 0.872 (SD 0.014, 95% CI 0.861-0.882) for 5-year survival prediction. Using only the 5 most important predictor variables, the corresponding values were 0.793 (SD 0.020, 95% CI 0.778-0.807) and 0.794 (SD 0.011, 95% CI 0.785-0.802). The most important variables for 1-year prediction were R residual, M distant metastasis, overall stage, probable recurrence within 5 years, and tumour length, whereas for 5-year prediction the most important were probable recurrence within 5 years, R residual, M distant metastasis, number of positive lymph nodes, and palliative chemotherapy. No biomarkers appeared among the 20 most important variables. For all survival intervals, the survival probability predicted by the best-performing model agreed with the Kaplan-Meier estimate, both within one standard deviation and within the 95% confidence interval. Conclusions: The findings suggest that machine learning algorithms can predict the survival probability of colorectal cancer patients and could be used to inform patients and to support decision-making in clinical care management. In addition, this study identifies the most important variables for estimating short- and long-term survival among patients with colorectal cancer.
KW - Cancer Survival
KW - Colorectal Cancer
KW - Machine Learning
KW - Survival Prediction
UR - http://www.scopus.com/inward/record.url?scp=85149057885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149057885&partnerID=8YFLogxK
U2 - 10.1016/j.cmpb.2023.107435
DO - 10.1016/j.cmpb.2023.107435
M3 - Article
C2 - 36842345
AN - SCOPUS:85149057885
SN - 0169-2607
VL - 231
JO - Computer Methods and Programs in Biomedicine
JF - Computer Methods and Programs in Biomedicine
M1 - 107435
ER -