TY - JOUR
T1 - Development and Validation of Novel Deep-Learning Models Using Multiple Data Types for Lung Cancer Survival
AU - Hsu, Jason C.
AU - Nguyen, Phung Anh
AU - Phuc, Phan Thanh
AU - Lo, Tsai Chih
AU - Hsu, Min Huei
AU - Hsieh, Min Shu
AU - Le, Nguyen Quoc Khanh
AU - Cheng, Chi Tsun
AU - Chang, Tzu Hao
AU - Chen, Cheng Yu
N1 - Funding Information:
This study was supported by Taiwan Ministry of Science and Technology grants (grant numbers: MOST109-2321-B-038-004; MOST110-2321-B-038-004). The funders had no role in the study design, data collection and analysis, publication decision, or manuscript preparation.
Publisher Copyright:
© 2022 by the authors.
PY - 2022/11
Y1 - 2022/11
AB - A well-established lung-cancer-survival-prediction model that relies on multiple data types, multiple novel machine-learning algorithms, and external testing is absent in the literature. This study aims to address this gap and determine the critical factors of lung cancer survival. We selected non-small-cell lung cancer patients from a retrospective dataset of the Taipei Medical University Clinical Research Database and Taiwan Cancer Registry between January 2008 and December 2018. All patients were monitored from the index date of cancer diagnosis until the event of death. Variables, including demographics, comorbidities, medications, laboratories, and patient gene tests, were used. Nine machine-learning algorithms with various modes were used. The performance of the algorithms was measured by the area under the receiver operating characteristic curve (AUC). In total, 3714 patients were included. The best performance of the artificial neural network (ANN) model was achieved when integrating all variables with the AUC, accuracy, precision, recall, and F1-score of 0.89, 0.82, 0.91, 0.75, and 0.65, respectively. The most important features were cancer stage, cancer size, age of diagnosis, smoking, drinking status, EGFR gene, and body mass index. Overall, the ANN model improved predictive performance when integrating different data types.
KW - artificial intelligence
KW - lung cancer
KW - machine learning
KW - prediction models
KW - real-world data
KW - survival
UR - http://www.scopus.com/inward/record.url?scp=85142493018&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142493018&partnerID=8YFLogxK
U2 - 10.3390/cancers14225562
DO - 10.3390/cancers14225562
M3 - Article
AN - SCOPUS:85142493018
SN - 2072-6694
VL - 14
JO - Cancers
JF - Cancers
IS - 22
M1 - 5562
ER -