TY - JOUR
T1 - Establishing a survival probability prediction model for different lung cancer therapies
AU - Lee, Hsiu An
AU - Rau, Hsiao Hsien
AU - Chao, Louis R.
AU - Hsu, Chien Yeh
N1 - Publisher Copyright: © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/8/1
Y1 - 2020/8/1
AB - Cancer is the leading cause of death in Taiwan, according to the Ministry of Health and Welfare (2017), with cancers of the trachea, bronchus, and lung being the most prevalent. Thus, it is critically important to study this disease. Taiwan’s National Health Insurance Research Database (NHIRDB), which covers 99.9% of residents, makes it possible to analyze comorbidities and predict the outcomes of clinical therapy. This study focuses on non-small cell lung cancer. We first obtained cancer registration indexes from two million individual patient records in NHIRDB by screening for patients with a clinical diagnosis of ICD C33-34 (trachea, bronchus, and lung cancer). We then used these indexes to find all therapies and comorbidities of the patients and used them as input parameters to establish a predictive model of survival probability for lung cancer. Linear and nonlinear data mining methods were employed to build prediction models to study the effects of different therapies on the 3-year survival probability of lung cancer patients. We found that the artificial neural network (ANN) model performs better than the logistic regression (LR) model. The best operating point of the ANN model on the ROC curve is at sensitivity = 77.6% and specificity = 76.8%, with AUROC = 83%.
KW - Data analysis
KW - Lung cancer survival prediction
KW - Prediction model
UR - http://www.scopus.com/inward/record.url?scp=85073829565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073829565&partnerID=8YFLogxK
U2 - 10.1007/s11227-019-02992-6
DO - 10.1007/s11227-019-02992-6
M3 - Article
AN - SCOPUS:85073829565
SN - 0920-8542
VL - 76
SP - 6501
EP - 6514
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 8
ER -