TY - JOUR
T1 - Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data
AU - Tsai, I. Jung
AU - Shen, Wen Chi
AU - Lee, Chia Ling
AU - Wang, Horng Dar
AU - Lin, Ching Yu
N1 - Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/1
Y1 - 2022/1
N2 - Bladder cancer has been increasing globally. Urinary cytology is considered a major screening method for bladder cancer, but it has poor sensitivity. This study aimed to utilize clinical laboratory data and machine learning methods to build predictive models of bladder cancer. A total of 1336 patients with cystitis, bladder cancer, kidney cancer, uterus cancer, and prostate cancer were enrolled in this study. Two-step feature selection combined with WEKA and forward selection was performed. Furthermore, five machine learning models, including decision tree, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (GBM) were applied. Features, including calcium, alkaline phosphatase (ALP), albumin, urine ketone, urine occult blood, creatinine, alanine aminotransferase (ALT), and diabetes were selected. The lightGBM model obtained an accuracy of 84.8% to 86.9%, a sensitivity 84% to 87.8%, a specificity of 82.9% to 86.7%, and an area under the curve (AUC) of 0.88 to 0.92 in discriminating bladder cancer from cystitis and other cancers. Our study provides a demonstration of utilizing clinical laboratory data to predict bladder cancer.
AB - Bladder cancer has been increasing globally. Urinary cytology is considered a major screening method for bladder cancer, but it has poor sensitivity. This study aimed to utilize clinical laboratory data and machine learning methods to build predictive models of bladder cancer. A total of 1336 patients with cystitis, bladder cancer, kidney cancer, uterus cancer, and prostate cancer were enrolled in this study. Two-step feature selection combined with WEKA and forward selection was performed. Furthermore, five machine learning models, including decision tree, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (GBM) were applied. Features, including calcium, alkaline phosphatase (ALP), albumin, urine ketone, urine occult blood, creatinine, alanine aminotransferase (ALT), and diabetes were selected. The lightGBM model obtained an accuracy of 84.8% to 86.9%, a sensitivity 84% to 87.8%, a specificity of 82.9% to 86.7%, and an area under the curve (AUC) of 0.88 to 0.92 in discriminating bladder cancer from cystitis and other cancers. Our study provides a demonstration of utilizing clinical laboratory data to predict bladder cancer.
KW - Bladder cancer
KW - Clinical laboratory data
KW - Feature selection
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85123116040&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123116040&partnerID=8YFLogxK
U2 - 10.3390/diagnostics12010203
DO - 10.3390/diagnostics12010203
M3 - Article
AN - SCOPUS:85123116040
SN - 2075-4418
VL - 12
JO - Diagnostics
JF - Diagnostics
IS - 1
M1 - 203
ER -