TY - JOUR
T1 - Leveraging Subjective Parameters and Biomarkers in Machine Learning Models
T2 - The Feasibility of lnc-IL7R for Managing Emphysema Progression
AU - Chen, Tzu Tao
AU - Cheng, Tzu Yu
AU - Liu, I. Jung
AU - Ho, Shu Chuan
AU - Lee, Kang Yun
AU - Huang, Huei Tyng
AU - Feng, Po Hao
AU - Chen, Kuan Yuan
AU - Luo, Ching Shan
AU - Tseng, Chien Hua
AU - Chen, Yueh His
AU - Majumdar, Arnab
AU - Tsai, Cheng Yu
AU - Wu, Sheng Ming
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/5
Y1 - 2025/5
N2 - Background/Objectives: Chronic obstructive pulmonary disease (COPD) remains a leading cause of death worldwide, with emphysema progression providing valuable insights into disease development. Clinical assessment approaches, including pulmonary function tests and high-resolution computed tomography, are limited by accessibility constraints and radiation exposure. This study therefore proposed an alternative approach that integrates the novel biomarker long non-coding interleukin-7 receptor α-subunit gene (lnc-IL7R), along with other easily accessible clinical and biochemical metrics, into machine learning (ML) models. Methods: This cohort study collected baseline characteristics, COPD Assessment Test (CAT) scores, and biochemical details from the enrolled participants. Associations with emphysema severity, defined by a low attenuation area percentage (LAA%) threshold of 15%, were evaluated using simple and multivariate-adjusted models. The dataset was then split into a combined training and validation subset (80%) and a test subset (20%). Five ML models were employed, and the best-performing model was further analyzed for feature importance. Results: The majority of participants were elderly males. Compared to the LAA% <15% group, the LAA% ≥15% group demonstrated a significantly higher body mass index (BMI), poorer pulmonary function, and lower expression levels of lnc-IL7R (all p < 0.01). Fold changes in lnc-IL7R were strongly and negatively associated with LAA% (p < 0.01). The random forest (RF) model achieved the highest accuracy and area under the receiver operating characteristic curve (AUROC) across datasets. A feature importance analysis identified lnc-IL7R fold changes as the strongest predictor for emphysema classification (LAA% ≥15%), followed by CAT scores and BMI. Conclusions: Machine learning models incorporating accessible clinical and biochemical markers, particularly the novel biomarker lnc-IL7R, achieved classification accuracy and AUROC exceeding 75% in emphysema assessments. These findings offer promising opportunities for improving emphysema classification and COPD management.
AB - Background/Objectives: Chronic obstructive pulmonary disease (COPD) remains a leading cause of death worldwide, with emphysema progression providing valuable insights into disease development. Clinical assessment approaches, including pulmonary function tests and high-resolution computed tomography, are limited by accessibility constraints and radiation exposure. This study therefore proposed an alternative approach that integrates the novel biomarker long non-coding interleukin-7 receptor α-subunit gene (lnc-IL7R), along with other easily accessible clinical and biochemical metrics, into machine learning (ML) models. Methods: This cohort study collected baseline characteristics, COPD Assessment Test (CAT) scores, and biochemical details from the enrolled participants. Associations with emphysema severity, defined by a low attenuation area percentage (LAA%) threshold of 15%, were evaluated using simple and multivariate-adjusted models. The dataset was then split into a combined training and validation subset (80%) and a test subset (20%). Five ML models were employed, and the best-performing model was further analyzed for feature importance. Results: The majority of participants were elderly males. Compared to the LAA% <15% group, the LAA% ≥15% group demonstrated a significantly higher body mass index (BMI), poorer pulmonary function, and lower expression levels of lnc-IL7R (all p < 0.01). Fold changes in lnc-IL7R were strongly and negatively associated with LAA% (p < 0.01). The random forest (RF) model achieved the highest accuracy and area under the receiver operating characteristic curve (AUROC) across datasets. A feature importance analysis identified lnc-IL7R fold changes as the strongest predictor for emphysema classification (LAA% ≥15%), followed by CAT scores and BMI. Conclusions: Machine learning models incorporating accessible clinical and biochemical markers, particularly the novel biomarker lnc-IL7R, achieved classification accuracy and AUROC exceeding 75% in emphysema assessments. These findings offer promising opportunities for improving emphysema classification and COPD management.
KW - chronic obstructive pulmonary disease (COPD)
KW - emphysema
KW - long non-coding interleukin-7 receptor α-subunit gene (lnc-IL7R)
KW - machine learning
KW - percentage of low attenuation area (LAA%)
UR - https://www.scopus.com/pages/publications/105004852407
UR - https://www.scopus.com/pages/publications/105004852407#tab=citedBy
U2 - 10.3390/diagnostics15091165
DO - 10.3390/diagnostics15091165
M3 - Article
AN - SCOPUS:105004852407
SN - 2075-4418
VL - 15
JO - Diagnostics
JF - Diagnostics
IS - 9
M1 - 1165
ER -