TY - JOUR
T1 - A vision transformer-convolutional neural network framework for decision-transparent dual-energy X-ray absorptiometry recommendations using chest low-dose CT
AU - Kuo, Duen Pang
AU - Chen, Yung Chieh
AU - Cheng, Sho Jen
AU - Hsieh, Kevin Li Chun
AU - Li, Yi Tien
AU - Kuo, Po Chih
AU - Chang, Yung Chun
AU - Chen, Cheng Yu
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/7
Y1 - 2025/7
N2 - Objective: This study introduces an ensemble framework that integrates Vision Transformer (ViT) and Convolutional Neural Network (CNN) models to leverage their complementary strengths, generating visualized and decision-transparent recommendations for dual-energy X-ray absorptiometry (DXA) scans from chest low-dose computed tomography (LDCT). Methods: The framework was developed using data from 321 individuals and validated with an independent test cohort of 186 individuals. It addresses two classification tasks: (1) distinguishing normal from abnormal bone mineral density (BMD) and (2) differentiating osteoporosis from non-osteoporosis. Three field-of-view (FOV) settings, namely fitFOV (entire vertebra), halfFOV (vertebral body only), and largeFOV (fitFOV + 20 %), were analyzed to assess their impact on model performance. Model predictions were weighted and combined to enhance classification accuracy, and visualizations were generated to improve decision transparency. DXA scans were recommended for individuals classified as having abnormal BMD or osteoporosis. Results: The ensemble framework significantly outperformed individual models in both classification tasks (McNemar test, p < 0.001). In the development cohort, it achieved 91.6 % accuracy for task 1 with largeFOV (area under the receiver operating characteristic curve [AUROC]: 0.97) and 86.0 % accuracy for task 2 with fitFOV (AUROC: 0.94). In the test cohort, it demonstrated 86.6 % accuracy for task 1 (AUROC: 0.93) and 76.9 % accuracy for task 2 (AUROC: 0.99). DXA recommendation accuracy was 91.6 % and 87.1 % in the development and test cohorts, respectively, with notably high accuracy for osteoporosis detection (98.7 % and 100 %). Conclusions: This combined ViT–CNN framework effectively assesses bone status from LDCT images, particularly when utilizing fitFOV and largeFOV settings. By visualizing classification confidence and vertebral abnormalities, the proposed framework enhances decision transparency and supports clinicians in making informed DXA recommendations following opportunistic osteoporosis screening.
KW - Bone mineral density
KW - Convolutional Neural Network
KW - Dual-energy X-ray absorptiometry
KW - Low-dose computed tomography
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=105001816923&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105001816923&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2025.105901
DO - 10.1016/j.ijmedinf.2025.105901
M3 - Article
AN - SCOPUS:105001816923
SN - 1386-5056
VL - 199
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 105901
ER -