TY - JOUR
T1 - Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort
AU - Tzou, Shiow-Jyu
AU - Peng, Chung-Hsin
AU - Huang, Li-Ying
AU - Chen, Fang-Yu
AU - Kuo, Chun-Heng
AU - Wu, Chung-Ze
AU - Chu, Ta-Wei
N1 - Copyright © 2023, the Chinese Medical Association.
PY - 2023/11/1
Y1 - 2023/11/1
AB - BACKGROUND: Population aging is emerging as an increasingly acute challenge for countries around the world. One particular manifestation of this phenomenon is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis were conducted using traditional statistical methods, but more recent efforts have turned to machine learning approaches. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as a categorical one, which provides no quantitative information. The present study uses five different machine learning methods to analyze the risk factors for T-score of BMD, seeking to (1) compare the prediction accuracy between different machine learning methods and traditional multiple linear regression (MLR) and (2) rank the importance of 25 different risk factors. METHODS: The study sample includes 24 412 women older than 55 years with 25 related variables, applying traditional MLR and five different machine learning methods: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. The metrics used for model performance comparisons are the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error. RESULTS: Machine learning approaches outperformed MLR for all four prediction errors. The average importance ranking of each factor generated by the machine learning methods indicates that age is the most important factor determining T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level. CONCLUSION: In a group of women older than 55 years, we demonstrated that machine learning methods provide superior performance in estimating T-score, with age being the most important impact factor, followed by eGFR, BMI, UA, and education level.
KW - Female
KW - Humans
KW - Bayes Theorem
KW - East Asian People/statistics & numerical data
KW - Linear Models
KW - Machine Learning
KW - Osteoporosis/epidemiology
KW - Risk Factors
KW - Middle Aged
KW - Risk Assessment/methods
KW - Taiwan/epidemiology
U2 - 10.1097/JCMA.0000000000000999
DO - 10.1097/JCMA.0000000000000999
M3 - Article
C2 - 37729604
SN - 1726-4901
VL - 86
SP - 1028
EP - 1036
JO - Journal of the Chinese Medical Association : JCMA
JF - Journal of the Chinese Medical Association : JCMA
IS - 11
ER -