Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort

Shiow-Jyu Tzou, Chung-Hsin Peng, Li-Ying Huang, Fang-Yu Chen, Chun-Heng Kuo, Chung-Ze Wu, Ta-Wei Chu

Research output: Contribution to journal, article, peer-reviewed

2 citations (Scopus)

Abstract

BACKGROUND: Population aging is emerging as an increasingly acute challenge for countries around the world. One particular manifestation of this phenomenon is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis were conducted using traditional statistical methods, but more recent efforts have turned to machine learning approaches. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as a categorical one, which provides no quantitative information. The present study uses five different machine learning methods to analyze the risk factors for T-score of BMD, seeking to (1) compare the prediction accuracy between different machine learning methods and traditional multiple linear regression (MLR) and (2) rank the importance of 25 different risk factors.

METHODS: The study sample includes 24 412 women older than 55 years with 25 related variables, applying traditional MLR and five different machine learning methods: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. The metrics used for model performance comparisons are the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error.
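The four error metrics named above can be sketched as follows. This is a minimal illustration using common textbook definitions; the abstract does not give the formulas, so the exact variants (in particular the SMAPE denominator) and the function name `regression_errors` are assumptions, not the authors' implementation.

```python
import math


def regression_errors(y_true, y_pred):
    """Return SMAPE, RAE, RRSE and RMSE for paired observed/predicted values.

    Assumed definitions (not taken from the paper):
      SMAPE = mean( |y - yhat| / ((|y| + |yhat|) / 2) ), returned as a fraction
      RAE   = sum|y - yhat| / sum|y - mean(y)|
      RRSE  = sqrt( sum (y - yhat)^2 / sum (y - mean(y))^2 )
      RMSE  = sqrt( mean (y - yhat)^2 )
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # Baseline errors of a model that always predicts the mean of y_true.
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)
    if base_abs == 0:
        raise ValueError("y_true is constant; RAE and RRSE are undefined")

    # Pairs where both values are zero contribute 0 to SMAPE.
    smape = sum(
        2 * e / (abs(t) + abs(p))
        for e, t, p in zip(abs_err, y_true, y_pred)
        if abs(t) + abs(p) > 0
    ) / n

    return {
        "smape": smape,
        "rae": sum(abs_err) / base_abs,
        "rrse": math.sqrt(sum(sq_err) / base_sq),
        "rmse": math.sqrt(sum(sq_err) / n),
    }
```

RAE and RRSE compare a model against the trivial predictor that always outputs the mean, so values below 1 indicate an improvement over that baseline. Multiply `smape` by 100 to express it as a percentage.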

RESULTS: Machine learning approaches outperformed MLR for all four prediction errors. The average importance ranking of each factor generated by the machine learning methods indicates that age is the most important factor determining T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level.
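One plausible reading of "average importance ranking" is that each method ranks the factors by its own importance scores and the ranks are then averaged across methods. The sketch below illustrates that reading only; the abstract does not specify the procedure, and the function name `average_rank`, the model names, the feature names, and the scores are hypothetical placeholders rather than values from the study.

```python
def average_rank(importances):
    """Average per-model importance ranks.

    importances: {model_name: {feature: score}}, higher score = more important.
    Returns [(feature, mean_rank)] sorted from most to least important.
    Ties within a model are broken by feature name (an arbitrary choice).
    """
    if not importances:
        raise ValueError("at least one model is required")

    features = sorted(next(iter(importances.values())))
    totals = {f: 0.0 for f in features}

    for scores in importances.values():
        if sorted(scores) != features:
            raise ValueError("every model must score the same features")
        # Rank 1 goes to the feature with the highest score in this model.
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            totals[f] += rank

    k = len(importances)
    return sorted(((f, totals[f] / k) for f in features), key=lambda item: item[1])


# Hypothetical toy input: three models, three placeholder features.
toy = {
    "model_a": {"x1": 0.5, "x2": 0.3, "x3": 0.2},
    "model_b": {"x1": 0.1, "x2": 0.4, "x3": 0.5},
    "model_c": {"x1": 0.6, "x2": 0.3, "x3": 0.1},
}
ranking = average_rank(toy)
```

Averaging ranks rather than raw scores avoids having to reconcile the different importance scales that tree-based and boosting methods report.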

CONCLUSION: In a group of women older than 55 years, we demonstrated that machine learning methods provide superior performance to MLR in estimating T-score, with age being the most important factor, followed by eGFR, BMI, UA, and education level.
Original language: English
Pages (from-to): 1028-1036
Number of pages: 9
Journal: Journal of the Chinese Medical Association : JCMA
Volume: 86
Issue number: 11
Publication status: Published - Nov 1 2023
