TY - JOUR
T1 - Improving the use of mortality data in public health
T2 - A comparison of garbage code redistribution models
AU - Ng, Ta Chou
AU - Lo, Wei Cheng
AU - Ku, Chu Chang
AU - Lu, Tsung Hsueh
AU - Lin, Hsien Ho
N1 - Publisher Copyright:
© 2020 American Public Health Association Inc. All rights reserved.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Objectives: To describe and compare 3 garbage code (GC) redistribution models: naïve Bayes classifier (NB), coarsened exact matching (CEM), and multinomial logistic regression (MLR). Methods: We analyzed Taiwan Vital Registration data (2008-2016) using a 2-step approach. First, we used non-GC death records to evaluate 3 different prediction models (NB, CEM, and MLR), incorporating individual-level information on multiple causes of death (MCDs) and demographic characteristics. Second, we applied the best-performing model to GC death records to predict the underlying causes of death. We conducted additional simulation analyses for evaluating the predictive performance of models. Results: When we did not account for MCDs, all 3 models presented high average misclassification rates in GC assignment (NB, 81%; CEM, 86%; MLR, 81%). In the presence of MCD information, NB and MLR exhibited significant improvement in assignment accuracy (19% and 17% misclassification rate, respectively). Furthermore, CEM without a variable selection procedure resulted in a substantially higher misclassification rate (40%). Conclusions: Comparing potential GC redistribution approaches provides guidance for obtaining better estimates of cause-of-death distribution and highlights the significance of MCD information for vital registration system reform.
AB - Objectives: To describe and compare 3 garbage code (GC) redistribution models: naïve Bayes classifier (NB), coarsened exact matching (CEM), and multinomial logistic regression (MLR). Methods: We analyzed Taiwan Vital Registration data (2008-2016) using a 2-step approach. First, we used non-GC death records to evaluate 3 different prediction models (NB, CEM, and MLR), incorporating individual-level information on multiple causes of death (MCDs) and demographic characteristics. Second, we applied the best-performing model to GC death records to predict the underlying causes of death. We conducted additional simulation analyses for evaluating the predictive performance of models. Results: When we did not account for MCDs, all 3 models presented high average misclassification rates in GC assignment (NB, 81%; CEM, 86%; MLR, 81%). In the presence of MCD information, NB and MLR exhibited significant improvement in assignment accuracy (19% and 17% misclassification rate, respectively). Furthermore, CEM without a variable selection procedure resulted in a substantially higher misclassification rate (40%). Conclusions: Comparing potential GC redistribution approaches provides guidance for obtaining better estimates of cause-of-death distribution and highlights the significance of MCD information for vital registration system reform.
UR - http://www.scopus.com/inward/record.url?scp=85077761798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077761798&partnerID=8YFLogxK
U2 - 10.2105/AJPH.2019.305439
DO - 10.2105/AJPH.2019.305439
M3 - Article
C2 - 31855478
AN - SCOPUS:85077761798
SN - 0090-0036
VL - 110
SP - 222
EP - 229
JO - American Journal of Public Health
JF - American Journal of Public Health
IS - 2
ER -