TY - JOUR
T1 - Determinants of coronavirus disease 2019 infection by artificial intelligence technology
T2 - A study of 28 countries
AU - Peng, Hsiao Ya
AU - Lin, Yen Kuang
AU - Nguyen, Phung Anh
AU - Hsu, Jason C.
AU - Chou, Chun Liang
AU - Chang, Chih Cheng
AU - Lin, Chia Chi
AU - Lam, Carlos
AU - Chen, Chang I.
AU - Wang, Kai Hsun
AU - Lu, Christine Y.
N1 - Funding Information:
We thank Imperial College London for publicly sharing Coronavirus Disease 2019-related questionnaire survey data for research purposes.
Publisher Copyright:
© 2022 Peng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2022/8
Y1 - 2022/8
N2 - Objectives The coronavirus disease 2019 pandemic has affected countries around the world since 2020, and an increasing number of people are being infected. The purpose of this research was to use big data and artificial intelligence technology to find key factors associated with coronavirus disease 2019 infection. The results can be used as a reference for disease prevention in practice. Methods This study obtained data from the "Imperial College London YouGov Covid-19 Behaviour Tracker Open Data Hub", covering a total of 291,780 questionnaire results from 28 countries (April 1 to August 31, 2020). Data included basic characteristics, lifestyle habits, disease history, and symptoms of each subject. Four types of machine learning classification models (logistic regression, random forest, support vector machine, and artificial neural network) were used to build prediction models. The performance of each model is presented as the area under the receiver operating characteristic curve. The important factors selected by each model were then further processed to obtain an overall ranking of determinants. Results The area under the receiver operating characteristic curve of the prediction models built with the four machine learning methods was >0.95 in every case, and random forest had the highest performance (area under the receiver operating characteristic curve of 0.988). The top ten factors associated with coronavirus disease 2019 infection were identified, in order of importance: whether the family had been tested, having no symptoms, loss of smell, loss of taste, a history of epilepsy, acquired immune deficiency syndrome, cystic fibrosis, sleeping alone, country, and the number of times leaving home in a day. Conclusions This study used big data from 28 countries and artificial intelligence methods to determine the predictors of coronavirus disease 2019 infection. The findings provide important insights for coronavirus disease 2019 infection prevention strategies.
UR - http://www.scopus.com/inward/record.url?scp=85137127536&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137127536&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0272546
DO - 10.1371/journal.pone.0272546
M3 - Article
C2 - 36018862
AN - SCOPUS:85137127536
SN - 1932-6203
VL - 17
JO - PLoS ONE
JF - PLoS ONE
IS - 8 August
M1 - e0272546
ER -