TY - JOUR
T1 - Machine learning model for identifying antioxidant proteins using features calculated from primary sequences
AU - Lam, Luu Ho Thanh
AU - Le, Ngoc Hoang
AU - Van Tuan, Le
AU - Ban, Ho Tran
AU - Hung, Truong Nguyen Khanh
AU - Nguyen, Ngan Thi Kim
AU - Dang, Luong Huu
AU - Le, Nguyen Quoc Khanh
N1 - Funding Information:
Funding: This research was funded by the Research Grant for Newly Hired Faculty, Taipei Medical University (TMU), grant number TMU108-AE1-B26 and Higher Education Sprout Project, Ministry of Education (MOE), Taiwan, grant number DP2-109-21121-01-A-06. The APC was funded by DP2-109-21121-01-A-06.
Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2020/10
Y1 - 2020/10
AB - Antioxidant proteins play an important role in many aspects of cellular life. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals), which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and the balance between them is necessary to maintain health. An unhealthy routine or age-related degeneration can break this balance, leading to more ROS than antioxidants and causing damage to health. In general, antioxidant molecules act by combining with ROS in a one-electron reaction. Computational models that promptly identify antioxidant candidates are essential for supporting antioxidant detection experiments in the laboratory. In this study, we proposed a machine learning-based model for this prediction task, built from a benchmark set of sequencing data. The experiments were conducted using 10-fold cross-validation during training and validated on three different independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest was identified as the best model for identifying antioxidant proteins, with the highest performance. Our optimal model achieved a high accuracy of 84.6%, as well as balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. The performance results on the independent datasets also showed the strength of our model compared to previously published works on antioxidant protein identification.
KW - Antioxidant proteins
KW - Computational modeling
KW - Feature selection
KW - Machine learning
KW - Protein sequencing
KW - Random Forest
UR - http://www.scopus.com/inward/record.url?scp=85092368682&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092368682&partnerID=8YFLogxK
U2 - 10.3390/biology9100325
DO - 10.3390/biology9100325
M3 - Article
AN - SCOPUS:85092368682
SN - 2079-7737
VL - 9
SP - 1
EP - 13
JO - Biology
JF - Biology
IS - 10
M1 - 325
ER -