TY - JOUR
T1 - Fertility-GRU
T2 - Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles
AU - Le, Nguyen Quoc Khanh
N1 - Publisher Copyright: © 2019 American Chemical Society.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Protein function prediction is a well-known problem in proteome research that has attracted the attention of numerous researchers. However, implementing deep neural networks, which can improve protein function prediction, remains a major challenge. This study proposes a deep learning approach, Fertility-GRU, that combines gated recurrent units with position-specific scoring matrix profiles to identify fertility-related proteins, which carry out a highly crucial biological function. Fertility-related proteins have also been shown to be important in many biological entities (e.g., bone marrow and peripheral blood, postnatal mammalian ovary) and parameters (e.g., daily sperm production). Our model achieves a cross-validation accuracy of 85.8% and an independent-test accuracy of 91.1%. We also address overfitting by adding dropout layers to the deep learning model. On the independent test set, the sensitivity, specificity, and Matthews correlation coefficient (MCC) were 90.5%, 91.7%, and 0.82, respectively. Fertility-GRU outperforms the state-of-the-art predictor on the same data set. This study provides a method that enables more proteins to be discovered, especially proteins associated with fertility. Moreover, these results could promote the use of recurrent networks and gated recurrent units in proteome research. The source code and data set are freely accessible via https://github.com/khanhlee/fertility-gru.
KW - deep learning
KW - embryogenesis
KW - infertility
KW - oogenesis process
KW - position-specific scoring matrix
KW - protein function prediction
KW - recurrent neural network
KW - reproductive physiology
KW - sperm metabolism
KW - spermatogenesis
UR - http://www.scopus.com/inward/record.url?scp=85071174376&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071174376&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.9b00411
DO - 10.1021/acs.jproteome.9b00411
M3 - Article
C2 - 31362508
AN - SCOPUS:85071174376
SN - 1535-3893
VL - 18
SP - 3503
EP - 3511
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 9
ER -