TY - JOUR
T1 - Improved Prediction Model of Protein Lysine Crotonylation Sites Using Bidirectional Recurrent Neural Networks
AU - Tng, Sian Soo
AU - Le, Nguyen Quoc Khanh
AU - Yeh, Hui Yuan
AU - Chua, Matthew Chin Heng
N1 - Funding Information: This work was supported by the Ministry of Science and Technology, Taiwan (grant number MOST110-2221-E-038-001-MY2). Publisher Copyright: © 2021 American Chemical Society
PY - 2022/1/7
Y1 - 2022/1/7
N2 - Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, and cancer, among other processes. The identification of Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. Computational approaches such as machine learning and deep learning algorithms have emerged in recent years because traditional wet-lab experiments are time-consuming and costly. In this study, we propose Sohoko-Kcr, a deep learning model based on a recurrent neural network (RNN), for the prediction of Kcr sites. Using an embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models such as long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks using cross-validation and independent tests. We also compared Sohoko-Kcr with other published tools to verify the efficiency of our model, based on 3-fold, 5-fold, and 10-fold cross-validation and on independent set tests. The results show that the BiGRU model consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a webserver called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.
AB - Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, and cancer, among other processes. The identification of Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. Computational approaches such as machine learning and deep learning algorithms have emerged in recent years because traditional wet-lab experiments are time-consuming and costly. In this study, we propose Sohoko-Kcr, a deep learning model based on a recurrent neural network (RNN), for the prediction of Kcr sites. Using an embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models such as long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks using cross-validation and independent tests. We also compared Sohoko-Kcr with other published tools to verify the efficiency of our model, based on 3-fold, 5-fold, and 10-fold cross-validation and on independent set tests. The results show that the BiGRU model consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a webserver called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.
KW - bidirectional long short-term memory
KW - bioinformatics
KW - computational biology
KW - deep learning
KW - gated recurrent unit
KW - lysine crotonylation pathway
KW - post-translational modifications
KW - protein sequence
KW - recurrent neural network
UR - http://www.scopus.com/inward/record.url?scp=85120434796&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120434796&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.1c00848
DO - 10.1021/acs.jproteome.1c00848
M3 - Article
C2 - 34812044
AN - SCOPUS:85120434796
SN - 1535-3893
VL - 21
SP - 265
EP - 273
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 1
ER -