Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique

Thi Xuan Tran, Nguyen Quoc Khanh Le, Van Nui Nguyen

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Protein succinylation, a post-translational modification wherein a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging. To address this, we introduce an approach called CBiLSuccSite, which integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks for the accurate prediction of protein succinylation sites. Our approach employs a word embedding layer to encode protein sequences, enabling the automatic learning of intricate patterns and dependencies without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved superior predictive performance, with an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing further validated its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
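As an illustration of the encoding step described in the abstract, the sketch below shows one way a protein sequence could be turned into integer tokens suitable for a word embedding layer. It is not taken from the CBiLSuccSite repository: the function names, the window half-width (`flank`), the padding symbol `X`, and the token numbering are all assumptions made for this example.

```python
# Hypothetical preprocessing sketch; not the authors' implementation.
# Assumptions: 20 standard amino acids, "X" as padding symbol, token 0 for
# padding/unknown residues, and a lysine-centered window of 2*flank+1 residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def lysine_windows(sequence, flank=15):
    """Yield (position, window) for every lysine (K) in the sequence.

    Windows near the sequence ends are padded so that each one has
    exactly 2*flank+1 residues with the lysine in the center.
    """
    padded = PAD * flank + sequence + PAD * flank
    for pos, residue in enumerate(sequence):
        if residue == "K":
            # Residue `pos` sits at index pos + flank in the padded string,
            # so the window starts at index `pos`.
            yield pos, padded[pos : pos + 2 * flank + 1]


def encode(window):
    """Map each residue to an integer token (0 for padding/unknown)."""
    return [TOKEN_INDEX.get(residue, 0) for residue in window]


if __name__ == "__main__":
    # Toy sequence chosen for the example; not from any dataset.
    for pos, window in lysine_windows("MKTAYIAKQR", flank=3):
        print(pos, window, encode(window))
```

Each resulting integer list would be the kind of input an embedding layer expects; the embedding vectors could then be passed to convolutional and Bi-LSTM layers, as the abstract outlines.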

Original language: English
Article number: 109664
Journal: Computers in Biology and Medicine
Volume: 186
Publication status: Published - Mar 2025

Keywords

  • Bi-direction long short-term memory (Bi-LSTM)
  • Convolutional Neural Network (CNN)
  • Natural Language Processing (NLP)
  • Succinylation
  • Word embedding

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
