Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites

Thi Xuan Tran, Van Nui Nguyen, Nguyen Quoc Khanh Le

Research output: Chapter in book/report, conference contribution

Abstract

The incidence of thyroid cancer and breast cancer is increasing every year, and the specific pathogenesis remains unclear. Post-translational modifications are an important regulatory mechanism that affects the function of almost all proteins. They are essential for a diverse and well-functioning proteome and can integrate metabolism with physiological and pathological processes. In recent years, post-translational modifications have become a research hotspot, with methylation, phosphorylation, acetylation and succinylation being the main focus. SUMOylated proteins are predominantly localized in the nucleus, and SUMO regulates nuclear processes, including cell cycle control and DNA repair. SUMOylation has been increasingly implicated in cancer, Alzheimer's, and Parkinson's diseases. Therefore, identification and characterization of SUMOylation sites are essential for determining modification-specific proteomics. This study proposes a novel scheme for predicting protein SUMOylation sites that incorporates natural language features (Word2Vec) and sequence-based features. In addition, a novel model, called RSX_SUMO, is proposed for the prediction of protein SUMOylation sites. Our experiments reveal that the RSX_SUMO model achieves the highest performance in both five-fold cross-validation and independent testing, with an accuracy of 88.6% and an MCC value of 0.743 on independent testing. In addition, a comparison with several existing prediction models shows that our proposed model outperforms them. We hope that our findings will provide effective suggestions and be of great help to researchers in their related studies.
Original language: English
Title of host publication: The 12th Conference on Information Technology and Its Applications - Proceedings of the International Conference CITA 2023
Editors: Ngoc Thanh Nguyen, Hoa Le-Minh, Cong-Phap Huynh, Quang-Vu Nguyen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 74-88
Number of pages: 15
ISBN (Print): 9783031368851
Publication status: Published - 2023
Event: Proceedings of the 12th International Conference on Information Technology and its Applications, CITA 2023 - Danang City, Vietnam
Duration: Jul 28 2023 to Jul 29 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 734 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: Proceedings of the 12th International Conference on Information Technology and its Applications, CITA 2023
Country/Territory: Vietnam
City: Danang City
Period: 7/28/23 to 7/29/23

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
