Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing

Van Nui Nguyen, Thi Xuan Tran, Thi Tuyen Nguyen, Nguyen Quoc Khanh Le

Research output: Contribution to journal › Article › peer-review

Abstract

Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, with species-specific variations in ubiquitination patterns remaining poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species “Teacher model” to guide a more compact, species-specific “Student model”, with the “Teacher” generating pseudo-labels that enhance the “Student” learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3 % and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3 % and an AUC of 0.923. Comparative analysis with established predictors further highlights the model's superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi.
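
The teacher-student idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their GitHub repository for that). It assumes a binary site classifier, a "Teacher" that emits soft pseudo-labels as probabilities, and a hypothetical mixing weight `alpha`. The function names `binary_cross_entropy` and `distillation_loss` are illustrative only.

```python
import math


def binary_cross_entropy(p_pred, p_target, eps=1e-12):
    """Cross-entropy between a predicted probability and a (possibly soft) target."""
    # Clamp the prediction so log() never receives 0.
    p = min(max(p_pred, eps), 1.0 - eps)
    return -(p_target * math.log(p) + (1.0 - p_target) * math.log(1.0 - p))


def distillation_loss(student_probs, teacher_probs, hard_labels, alpha=0.5):
    """Mean per-site loss mixing true labels with teacher pseudo-labels.

    alpha is an assumed hyperparameter: alpha=1.0 uses only the true labels,
    alpha=0.0 uses only the teacher's soft pseudo-labels.
    """
    n = len(student_probs)
    if n == 0 or n != len(teacher_probs) or n != len(hard_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, hard_labels):
        hard_term = binary_cross_entropy(s, y)  # fit the annotated site label
        soft_term = binary_cross_entropy(s, t)  # follow the teacher's pseudo-label
        total += alpha * hard_term + (1.0 - alpha) * soft_term
    return total / n


# Toy example: two candidate sites, one ubiquitinated (1) and one not (0).
teacher = [0.9, 0.1]
labels = [1, 0]
loss_agree = distillation_loss([0.9, 0.1], teacher, labels)
loss_disagree = distillation_loss([0.1, 0.9], teacher, labels)
print(loss_agree < loss_disagree)
```

In this sketch, a student whose outputs track both the labels and the teacher's pseudo-labels receives a lower loss than one that contradicts them, which is the signal a compact species-specific model would be trained on.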
Original language: English
Pages (from-to): 65-71
Number of pages: 7
Journal: Methods
Volume: 232
Publication status: Published - Dec 2024

ASJC Scopus subject areas

  • Molecular Biology
  • General Biochemistry, Genetics and Molecular Biology
