TY - GEN
T1 - A Sequence-Based Prediction Model of Vesicular Transport Proteins Using Ensemble Deep Learning
AU - Le, Nguyen Quoc Khanh
AU - Kha, Quang Hien
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/9/3
Y1 - 2023/9/3
AB - This study uses computational methods to identify vesicular transport proteins accurately. Identifying these proteins improves our understanding of their protein family structure and thereby supports the design of more effective drug targets for patients with endocrine disorders. Researchers in biology have increasingly applied deep learning to this problem. To improve classification performance further, we investigated four models with distinct features: (1) a gated recurrent unit (GRU) model trained on AAC_PSSM, a novel protein feature that we devised by combining amino acid composition (AAC) and position-specific scoring matrix (PSSM) features; (2) an ensemble model combining this GRU model with a neural network trained on the AAC feature; (3) a random forest using the pseudo-amino acid composition (PseAAC) feature; and (4) a natural language processing (NLP) approach that treats the protein sequence as natural language and applies various neural network architectures. Among these models, the ensemble model incorporating PSSM and AAC features achieved the highest sensitivity (81.03%) and accuracy (82.43%). The proposed model also outperformed state-of-the-art models addressing the same problem on the same datasets.
KW - deep learning
KW - gated recurrent unit
KW - ensemble learning
KW - position-specific scoring matrix
KW - protein sequence
KW - vesicular transport
UR - http://www.scopus.com/inward/record.url?scp=85175835000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175835000&partnerID=8YFLogxK
U2 - 10.1145/3584371.3612950
DO - 10.1145/3584371.3612950
M3 - Conference contribution
AN - SCOPUS:85175835000
T3 - ACM-BCB 2023 - 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
BT - ACM-BCB 2023 - 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery, Inc
T2 - 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2023
Y2 - 3 September 2023 through 6 September 2023
ER -