TY - JOUR
T1 - An interpretable deep learning model for classifying adaptor protein complexes from sequence information
AU - Kha, Quang Hien
AU - Tran, Thi Oanh
AU - Nguyen, Trinh Trung Duong
AU - Nguyen, Van Nui
AU - Than, Khoat
AU - Le, Nguyen Quoc Khanh
N1 - Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/11
Y1 - 2022/11
N2 - Adaptor proteins (APs) are a family of proteins that aids in intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods to identify and classify APs require time and complex techniques, which were then advanced by machine learning and computational approaches to facilitate the APs recognition task. However, most studies focused on recognizing separate ones in the APs family or the APs in general with non-APs, lacking one comprehensive strategy to distinguish the complexes of AP subtypes. Herein, we proposed a novel method to implement one novel task as discriminating the AP complexes in the APs family, utilizing an interpretable deep neural network architecture on sequence-based encoding features. This work also introduced a benchmark data set of AP complexes originating from the UniProt and GeneOntology databases. To assess the robustness of our proposed method, we compared our performance to various machine learning algorithms and feature extraction strategies. Furthermore, the interpretation of the model's prediction performance was implemented using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on optimal features. The promising performance of our architecture can assist scientists not only in AP complexes distinction but also in general protein sequences. Moreover, we have also made our work publicly on GitHub https://github.com/khanhlee/adaptor-dnn.
AB - Adaptor proteins (APs) are a family of proteins that aids in intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods to identify and classify APs require time and complex techniques, which were then advanced by machine learning and computational approaches to facilitate the APs recognition task. However, most studies focused on recognizing separate ones in the APs family or the APs in general with non-APs, lacking one comprehensive strategy to distinguish the complexes of AP subtypes. Herein, we proposed a novel method to implement one novel task as discriminating the AP complexes in the APs family, utilizing an interpretable deep neural network architecture on sequence-based encoding features. This work also introduced a benchmark data set of AP complexes originating from the UniProt and GeneOntology databases. To assess the robustness of our proposed method, we compared our performance to various machine learning algorithms and feature extraction strategies. Furthermore, the interpretation of the model's prediction performance was implemented using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on optimal features. The promising performance of our architecture can assist scientists not only in AP complexes distinction but also in general protein sequences. Moreover, we have also made our work publicly on GitHub https://github.com/khanhlee/adaptor-dnn.
KW - Adaptor protein
KW - Computational biology
KW - Deep neural network
KW - Interpretable machine learning
KW - Protein function prediction
KW - Sequence analysis
UR - http://www.scopus.com/inward/record.url?scp=85139011804&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139011804&partnerID=8YFLogxK
U2 - 10.1016/j.ymeth.2022.09.007
DO - 10.1016/j.ymeth.2022.09.007
M3 - Article
C2 - 36174933
AN - SCOPUS:85139011804
SN - 1046-2023
VL - 207
SP - 90
EP - 96
JO - Methods
JF - Methods
ER -