TY - JOUR
T1 - DeepEfflux
T2 - A 2D convolutional neural network model for identifying families of efflux proteins in transporters
AU - Wellem Taju, Semmy
AU - Nguyen, Trinh Trung Duong
AU - Le, Nguyen Quoc Khanh
AU - Irawan Kusuma, Rosdyana Mangir
AU - Ou, Yu Yen
N1 - Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press. All rights reserved.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Motivation: Efflux protein plays a key role in pumping xenobiotics out of the cells. The prediction of efflux family proteins involved in transport process of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considerations of any pump specific of efflux protein families. In other words, efflux proteins protect cells from extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. The motif sequences consisting of the same amount of residues will have high degrees of residue similarity and thus will affect the classification process. Consequently, it is challenging but vital to recognize the structures and determine energy dependencies of efflux protein families. In order to efficiently identify efflux protein families with considering about pump specific, we developed a 2 D convolutional neural network (2 D CNN) model called DeepEfflux. DeepEfflux tried to capture the motifs of sequences around hidden target residues to use as hidden features of families. In addition, the 2 D CNN model uses a position-specific scoring matrix (PSSM) as an input. Three different datasets, each for one family of efflux protein, was fed into DeepEfflux, and then a 5-fold cross validation approach was used to evaluate the training performance. Results: The model evaluation results show that DeepEfflux outperforms traditional machine learning algorithms. Furthermore, the accuracy of 96.02%, 94.89% and 90.34% for classes A, B and C, respectively, in the independent test results show that our model can perform well and can be used as a reliable tool for identifying families of efflux proteins in transporters.
AB - Motivation: Efflux protein plays a key role in pumping xenobiotics out of the cells. The prediction of efflux family proteins involved in transport process of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considerations of any pump specific of efflux protein families. In other words, efflux proteins protect cells from extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. The motif sequences consisting of the same amount of residues will have high degrees of residue similarity and thus will affect the classification process. Consequently, it is challenging but vital to recognize the structures and determine energy dependencies of efflux protein families. In order to efficiently identify efflux protein families with considering about pump specific, we developed a 2 D convolutional neural network (2 D CNN) model called DeepEfflux. DeepEfflux tried to capture the motifs of sequences around hidden target residues to use as hidden features of families. In addition, the 2 D CNN model uses a position-specific scoring matrix (PSSM) as an input. Three different datasets, each for one family of efflux protein, was fed into DeepEfflux, and then a 5-fold cross validation approach was used to evaluate the training performance. Results: The model evaluation results show that DeepEfflux outperforms traditional machine learning algorithms. Furthermore, the accuracy of 96.02%, 94.89% and 90.34% for classes A, B and C, respectively, in the independent test results show that our model can perform well and can be used as a reliable tool for identifying families of efflux proteins in transporters.
UR - http://www.scopus.com/inward/record.url?scp=85061652064&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061652064&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty302
DO - 10.1093/bioinformatics/bty302
M3 - Article
C2 - 29668844
AN - SCOPUS:85061652064
SN - 1367-4803
VL - 34
SP - 3111
EP - 3117
JO - Bioinformatics
JF - Bioinformatics
IS - 18
ER -