TY - JOUR
T1 - A computational framework based on ensemble deep neural networks for essential genes identification
AU - Le, Nguyen Quoc Khanh
AU - Do, Duyen Thi
AU - Hung, Truong Nguyen Khanh
AU - Lam, Luu Ho Thanh
AU - Huynh, Tuan Tu
AU - Nguyen, Ngan Thi Kim
N1 - Funding Information:
Funding: This research was funded by the Research Grant for Newly Hired Faculty, Taipei Medical University (TMU), grant number TMU108‐AE1‐B26 and Higher Education Sprout Project, Ministry of Education (MOE), Taiwan, grant number DP2‐109‐21121‐01‐A‐06.
Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Essential genes contain key information of genomes that could be the key to a comprehensive understanding of life and evolution. Because of their importance, studies of essential genes have been considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular to reduce the cost and time-consumption of traditional experiments. A few models have addressed this problem, but performance is still not satisfactory because of high dimensional features and the use of traditional machine learning algorithms. Thus, there is a need to create a novel model to improve the predictive performance of this problem from DNA sequence features. This study took advantage of a natural language processing (NLP) model in learning biological sequences by treating them as natural language words. To learn the NLP features, a supervised learning model was consequentially employed by an ensemble deep neural network. Our proposed method could identify essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. The overall performance outperformed the single models without ensemble, as well as the state-of-the-art predictors on the same benchmark dataset. This indicated the effectiveness of the proposed method in determining essential genes, in particular, and other sequencing problems, in general.
AB - Essential genes contain key information of genomes that could be the key to a comprehensive understanding of life and evolution. Because of their importance, studies of essential genes have been considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular to reduce the cost and time-consumption of traditional experiments. A few models have addressed this problem, but performance is still not satisfactory because of high dimensional features and the use of traditional machine learning algorithms. Thus, there is a need to create a novel model to improve the predictive performance of this problem from DNA sequence features. This study took advantage of a natural language processing (NLP) model in learning biological sequences by treating them as natural language words. To learn the NLP features, a supervised learning model was consequentially employed by an ensemble deep neural network. Our proposed method could identify essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. The overall performance outperformed the single models without ensemble, as well as the state-of-the-art predictors on the same benchmark dataset. This indicated the effectiveness of the proposed method in determining essential genes, in particular, and other sequencing problems, in general.
KW - Continuous bag of words
KW - Deep learning
KW - DNA sequencing
KW - Ensemble learning
KW - Essential genetics and genomics
KW - FastText
KW - Prediction model
UR - http://www.scopus.com/inward/record.url?scp=85096871834&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096871834&partnerID=8YFLogxK
U2 - 10.3390/ijms21239070
DO - 10.3390/ijms21239070
M3 - Article
C2 - 33260643
AN - SCOPUS:85096871834
SN - 1661-6596
VL - 21
SP - 1
EP - 16
JO - International Journal of Molecular Sciences
JF - International Journal of Molecular Sciences
IS - 23
M1 - 9070
ER -