TY - JOUR
T1 - IEnhancer-ECNN
T2 - Identifying enhancers and their strength using ensembles of convolutional neural networks
AU - Nguyen, Quang H.
AU - Nguyen-Vo, Thanh Hoang
AU - Le, Nguyen Quoc Khanh
AU - Do, Trang T.T.
AU - Rahardja, Susanto
AU - Nguyen, Binh P.
N1 - Publisher Copyright:
© 2019 The Author(s).
PY - 2019/12/24
Y1 - 2019/12/24
N2 - Background: Enhancers are non-coding DNA fragments which are crucial in gene regulation (e.g. transcription and translation). Having high locational variation and free scattering in 98% of non-encoding genomes, enhancer identification is, therefore, more complicated than other genetic factors. To address this biological issue, several in silico studies have been done to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have come up with improved performance, shortfalls in these learning models still remain. To overcome limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.'s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess the model performance. Results: Our experimental results demonstrates that iEnhancer-ECNN has better performance compared to other state-of-the-art methods using the same dataset. The accuracy of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews's correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively. Conclusions: iEnhancer-ECNN outperforms other previously proposed methods with significant improvement in most of the evaluation metrics. Strong growths in the MCC of both layers are highly meaningful in assuring the stability of our models.
AB - Background: Enhancers are non-coding DNA fragments which are crucial in gene regulation (e.g. transcription and translation). Having high locational variation and free scattering in 98% of non-encoding genomes, enhancer identification is, therefore, more complicated than other genetic factors. To address this biological issue, several in silico studies have been done to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have come up with improved performance, shortfalls in these learning models still remain. To overcome limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.'s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess the model performance. Results: Our experimental results demonstrates that iEnhancer-ECNN has better performance compared to other state-of-the-art methods using the same dataset. The accuracy of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews's correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively. Conclusions: iEnhancer-ECNN outperforms other previously proposed methods with significant improvement in most of the evaluation metrics. Strong growths in the MCC of both layers are highly meaningful in assuring the stability of our models.
KW - Classification
KW - Convolutional neural network
KW - Deep learning
KW - Enhancer
KW - Ensemble
KW - Identification
KW - One-hot encoding
UR - http://www.scopus.com/inward/record.url?scp=85077119464&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077119464&partnerID=8YFLogxK
U2 - 10.1186/s12864-019-6336-3
DO - 10.1186/s12864-019-6336-3
M3 - Article
C2 - 31874637
AN - SCOPUS:85077119464
SN - 1471-2164
VL - 20
JO - BMC Genomics
JF - BMC Genomics
M1 - 951
ER -