TY - JOUR
T1 - MaskDNA-PGD
T2 - An innovative deep learning model for detecting DNA methylation by integrating mask sequences and adversarial PGD training as a data augmentation method
AU - Zheng, Zhiwei
AU - Le, Nguyen Quoc Khanh
AU - Chua, Matthew Chin Heng
N1 - Funding Information:
This work was supported by the National Science and Technology Council, Taiwan [grant numbers MOST110-2221-E-038-001-MY2 and MOST111-2628-E-038-002-MY3].
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2023/1/15
Y1 - 2023/1/15
N2 - DNA methylation is associated with various diseases in mammals, such as cancer and myocardial pain. Researchers have long used machine learning and deep learning to learn the characteristics of DNA sequences for high-precision methylation classification. However, these studies innovated primarily in sequence encoding and seldom employed deep neural networks for prediction. Hence, this research proposes a framework that applies random masking and adversarial sample generation as a preceding data augmentation step. The proposed classification model consists of a convolutional neural network (CNN), a bidirectional long short-term memory (Bi-LSTM) network, and an attention mechanism as predictors. The benchmark illustrates the automation and advancement of the proposed framework, which accurately performs binary classification of diverse DNA methylation types. Ablation experiments show that random masking and adversarial sample generation are effective. Specifically, our model achieved best accuracies of 85.07%, 94.97%, and 92.17% in predicting multi-species N4-methylcytosine, 5-methylcytosine, and N6-methyladenine sites, respectively. Moreover, in a comparison with two other methods on the same datasets and metrics, the proposed model (MaskDNA-PGD) surpasses both. MaskDNA-PGD is freely available at https://github.com/willyzzz/MaskDNA-PGD.
AB - DNA methylation is associated with various diseases in mammals, such as cancer and myocardial pain. Researchers have long used machine learning and deep learning to learn the characteristics of DNA sequences for high-precision methylation classification. However, these studies innovated primarily in sequence encoding and seldom employed deep neural networks for prediction. Hence, this research proposes a framework that applies random masking and adversarial sample generation as a preceding data augmentation step. The proposed classification model consists of a convolutional neural network (CNN), a bidirectional long short-term memory (Bi-LSTM) network, and an attention mechanism as predictors. The benchmark illustrates the automation and advancement of the proposed framework, which accurately performs binary classification of diverse DNA methylation types. Ablation experiments show that random masking and adversarial sample generation are effective. Specifically, our model achieved best accuracies of 85.07%, 94.97%, and 92.17% in predicting multi-species N4-methylcytosine, 5-methylcytosine, and N6-methyladenine sites, respectively. Moreover, in a comparison with two other methods on the same datasets and metrics, the proposed model (MaskDNA-PGD) surpasses both. MaskDNA-PGD is freely available at https://github.com/willyzzz/MaskDNA-PGD.
KW - Adversarial network
KW - Bidirectional long short term memory
KW - Convolutional neural network
KW - Data augmentation
KW - DNA methylation
KW - Sequence encoding
UR - http://www.scopus.com/inward/record.url?scp=85142721127&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142721127&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2022.104715
DO - 10.1016/j.chemolab.2022.104715
M3 - Article
AN - SCOPUS:85142721127
SN - 0169-7439
VL - 232
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
M1 - 104715
ER -