DeepNLP-DNA: A Pre-Trained NLP Model for DNA Sequence Representation Using Deep Transformers and 2D Convolutional Neural Networks

Project: A - Government Institution > b - National Science and Technology Council

Project Details


This study establishes a novel framework that can improve the interpretation of DNA sequences and the prediction of their molecular functions by combining transformer-based NLP models with deep learning. The framework will be applied to identify DNA enhancers, promoters, and N6-methylation sites, which have been shown to be key mechanisms in molecular biology. This is the first study to explore this combination as a way to improve predictive performance in DNA bioinformatics research, and it also provides a pre-trained NLP model for general DNA sequences. Through this study, we expect to help transform medicine by training models that predict how genotype, defined as a stretch of DNA sequence, influences “cell variables”, and by enabling the exploration of therapies that target these crucial variables.
Effective start/end date: 8/1/21 → 7/31/22


  • natural language processing
  • deep learning
  • contextualized word embedding
  • BERT
  • RoBERTa
  • XLNet
  • convolutional neural network
  • DNA sequencing
  • N6-methylation site
  • DNA enhancers
  • DNA promoters

