TY - JOUR
T1 - FAD-BERT
T2 - Improved prediction of FAD binding sites using pre-training of deep bidirectional transformers
AU - Ho, Quang Thai
AU - Nguyen, Trinh Trung Duong
AU - Khanh Le, Nguyen Quoc
AU - Ou, Yu Yen
N1 - Funding Information:
This work was partially supported by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant no. MOST 109-2221-E-155-045 and Grant no. MOST 109-2811-E-155-505.
Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/4
Y1 - 2021/4
N2 - The electron transport chain is a series of protein complexes involved in cellular respiration, a process essential for transferring electrons and other macromolecules throughout the cell. Identifying Flavin Adenine Dinucleotide (FAD) binding sites in the electron transport chain is vital because it helps biological researchers understand precisely how electrons are produced and transported in cells. This study distills and analyzes contextualized word embeddings from pre-trained BERT models to explore the similarities between natural language and protein sequences. We therefore propose a new approach that combines Bidirectional Encoder Representations from Transformers (BERT), Position-Specific Scoring Matrix (PSSM) profiles, and the Amino Acid Index database (AAIndex) to predict FAD-binding sites in recently discovered transport proteins. Our proposed approach achieves 85.14% accuracy with a Matthews correlation coefficient of 0.39, improving accuracy by 11% over the previous method on the same independent test set. We also deploy a web server that identifies FAD-binding sites in electron transporters, available for academic use at http://140.138.155.216/fadbert/.
AB - The electron transport chain is a series of protein complexes involved in cellular respiration, a process essential for transferring electrons and other macromolecules throughout the cell. Identifying Flavin Adenine Dinucleotide (FAD) binding sites in the electron transport chain is vital because it helps biological researchers understand precisely how electrons are produced and transported in cells. This study distills and analyzes contextualized word embeddings from pre-trained BERT models to explore the similarities between natural language and protein sequences. We therefore propose a new approach that combines Bidirectional Encoder Representations from Transformers (BERT), Position-Specific Scoring Matrix (PSSM) profiles, and the Amino Acid Index database (AAIndex) to predict FAD-binding sites in recently discovered transport proteins. Our proposed approach achieves 85.14% accuracy with a Matthews correlation coefficient of 0.39, improving accuracy by 11% over the previous method on the same independent test set. We also deploy a web server that identifies FAD-binding sites in electron transporters, available for academic use at http://140.138.155.216/fadbert/.
KW - BERT
KW - Deep learning
KW - Electron transport chain
KW - FAD binding site
KW - Natural language processing
KW - Position specific scoring matrix
UR - http://www.scopus.com/inward/record.url?scp=85100802301&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100802301&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2021.104258
DO - 10.1016/j.compbiomed.2021.104258
M3 - Article
C2 - 33601085
AN - SCOPUS:85100802301
SN - 0010-4825
VL - 131
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 104258
ER -