TY - JOUR
T1 - LBERT: Lexically aware Transformer-based Bidirectional Encoder Representation model for learning universal bio-entity relations
AU - Warikoo, Neha
AU - Chang, Yung-Chun
AU - Hsu, Wen-Lian
N1 - Publisher Copyright:
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
PY - 2021/4/20
Y1 - 2021/4/20
AB - MOTIVATION: Natural Language Processing techniques are constantly being advanced to accommodate the influx of data and to provide exhaustive, structured knowledge dissemination. Within the biomedical domain, relation detection between bio-entities, known as the Bio-Entity Relation Extraction (BRE) task, has a critical function in knowledge structuring. Although recent advances in deep learning-based biomedical domain embeddings have improved BRE predictive analytics, these works are often task-selective or rely on external knowledge-based pre-/post-processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, LBERT, a Lexically aware Transformer-based Bidirectional Encoder Representation model that exploits both local and global context representations for sentence-level classification tasks. RESULTS: This article presents one of the most exhaustive BRE studies conducted to date, spanning five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein-protein interaction (PPI), drug-drug interaction and protein-bio-entity relation classification tasks by 0.02%, 11.2% and 41.4%, respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relations for large corpora like PPI. Our ablation studies clearly indicate the contribution of lexical features and distance-adjusted attention to prediction performance, learning additional local semantic context alongside the bidirectionally learned global context. AVAILABILITY AND IMPLEMENTATION: GitHub: https://github.com/warikoone/LBERT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
KW - Knowledge Bases
KW - Language
KW - Natural Language Processing
KW - Research Design
KW - Semantics
UR - http://www.scopus.com/inward/record.url?scp=85105695026&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105695026&partnerID=8YFLogxK
DO - 10.1093/bioinformatics/btaa721
M3 - Article
C2 - 32810217
AN - SCOPUS:85105695026
SN - 1367-4803
VL - 37
SP - 404
EP - 412
JO - Bioinformatics (Oxford, England)
JF - Bioinformatics (Oxford, England)
IS - 3
ER -