TY - JOUR
T1 - A BERT-based ensemble learning approach for the BioCreative VII challenges
T2 - full-text chemical identification and multi-label classification in PubMed articles
AU - Lin, Sheng Jie
AU - Yeh, Wen Chao
AU - Chiu, Yu Wen
AU - Chang, Yung Chun
AU - Hsu, Min Huei
AU - Chen, Yi Shin
AU - Hsu, Wen Lian
N1 - Publisher Copyright: © The Author(s) 2022.
PY - 2022
Y1 - 2022
N2 - In this research, we explored various state-of-the-art biomedical-specific pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for the National Library of Medicine - Chemistry (NLM-CHEM) and LitCovid tracks in the BioCreative VII Challenge, and proposed a BERT-based ensemble learning approach that integrates the advantages of these models to improve system performance. The experimental results on the NLM-CHEM track demonstrate that our method achieves strong performance, with F1-scores of 85% and 91.8% in the strict and approximate evaluations, respectively. Moreover, the proposed Medical Subject Headings identifier (MeSH ID) normalization algorithm is effective for entity normalization, achieving an F1-score of about 80% in both the strict and approximate evaluations. For the LitCovid track, the proposed method is also effective at detecting topics in the Coronavirus disease 2019 (COVID-19) literature: it outperformed the compared methods and achieved state-of-the-art performance on the LitCovid corpus.
UR - http://www.scopus.com/inward/record.url?scp=85134556431&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134556431&partnerID=8YFLogxK
U2 - 10.1093/database/baac056
DO - 10.1093/database/baac056
M3 - Article
C2 - 35849027
AN - SCOPUS:85134556431
SN - 1758-0463
VL - 2022
JO - Database
JF - Database
M1 - 056
ER -