TY - JOUR
T1 - LPTK
T2 - A linguistic pattern-aware dependency tree kernel approach for the BioCreative VI CHEMPROT task
AU - Warikoo, Neha
AU - Chang, Yung Chun
AU - Hsu, Wen Lian
N1 - Publisher Copyright: © The Author(s) 2018. Published by Oxford University Press.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Identifying interactions between chemical compounds and genes in the biomedical literature is a frequently discussed topic in life-science text mining. In this paper, we describe the Linguistic Pattern-Aware Dependency Tree Kernel, a linguistic interaction pattern learning method developed for the BioCreative VI CHEMPROT task, to capture chemical-protein interaction (CPI) patterns in the biomedical literature. We also introduce a framework that integrates these linguistic patterns with a smooth partial tree kernel to extract CPIs. This new feature representation models aspects of linguistic probability in a geometric representation, which not only optimizes the sufficiency of the feature dimension for classification but also defines features as interpretable contexts rather than long vectors of numbers. To test the robustness and efficiency of our system in identifying different kinds of biological interactions, we evaluated our framework on three separate data sets: the CHEMPROT corpus, the Chemical-Disease Relation corpus and the Protein-Protein Interaction corpus. The experimental results demonstrate that our method is effective and outperforms several compared systems on each data set.
UR - http://www.scopus.com/inward/record.url?scp=85055077690&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055077690&partnerID=8YFLogxK
U2 - 10.1093/database/bay108
DO - 10.1093/database/bay108
M3 - Article
C2 - 30346607
AN - SCOPUS:85055077690
SN - 1758-0463
VL - 2018
JO - Database
JF - Database
IS - 2018
ER -