TY - GEN
T1 - Chemical-Induced Disease Detection Using Invariance-based Pattern Learning Model
AU - Warikoo, Neha
AU - Chang, Yung Chun
AU - Hsu, Wen Lian
N1 - Funding Information:
We are grateful for the constructive comments from three anonymous reviewers. This work was supported by grant MOST106-3114-E-001-002 and MOST105-2221-E-001-008-MY3 from the Ministry of Science and Technology, Taiwan.
Publisher Copyright:
© 2017 AFNLP
PY - 2017
Y1 - 2017
AB - In this work, we introduce a novel feature engineering approach named “algebraic invariance” to identify discriminative patterns for learning relation pair features for the chemical-disease relation (CDR) task of BioCreative V. Our method exploits the existing structural similarity of the key concepts of relation descriptions from the CDR corpus to generate robust linguistic patterns for SVM tree kernel-based learning. Preprocessing of the training data classifies the entity pairs as either related or unrelated to build instance types for both inter-sentential and intra-sentential scenarios. An invariant function is proposed to process and optimally cluster similar patterns for both positive and negative instances. The learning model for CDR pairs is based on the SVM tree kernel approach, which generates feature trees and vectors and is modeled on suitable invariance-based patterns, bringing brevity, precision and context to the identifier features. Results demonstrate that our method outperformed the compared approaches, achieving a high recall of 85.08% and an average F1-score of 54.34% without the use of any additional knowledge bases.
UR - http://www.scopus.com/inward/record.url?scp=85057337249&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057337249&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057337249
T3 - DDDSM 2017 - 1st International Workshop on Digital Disease Detection using Social Media, Proceedings of the Workshop
SP - 57
EP - 64
BT - DDDSM 2017 - 1st International Workshop on Digital Disease Detection using Social Media, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 1st International Workshop on Digital Disease Detection using Social Media, DDDSM 2017, co-located with the 8th International Joint Conference on Natural Language Processing, IJCNLP 2017
Y2 - 27 November 2017
ER -