TY - GEN
T1 - Utilizing different word representation methods for Twitter data in adverse drug reactions extraction
AU - Lin, Wei San
AU - Dai, Hong Jie
AU - Jonnagaddala, Jitendra
AU - Chang, Nai Wun
AU - Jue, Toni Rose
AU - Iqbal, Usman
AU - Shao, Joni Yu Hsuan
AU - Chiang, I. Jen
AU - Li, Yu Chuan
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/2/12
Y1 - 2016/2/12
N2 - With the advancement of technology and the development of social media, patients discuss medications and related information, including adverse drug reactions (ADRs), with their friends, family, or other patients. Although there are pros and cons to using social media for automatic ADR monitoring, information that patients provide about drugs on social media is widely considered a valuable resource for post-marketing drug surveillance. In this study, we developed a named entity recognition (NER) system based on conditional random fields to identify ADR-related information in Twitter data. The representation of words in the input text is one of the crucial steps in supervised learning. Recently, word vector representations have become popular; they use unlabeled data to provide a generalization that reduces data sparsity in word representation. This study examines different word representation methods for the ADR recognition task, including token normalization and two state-of-the-art word embedding methods, namely word2vec and global vectors (GloVe). The experimental results demonstrate that all of the studied representation schemes can improve recall and the overall F-measure at the cost of reduced precision. Manual analysis of the generated clusters demonstrates that word2vec has stronger cluster trends than GloVe.
AB - With the advancement of technology and the development of social media, patients discuss medications and related information, including adverse drug reactions (ADRs), with their friends, family, or other patients. Although there are pros and cons to using social media for automatic ADR monitoring, information that patients provide about drugs on social media is widely considered a valuable resource for post-marketing drug surveillance. In this study, we developed a named entity recognition (NER) system based on conditional random fields to identify ADR-related information in Twitter data. The representation of words in the input text is one of the crucial steps in supervised learning. Recently, word vector representations have become popular; they use unlabeled data to provide a generalization that reduces data sparsity in word representation. This study examines different word representation methods for the ADR recognition task, including token normalization and two state-of-the-art word embedding methods, namely word2vec and global vectors (GloVe). The experimental results demonstrate that all of the studied representation schemes can improve recall and the overall F-measure at the cost of reduced precision. Manual analysis of the generated clusters demonstrates that word2vec has stronger cluster trends than GloVe.
KW - adverse drug reactions
KW - named entity recognition
KW - natural language processing
KW - social media
KW - word embedding
UR - http://www.scopus.com/inward/record.url?scp=84964284479&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964284479&partnerID=8YFLogxK
U2 - 10.1109/TAAI.2015.7407070
DO - 10.1109/TAAI.2015.7407070
M3 - Conference contribution
AN - SCOPUS:84964284479
T3 - TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
SP - 260
EP - 265
BT - TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015
Y2 - 20 November 2015 through 22 November 2015
ER -