TY - GEN
T1 - Distributed keyword vector representation for document categorization
AU - Hsieh, Yu Lun
AU - Liu, Shih Hung
AU - Chang, Yung Chun
AU - Hsu, Wen Lian
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/2/12
Y1 - 2016/2/12
AB - In the age of information explosion, efficiently categorizing documents by topic can help us organize and comprehend vast amounts of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representations of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and can subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that, by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method achieves the best performance among the compared approaches.
KW - document representation
KW - neural network
KW - word embedding
UR - http://www.scopus.com/inward/record.url?scp=84964228049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964228049&partnerID=8YFLogxK
U2 - 10.1109/TAAI.2015.7407126
DO - 10.1109/TAAI.2015.7407126
M3 - Conference contribution
AN - SCOPUS:84964228049
T3 - TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
SP - 245
EP - 251
BT - TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015
Y2 - 20 November 2015 through 22 November 2015
ER -