TY - GEN
T1 - Semantic frame-based statistical approach for topic detection
AU - Chang, Yung Chun
AU - Hsieh, Yu Lun
AU - Chen, Cen Chieh
AU - Liu, Chad
AU - Lu, Chun Hung
AU - Hsu, Wen Lian
N1 - Publisher Copyright: Copyright 2014 by Yung-Chun Chang, Yu-Lun Hsieh, Cen-Chieh Chen.
PY - 2014
Y1 - 2014
N2 - We propose a statistical frame-based approach (FBA) for natural language processing, and demonstrate its advantage over traditional machine learning methods by using topic detection as a case study. FBA perceives and identifies semantic knowledge in a more general manner by collecting important linguistic patterns within documents through a unique flexible matching scheme that allows word insertion, deletion and substitution (IDS) to capture linguistic structures within the text. In addition, FBA can also overcome major issues of the rule-based approach by reducing human effort through its highly automated pattern generation and summarization. Using a Yahoo! Chinese news corpus containing about 140,000 news articles, we provide a comprehensive performance evaluation that demonstrates the effectiveness of FBA in detecting the topic of a document by exploiting the semantic association and the context within the text. Moreover, it outperforms common topic models like Naïve Bayes, Vector Space Model, and LDA-SVM.
AB - We propose a statistical frame-based approach (FBA) for natural language processing, and demonstrate its advantage over traditional machine learning methods by using topic detection as a case study. FBA perceives and identifies semantic knowledge in a more general manner by collecting important linguistic patterns within documents through a unique flexible matching scheme that allows word insertion, deletion and substitution (IDS) to capture linguistic structures within the text. In addition, FBA can also overcome major issues of the rule-based approach by reducing human effort through its highly automated pattern generation and summarization. Using a Yahoo! Chinese news corpus containing about 140,000 news articles, we provide a comprehensive performance evaluation that demonstrates the effectiveness of FBA in detecting the topic of a document by exploiting the semantic association and the context within the text. Moreover, it outperforms common topic models like Naïve Bayes, Vector Space Model, and LDA-SVM.
UR - http://www.scopus.com/inward/record.url?scp=84994070599&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994070599&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84994070599
T3 - Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
SP - 75
EP - 84
BT - Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
A2 - Boonkwan, Prachya
A2 - Aroonmanakun, Wirote
A2 - Supnithi, Thepchai
PB - Faculty of Pharmaceutical Sciences, Chulalongkorn University
T2 - 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Y2 - 12 December 2014 through 14 December 2014
ER -