Semantic frame-based statistical approach for topic detection

Yung Chun Chang, Yu Lun Hsieh, Cen Chieh Chen, Chad Liu, Chun Hung Lu, Wen Lian Hsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

We propose a statistical frame-based approach (FBA) for natural language processing, and demonstrate its advantage over traditional machine learning methods by using topic detection as a case study. FBA perceives and identifies semantic knowledge in a more general manner by collecting important linguistic patterns within documents through a unique flexible matching scheme that allows word insertion, deletion and substitution (IDS) to capture linguistic structures within the text. In addition, FBA can also overcome major issues of the rule-based approach by reducing human effort through its highly automated pattern generation and summarization. Using a Yahoo! Chinese news corpus containing about 140,000 news articles, we provide a comprehensive performance evaluation that demonstrates the effectiveness of FBA in detecting the topic of a document by exploiting the semantic association and the context within the text. Moreover, it outperforms common topic models like Naïve Bayes, Vector Space Model, and LDA-SVM.
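
The abstract does not spell out how the IDS matching scheme scores a match, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes a pattern is a sequence of tokens and treats flexible matching as a token-level edit distance; the function names, the unit costs, and the `max_cost` threshold are all hypothetical.

```python
from typing import Sequence


def ids_cost(pattern: Sequence[str], tokens: Sequence[str],
             ins_cost: float = 1.0, del_cost: float = 1.0,
             sub_cost: float = 1.0) -> float:
    """Minimum total cost of insertions, deletions and substitutions
    needed to align `pattern` with `tokens` (token-level edit distance).
    The unit costs are assumptions made for illustration."""
    # prev[j]: cost of aligning the pattern items seen so far with the first j tokens
    prev = [j * ins_cost for j in range(len(tokens) + 1)]
    for i, p in enumerate(pattern, start=1):
        curr = [i * del_cost]
        for j, t in enumerate(tokens, start=1):
            match_or_sub = prev[j - 1] + (0.0 if p == t else sub_cost)
            deletion = prev[j] + del_cost       # pattern item absent from the text
            insertion = curr[j - 1] + ins_cost  # extra token present in the text
            curr.append(min(match_or_sub, deletion, insertion))
        prev = curr
    return prev[-1]


def flexible_match(pattern: Sequence[str], tokens: Sequence[str],
                   max_cost: float = 1.0) -> bool:
    """Accept the text if it is within `max_cost` edits of the pattern."""
    return ids_cost(pattern, tokens) <= max_cost


if __name__ == "__main__":
    pattern = ["team", "wins", "game"]
    for text in (["team", "easily", "wins", "game"],   # one inserted word
                 ["team", "wins"],                      # one deleted word
                 ["team", "loses", "game"],             # one substituted word
                 ["stock", "market", "falls"]):         # unrelated text
        print(text, ids_cost(pattern, text), flexible_match(pattern, text))
```

Under these assumptions, a pattern still matches text that differs from it by a single inserted, deleted, or substituted word, while unrelated text is rejected. A real system would likely weight the three operations differently and match on semantic classes rather than exact tokens.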

Original language: English
Title of host publication: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Editors: Prachya Boonkwan, Wirote Aroonmanakun, Thepchai Supnithi
Publisher: Faculty of Pharmaceutical Sciences, Chulalongkorn University
Pages: 75-84
Number of pages: 10
ISBN (Electronic): 9786165518871
Publication status: Published - 2014
Externally published: Yes
Event: 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 - Phuket, Thailand
Duration: Dec 12 2014 → Dec 14 2014

Publication series

Name: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014

Conference

Conference: 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Country/Territory: Thailand
City: Phuket
Period: 12/12/14 → 12/14/14

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
