Distributed keyword vector representation for document categorization

Yu Lun Hsieh, Shih Hung Liu, Yung Chun Chang, Wen Lian Hsu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Citations (Scopus)

Abstract

In the age of information explosion, efficiently categorizing documents by topic can help us organize and comprehend the vast amount of text available. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representations of keywords learned by neural networks. Such a representation can project rich context information (or embedding) into the vector space, and can subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that, by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method achieves the best performance among the approaches compared.
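The idea of inferring document similarity from keyword vectors can be illustrated with a minimal sketch. It is not the authors' DKV implementation: the abstract does not say how keyword vectors are combined, so averaging them is an assumption, and a nearest-centroid rule with cosine similarity stands in for the support vector machine used in the paper. The three-dimensional vectors, the English keywords, and the names `KEYWORD_VECTORS`, `document_vector`, `cosine`, and `categorize` are all hypothetical.

```python
import math

# Hypothetical toy keyword embeddings. In the paper these would be
# real-valued vectors learned by a neural network from a large corpus.
KEYWORD_VECTORS = {
    "election": [0.9, 0.1, 0.0],
    "parliament": [0.8, 0.2, 0.1],
    "goal": [0.1, 0.9, 0.0],
    "league": [0.0, 0.8, 0.2],
}


def document_vector(tokens, vectors=KEYWORD_VECTORS):
    """Average the embeddings of the known keywords in a tokenized document.

    Averaging is an assumption made for this sketch, not the paper's method.
    """
    hits = [vectors[t] for t in tokens if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not hits:
        # No known keyword: fall back to the zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def categorize(tokens, topic_centroids):
    """Return the topic whose centroid is most similar to the document.

    Nearest-centroid is a simple stand-in for the SVM named in the abstract.
    """
    doc = document_vector(tokens)
    return max(topic_centroids, key=lambda t: cosine(doc, topic_centroids[t]))


# Hypothetical topic centroids built from a few seed keywords per topic.
TOPIC_CENTROIDS = {
    "politics": document_vector(["election", "parliament"]),
    "sports": document_vector(["goal", "league"]),
}
```

Calling `categorize(["the", "election", "result"], TOPIC_CENTROIDS)` picks the topic whose centroid lies closest in direction to the averaged keyword vector. A trained classifier such as an SVM would instead learn a decision boundary from labeled document vectors.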

Original language: English
Title of host publication: TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 245-251
Number of pages: 7
ISBN (Electronic): 9781467396066
Publication status: Published - Feb 12 2016
Externally published: Yes
Event: Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015 - Tainan, Taiwan
Duration: Nov 20 2015 - Nov 22 2015

Publication series

Name: TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence

Conference

Conference: Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015
Country/Territory: Taiwan
City: Tainan
Period: 11/20/15 - 11/22/15

Keywords

  • document representation
  • neural network
  • word embedding

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
