Agglomerative algorithm to discover semantics from unstructured big data

I-Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among frequently co-occurring terms in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection, and these concepts can categorize documents into different semantic classes. Experiments on three data sets drawn from news, Web, and medical literature show that our algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning, AutoClass, and hierarchical clustering.
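The abstract does not spell out the algorithm's details, so the following is only a minimal Python sketch of the general idea, under stated assumptions rather than the author's method. It assumes that an edge joins two terms when they co-occur in at least `min_support` documents (a plain pairwise count standing in for the association rules and hypergraph named in the keywords), that the "levels" are a descending list of support thresholds, and that each document is assigned to the concept sharing the most terms with it. All function names (`cooccurrence_graph`, `connected_components`, `concept_hierarchy`, `assign_documents`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs, min_support):
    """Assumption: an edge is a term pair found together in at least
    min_support documents. Terms with no qualifying edge are left out."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))          # count each pair once per document
        pair_counts.update(combinations(terms, 2))
    return [pair for pair, n in pair_counts.items() if n >= min_support]


def connected_components(edges):
    """Union-find over the edge list; each component is a candidate concept."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    # Deterministic order: sort components by their sorted term lists.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def concept_hierarchy(docs, thresholds):
    """Components at each support level. Lowering the threshold only adds
    edges, so components can only merge: an agglomerative hierarchy."""
    return {t: connected_components(cooccurrence_graph(docs, t))
            for t in sorted(thresholds, reverse=True)}


def assign_documents(docs, concepts):
    """Assumption: a document goes to the concept with the largest term
    overlap; None if it shares no terms with any concept."""
    labels = []
    for doc in docs:
        terms = set(doc)
        best, best_overlap = None, 0
        for i, concept in enumerate(concepts):
            overlap = len(terms & concept)
            if overlap > best_overlap:
                best, best_overlap = i, overlap
        labels.append(best)
    return labels


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a list of extracted terms.
    corpus = [
        ["insulin", "glucose", "diabetes"],
        ["insulin", "diabetes", "dose"],
        ["glucose", "diabetes"],
        ["stock", "market", "price"],
        ["stock", "price", "trading"],
    ]
    concepts = connected_components(cooccurrence_graph(corpus, 2))
    print([sorted(c) for c in concepts])
    print(assign_documents(corpus, concepts))
```

On the toy corpus, a support threshold of 2 yields the concepts `{diabetes, glucose, insulin}` and `{price, stock}`, and the five documents are labelled `[0, 0, 0, 1, 1]`. Dropping the threshold to 1 merges in the rarer terms (`dose`, `market`, `trading`), which is the sense in which the decomposition is agglomerative across levels.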

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Editors: Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1556-1563
Number of pages: 8
ISBN (Electronic): 9781479999255
DOIs
Publication status: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 – Nov 1 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Other

Other: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country/Territory: United States
City: Santa Clara
Period: 10/29/15 – 11/1/15

Keywords

  • agglomerative document categorization/clustering
  • association rules
  • hierarchical clustering
  • hypergraph

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software
