TY - GEN
T1 - Agglomerative algorithm to discover semantics from unstructured big data
AU - Chiang, I-Jen
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/12/22
Y1 - 2015/12/22
N2 - The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection. These concepts can be used to categorize documents into different semantic classes. Experiments on three data sets drawn from news, the Web, and the medical literature show that our algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning, AutoClass, and hierarchical clustering.
AB - The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection. These concepts can be used to categorize documents into different semantic classes. Experiments on three data sets drawn from news, the Web, and the medical literature show that our algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning, AutoClass, and hierarchical clustering.
KW - agglomerative document categorization/clustering
KW - association rules
KW - hierarchical clustering
KW - hypergraph
UR - http://www.scopus.com/inward/record.url?scp=84963745319&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963745319&partnerID=8YFLogxK
U2 - 10.1109/BigData.2015.7363920
DO - 10.1109/BigData.2015.7363920
M3 - Conference contribution
AN - SCOPUS:84963745319
T3 - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
SP - 1556
EP - 1563
BT - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
A2 - Luo, Feng
A2 - Ogan, Kemafor
A2 - Zaki, Mohammed J.
A2 - Haas, Laura
A2 - Ooi, Beng Chin
A2 - Kumar, Vipin
A2 - Rachuri, Sudarsan
A2 - Pyne, Saumyadipta
A2 - Ho, Howard
A2 - Hu, Xiaohua
A2 - Yu, Shipeng
A2 - Hsiao, Morris Hui-I
A2 - Li, Jian
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Y2 - 29 October 2015 through 1 November 2015
ER -