TY - JOUR
T1 - A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering
AU - Lin, Tsau Young
AU - Chiang, I. Jen
PY - 2005/7
Y1 - 2005/7
N2 - This paper presents a novel approach to document clustering based on a geometric structure from combinatorial topology. Given a set of documents, the associations among frequently co-occurring terms naturally form a simplicial complex. Our general thesis is that each connected component of this simplicial complex represents a concept in the collection. Based on these concepts, documents can be clustered into meaningful classes. In this paper, however, we pursue a softer notion: instead of connected components, we use maximal simplexes of highest dimension as representatives of connected components; the concepts so defined are called maximal primitive concepts. Experiments with three different data sets from Web pages and medical literature show that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass and Hierarchical Clustering (HAG). This abstract geometric model seems to have captured the latent semantic structure of documents.
AB - This paper presents a novel approach to document clustering based on a geometric structure from combinatorial topology. Given a set of documents, the associations among frequently co-occurring terms naturally form a simplicial complex. Our general thesis is that each connected component of this simplicial complex represents a concept in the collection. Based on these concepts, documents can be clustered into meaningful classes. In this paper, however, we pursue a softer notion: instead of connected components, we use maximal simplexes of highest dimension as representatives of connected components; the concepts so defined are called maximal primitive concepts. Experiments with three different data sets from Web pages and medical literature show that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass and Hierarchical Clustering (HAG). This abstract geometric model seems to have captured the latent semantic structure of documents.
KW - Association rules
KW - Document clustering
KW - Hierarchical clustering
KW - Simplicial complex
KW - Topology
UR - https://www.scopus.com/pages/publications/19044395829
UR - https://www.scopus.com/inward/citedby.url?scp=19044395829&partnerID=8YFLogxK
U2 - 10.1016/j.ijar.2004.11.005
DO - 10.1016/j.ijar.2004.11.005
M3 - Article
AN - SCOPUS:19044395829
SN - 0888-613X
VL - 40
SP - 55
EP - 80
JO - International Journal of Approximate Reasoning
JF - International Journal of Approximate Reasoning
IS - 1-2
ER -