Semantic based clustering of web documents

Tsau Young Lin, I. Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimension simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and Hierarchical Clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.

Original language: English
Title of host publication: 2005 IEEE International Conference on Granular Computing
Pages: 189-192
Number of pages: 4
Publication status: Published - 2005
Event: 2005 IEEE International Conference on Granular Computing - Beijing, China
Duration: Jul 25 2005 to Jul 27 2005

Publication series

Name: 2005 IEEE International Conference on Granular Computing
Volume: 2005

Other

Other: 2005 IEEE International Conference on Granular Computing
Country/Territory: China
City: Beijing
Period: 7/25/05 to 7/27/05

Keywords

  • Clustering
  • Document
  • Polyhedron
  • Semantics
  • Web

ASJC Scopus subject areas

  • General Engineering
