Semantic based clustering of web documents

Tsau Young Lin, I. Jen Chiang

Research output: Chapter in book/report, conference contribution

1 citation (Scopus)

Abstract

A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimension simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and Hierarchical Clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.
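The abstract does not spell out how the simplicial complex is built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a simplex is a set of terms co-occurring in at least `min_support` documents, that a primitive concept is a maximal such set, that two simplices belong to the same connected component when they share a term, and that a document is assigned to the component it overlaps most. All function names, parameters, and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_simplices(docs, min_support=2, max_size=3):
    """Term sets (assumed simplices) contained in at least min_support documents."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    simplices = []
    # Brute-force enumeration: fine for a toy example, exponential in general.
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            support = sum(1 for d in doc_sets if set(combo) <= d)
            if support >= min_support:
                simplices.append(frozenset(combo))
    return simplices


def maximal_simplices(simplices):
    """Keep only simplices that are not a proper subset of another simplex."""
    return [s for s in simplices if not any(s < t for t in simplices)]


def connected_components(simplices):
    """Merge simplices that share a term, using a small union-find over terms."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s in simplices:
        verts = list(s)
        for v in verts:
            parent.setdefault(v, v)
        for v in verts[1:]:
            parent[find(v)] = find(verts[0])

    comps = {}
    for v in parent:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())


def assign(doc, components):
    """Index of the component sharing the most terms with doc, or None."""
    overlaps = [len(set(doc) & c) for c in components]
    if not overlaps or max(overlaps) == 0:
        return None
    return overlaps.index(max(overlaps))


# Toy data (hypothetical): each document is a list of keywords.
docs = [
    ["wall", "street", "stock"],
    ["wall", "street", "bank"],
    ["neural", "network"],
    ["neural", "network", "learning"],
]
primitive = maximal_simplices(frequent_simplices(docs))
concepts = connected_components(primitive)
labels = [assign(d, concepts) for d in docs]
print(concepts, labels)
```

In this sketch the number of clusters is not chosen in advance, as it would be for k-means. It follows from how many connected components the assumed complex has.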

Original language: English
Title of host publication: 2005 IEEE International Conference on Granular Computing
Pages: 189-192
Number of pages: 4
2005
Publication status: Published - 2005
Event: 2005 IEEE International Conference on Granular Computing - Beijing, China
Duration: Jul 25 2005 to Jul 27 2005

Other

Other: 2005 IEEE International Conference on Granular Computing
Country/Territory: China
City: Beijing
Period: 7/25/05 to 7/27/05

ASJC Scopus subject areas

  • Engineering (all)
