Abstract
A text or web document is a knowledge representation of a human idea (a structured set of thoughts). This paper refines TFIDF and Extended TFIDF (ETFIDF) [16]; these values in effect measure the co-occurrence of tokens. ETFIDF captures the semantics more accurately. Tokens with high TFIDF values are called keywords. Sets of (n+1) co-occurring keywords with high ETFIDF are called n-granules. The collection of keywords and n-granules can be interpreted geometrically: they form a non-closed simplicial complex. The corresponding non-closed polyhedron is called the Latent Semantic Space (LSS). The LSS is a geometric knowledge base that provides semantics to a search engine.
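The pipeline described in the abstract (score tokens, keep the high-scoring ones as keywords, then group co-occurring keywords into n-granules) can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the paper: the function names `keywords_by_tfidf` and `granules_by_cooccurrence` are hypothetical, the TFIDF variant is the common `tf * ln(N / df)` form and not the paper's refinement, and, because the abstract does not define ETFIDF, a plain count of documents in which a keyword set co-occurs is used as a stand-in score. The toy corpus and thresholds are likewise invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def keywords_by_tfidf(docs, threshold):
    """Return tokens whose TF-IDF reaches `threshold` in at least one document.

    Assumption: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    keywords = set()
    for doc in docs:
        for tok, count in Counter(doc).items():
            tfidf = (count / len(doc)) * math.log(n_docs / doc_freq[tok])
            if tfidf >= threshold:
                keywords.add(tok)
    return keywords


def granules_by_cooccurrence(docs, keywords, max_dim, min_docs):
    """Map each dimension n to the sets of n+1 keywords treated as n-granules.

    Assumption: a keyword set qualifies when it co-occurs in at least
    `min_docs` documents. This is a stand-in for ETFIDF, not the paper's score.
    """
    simplices = {0: {frozenset([k]) for k in keywords}}
    for n in range(1, max_dim + 1):
        counts = Counter()
        for doc in docs:
            present = sorted(set(doc) & keywords)
            counts.update(frozenset(c) for c in combinations(present, n + 1))
        simplices[n] = {g for g, c in counts.items() if c >= min_docs}
    return simplices


# Invented toy corpus: each document is a list of tokens.
docs = [
    "wall street stock market".split(),
    "wall street stock price".split(),
    "stone wall garden".split(),
    "garden flower".split(),
]
keywords = keywords_by_tfidf(docs, threshold=0.15)
lss = granules_by_cooccurrence(docs, keywords, max_dim=2, min_docs=2)
```

In this toy corpus, `wall` appears in three of the four documents, so its score stays below the threshold and it is not a keyword. The only 1-granule is the pair `stock` and `street`, and no 2-granule reaches the co-occurrence threshold. Each frozenset in `lss[n]` plays the role of an n-simplex whose vertices are keywords.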
Original language | English |
---|---|
Title of host publication | Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics |
Pages | 4763-4767 |
Number of pages | 5 |
Volume | 6 |
Publication status | Published - 2007 |
Event | 2006 IEEE International Conference on Systems, Man and Cybernetics - Taipei, Taiwan; Duration: Oct 8 2006 → Oct 11 2006 |
Other
Other | 2006 IEEE International Conference on Systems, Man and Cybernetics |
---|---|
Country/Territory | Taiwan |
City | Taipei |
Period | 10/8/06 → 10/11/06 |
Keywords
- Granules
- Latent semantic space
- Simplex
ASJC Scopus subject areas
- General Engineering