Identifying latent semantics in high-dimensional web data

Ajit Kumar, Sanjeev Maskara, Jau Min Wong, I-Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Search engines have become an indispensable tool for obtaining relevant information on the Web. A search engine often returns a large number of results, including several irrelevant items that make the results harder to interpret. Search engines therefore need to be enhanced to discover the latent semantics in high-dimensional web data. This paper presents a novel framework for that purpose, named Latent Semantic Manifold (LSM), together with its implementation and evaluation. LSM is a mixture model based on the concepts of topology and probability. The framework can find the latent semantics in web data and represent them in homogeneous groups. We evaluated the framework experimentally, and LSM outperformed the other frameworks it was compared with. In addition, we used the framework to develop a tool, which was deployed for two years at two sites: a library and a biomedical engineering laboratory in Taiwan. The tool assisted researchers in performing semantic searches of the PubMed database. The evaluation and deployment of LSM suggest that the framework could be used to enhance the functionality of currently available search engines by discovering latent semantics in high-dimensional web data.

Original language: English
Title of host publication: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 1114
Publication status: Published - 2013
Event: 6th International Workshop on Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2013 - Edinburgh, United Kingdom
Duration: Dec 10 2013 to Dec 10 2013

Other

Other: 6th International Workshop on Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2013
Country/Territory: United Kingdom
City: Edinburgh
Period: 12/10/13 to 12/10/13

Keywords

  • Conditional random field
  • Graph-based tree-width decomposition
  • Hidden Markov models
  • Latent semantic manifold
  • Semantic cluster

ASJC Scopus subject areas

  • General Computer Science
