Use of Probabilistic Topic Models for Search

Abstract

This thesis addresses a common problem in search applications: the user typically does not know exactly which terms appear in the document he or she is searching for. Several approaches have tried to overcome this problem by augmenting the document model, the query, or both. In this thesis, a probabilistic topic model is used to augment the document model. Probabilistic document models are formally introduced, and inference methods for them are derived. The thesis shows how these models can be used for information retrieval tasks and how a search application based on them can be implemented. A prototype was implemented, tested, and evaluated on benchmark corpora. The evaluation provides empirical evidence that probabilistic document models significantly improve retrieval performance, and it identifies which preprocessing steps should be applied before the model is used.
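
The abstract does not state how the topic model is combined with the document model. One common scheme in the retrieval literature ranks documents by query likelihood and interpolates a smoothed word-count model with a topic-based term, P_topic(w|d) = sum over topics z of P(w|z) P(z|d). The Python sketch below illustrates that idea only; it is not the thesis's implementation. It assumes the topic parameters P(w|z) and P(z|d) have already been inferred (for example by Gibbs sampling), and the function names, the toy data, and the values of the weights lam and mu are all hypothetical.

```python
import math
from collections import Counter


def topic_word_prob(word, doc_topics, topic_words):
    """Topic-based estimate P_topic(w|d) = sum_z P(w|z) * P(z|d)."""
    return sum(p_z * topic_words[z].get(word, 0.0) for z, p_z in doc_topics.items())


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len,
                         doc_topics, topic_words, mu=1000.0, lam=0.7):
    """Log P(query|d) under a hypothetical mix of a Dirichlet-smoothed
    unigram model (weight lam) and the topic-based model (weight 1 - lam)."""
    counts = Counter(doc_tokens)
    n_d = len(doc_tokens)
    score = 0.0
    for w in query:
        p_coll = coll_counts.get(w, 0) / coll_len
        p_dir = (counts[w] + mu * p_coll) / (n_d + mu)  # Dirichlet smoothing
        p = lam * p_dir + (1 - lam) * topic_word_prob(w, doc_topics, topic_words)
        score += math.log(p) if p > 0 else float("-inf")
    return score


def rank(query, docs, doc_topics, topic_words, mu=1000.0, lam=0.7):
    """Return (document ids sorted by descending score, score per document)."""
    coll_counts = Counter(t for toks in docs.values() for t in toks)
    coll_len = sum(coll_counts.values())
    scores = {d: query_log_likelihood(query, toks, coll_counts, coll_len,
                                      doc_topics[d], topic_words, mu, lam)
              for d, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy model: two topics and two short documents.
topic_words = {0: {"ship": 0.4, "navy": 0.4, "boat": 0.2},
               1: {"search": 0.5, "query": 0.3, "index": 0.2}}
docs = {"d1": ["navy", "ship", "navy", "fleet"],
        "d2": ["search", "index", "query", "search"]}
doc_topics = {"d1": {0: 0.9, 1: 0.1}, "d2": {0: 0.1, 1: 0.9}}

# "boat" occurs in no document, yet d1 is ranked first via its topic mixture.
order, scores = rank(["boat"], docs, doc_topics, topic_words)
```

The query term "boat" never occurs in the toy collection, so a pure word-count model assigns every document zero probability. The topic term still ranks the naval document first. This is the kind of vocabulary mismatch the abstract describes.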

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2009
Accession Number
ADA509176

Entities

People

  • Marco Draeger

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Bayesian Networks
  • Computational Science
  • Data Mining
  • Databases
  • Hidden Markov Models
  • Information Processing
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Markov Models
  • Monte Carlo Method
  • Natural Language Processing
  • Operations Research
  • Probability
  • Probability Distributions
  • Random Variables
  • Stochastic Processes

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Database Systems and Applications
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks