Topic Models in Information Retrieval

Abstract

Topic modeling captures semantic relations among words, which should be helpful for information retrieval tasks. We present probability mixture modeling and term modeling methods for integrating topic models into the language modeling framework for information retrieval. A variety of topic modeling techniques have been proposed for or introduced into IR tasks. These include manually built query models, term similarity measures, and latent mixture models, especially Latent Dirichlet Allocation (LDA), a formal generative latent mixture model of documents. We investigate and evaluate these techniques on several TREC collections within the presented frameworks and show that significant improvements over previous work can be obtained. Practical issues such as efficiency and scalability are discussed and compared across the different topic models. Other recent topic modeling techniques are also discussed.
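The abstract does not spell out the mixture formulas, so the following is only a minimal sketch of one common way to fold a topic model into query-likelihood retrieval, under stated assumptions. It assumes that LDA has already been fit, so that each document D has a topic distribution theta_d = P(z|D) and each topic z has a word distribution phi[z] = P(w|z). The topic-based estimate P_lda(w|D) = sum_z P(w|z) P(z|D) is then linearly mixed with a Dirichlet-smoothed document model. Dirichlet smoothing, the parameter values mu = 1000 and lam = 0.7, and the function names are illustrative assumptions, not details taken from the report.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating a linear mixture of a Dirichlet-smoothed
# document language model and an LDA-derived document model (assumed form).

def lda_doc_prob(word, theta_d, phi):
    """P_lda(w|D) = sum over topics z of P(w|z) * P(z|D)."""
    return sum(theta_d[z] * phi[z].get(word, 0.0) for z in range(len(theta_d)))

def mixture_prob(word, doc_counts, doc_len, coll_prob, theta_d, phi,
                 mu=1000.0, lam=0.7):
    """lam * P_dirichlet(w|D) + (1 - lam) * P_lda(w|D); mu and lam are assumed values."""
    # Dirichlet smoothing: (c(w, D) + mu * P(w|C)) / (|D| + mu)
    p_smoothed = (doc_counts.get(word, 0) + mu * coll_prob.get(word, 0.0)) / (doc_len + mu)
    return lam * p_smoothed + (1.0 - lam) * lda_doc_prob(word, theta_d, phi)

def query_log_likelihood(query, doc_tokens, coll_prob, theta_d, phi,
                         mu=1000.0, lam=0.7):
    """Score a document by log P(Q|D) = sum over query terms of log P(w|D)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    score = 0.0
    for w in query:
        p = mixture_prob(w, counts, n, coll_prob, theta_d, phi, mu, lam)
        # A term unseen in the collection and all topics gets zero probability.
        score += math.log(p) if p > 0 else float("-inf")
    return score

if __name__ == "__main__":
    # Toy data: two topics over a four-word vocabulary (made up for illustration).
    phi = [{"retrieval": 0.5, "query": 0.5}, {"topic": 0.5, "model": 0.5}]
    coll = {w: 0.25 for w in ["retrieval", "topic", "model", "query"]}
    doc_a, theta_a = ["retrieval", "query", "retrieval"], [0.9, 0.1]
    doc_b, theta_b = ["topic", "model", "model"], [0.1, 0.9]
    q = ["retrieval", "query"]
    for name, doc, theta in [("A", doc_a, theta_a), ("B", doc_b, theta_b)]:
        print(name, query_log_likelihood(q, doc, coll, theta, phi))
```

The mixture remains a proper distribution over the vocabulary whenever the collection model, each topic's word distribution, and theta_d each sum to one, because both components do. The topic component lets a document earn probability for query terms it never contains but that are likely under its dominant topics, which is the kind of semantic matching the abstract attributes to topic models.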


Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2007
Accession Number
ADA477505

Entities

People

  • Wei Xing

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Computational Science
  • Computer Science
  • Data Mining
  • Hidden Markov Models
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Knowledge Management
  • Machine Learning
  • Markov Models
  • Natural Language Processing
  • Ontologies
  • Probabilistic Models
  • Probability
  • Probability Distributions

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Information Retrieval
  • Team-Based Human-Centered Cognitive Task Decision Making and Information Performance

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval