Optimal Mixture Models in IR

Abstract

We explore the use of Optimal Mixture Models to represent topics. We analyze two broad classes of mixture models: set-based and weighted. We provide an original proof that estimating set-based models is NP-hard and therefore not feasible in practice. We argue that weighted models are superior to set-based models and that their solution can be estimated by a simple gradient descent technique. We demonstrate that Optimal Mixture Models can be applied successfully to the task of document retrieval. Our experiments show that weighted mixtures outperform a simple language modeling baseline. We also observe that weighted mixtures are more robust than other approaches to estimating topical models.
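
As a rough illustration of what estimating a weighted mixture by gradient descent can look like, the sketch below fits mixture weights over a few fixed unigram distributions so that the mixture approximates a target distribution. It is not taken from the report: the cross-entropy objective, the softmax parameterization that keeps the weights on the simplex, the toy distributions, and the names softmax, mixture, cross_entropy, and fit_weights are all assumptions made for this example.

```python
import math

def softmax(z):
    """Map unconstrained scores to non-negative weights that sum to 1."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def mixture(weights, components):
    """Weighted mixture of component distributions over a shared vocabulary."""
    vocab_size = len(components[0])
    return [sum(w * c[v] for w, c in zip(weights, components))
            for v in range(vocab_size)]

def cross_entropy(target, weights, components, eps=1e-12):
    """Cross-entropy of the mixture relative to the target (assumed objective)."""
    mix = mixture(weights, components)
    return -sum(t * math.log(m + eps) for t, m in zip(target, mix) if t > 0)

def fit_weights(target, components, steps=2000, lr=0.5, eps=1e-12):
    """Plain gradient descent on softmax scores; returns the mixture weights."""
    z = [0.0] * len(components)  # start from uniform weights
    for _ in range(steps):
        w = softmax(z)
        mix = mixture(w, components)
        # Gradient of the objective with respect to each weight.
        g = [-sum(t * c[v] / (mix[v] + eps) for v, t in enumerate(target))
             for c in components]
        # Chain rule through the softmax: dL/dz_j = w_j * (g_j - sum_i w_i g_i).
        avg = sum(wi * gi for wi, gi in zip(w, g))
        z = [zi - lr * wi * (gi - avg) for zi, wi, gi in zip(z, w, g)]
    return softmax(z)

# Toy data (hypothetical): three component distributions over a 3-word vocabulary.
components = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
target = [0.48, 0.25, 0.27]

uniform = [1.0 / len(components)] * len(components)
weights = fit_weights(target, components)
print("fitted weights:", weights)
print("loss at uniform weights:", cross_entropy(target, uniform, components))
print("loss at fitted weights:", cross_entropy(target, weights, components))
```

The softmax parameterization is only one way to keep the weights non-negative and normalized; projected or exponentiated gradient updates would serve the same purpose.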

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2005
Accession Number
ADA440363

Entities

People

  • Victor Lavrenko

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Computational Complexity
  • Computational Science
  • Equations
  • Information Retrieval
  • Iterations
  • Language
  • Markov Models
  • Models
  • Natural Languages
  • Observation
  • Optimization
  • Polynomials
  • Probability
  • Probability Distributions
  • Semantic Models
  • Vocabulary

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Operations Research