Optimal Mixture Models in IR
Abstract
We explore the use of Optimal Mixture Models to represent topics. We analyze two broad classes of mixture models: set-based and weighted. We provide an original proof that estimation of set-based models is NP-hard, and therefore not feasible. We argue that weighted models are superior to set-based models, and the solution can be estimated by a simple gradient descent technique. We demonstrate that Optimal Mixture Models can be successfully applied to the task of document retrieval. Our experiments show that weighted mixtures outperform a simple language modeling baseline. We also observe that weighted mixtures are more robust than other approaches of estimating topical models.
Document Details
- Document Type
- Technical Report
- Publication Date
- Jan 01, 2005
- Accession Number
- ADA440363
Entities
People
- Victor Lavrenko
Organizations
- University of Massachusetts Amherst