Lightening the Load of Document Smoothing for Better Language Modeling Retrieval

Abstract

The authors hypothesized that language modeling retrieval would improve if they reduced the need for document smoothing to provide an inverse document frequency (IDF)-like effect. They created inverse collection frequency (ICF)-weighted query models as a tool to partially separate the IDF-like role from document smoothing. Compared to maximum likelihood estimated (MLE) queries, the ICF-weighted queries achieved a 6.4% improvement in mean average precision on description queries. The ICF-weighted queries also performed best with less document smoothing than the MLE queries required. Language modeling retrieval may therefore benefit from a means of incorporating IDF-like behavior separately from document smoothing.
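
The contrast between the two query models can be illustrated with a minimal sketch. The abstract does not give the report's exact ICF formula, so the weight used here, log(collection length / collection frequency), is an assumption, and the function names, term counts, and collection size are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def mle_query_model(query_terms):
    """Maximum likelihood estimate: P(w|Q) = count(w, Q) / |Q|."""
    counts = Counter(query_terms)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def icf_query_model(query_terms, collection_freq, collection_len):
    """ICF-weighted query model (assumed form, not taken from the report).

    Each term's query count is multiplied by
    log(collection_len / collection_freq[w]) and the result is
    renormalized so the weights sum to 1.
    """
    counts = Counter(query_terms)
    weights = {
        w: c * math.log(collection_len / collection_freq[w])
        for w, c in counts.items()
    }
    total = sum(weights.values())
    if total == 0:
        # Every term is as frequent as the whole collection; fall back to MLE.
        return mle_query_model(query_terms)
    return {w: v / total for w, v in weights.items()}


# Hypothetical toy statistics: total term occurrences in the collection
# and the collection frequency of each query term.
collection_len = 1_000_000
collection_freq = {"the": 50_000, "retrieval": 100, "model": 1_000}
query = ["the", "retrieval", "model"]

mle = mle_query_model(query)
icf = icf_query_model(query, collection_freq, collection_len)
```

Under these toy numbers the MLE model gives each of the three terms equal weight, while the ICF-weighted model shifts probability mass toward the rarer terms, which is the IDF-like effect the abstract describes moving out of document smoothing.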

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2006
Accession Number: ADA448634

Entities

People

  • James Allan
  • Mark D. Smucker

Organizations

  • University of Massachusetts Amherst

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Computer Science
  • Frequency
  • Information Operations
  • Information Retrieval
  • Language
  • Law
  • Mathematics
  • Maximum Likelihood Estimation
  • Models
  • Precision
  • Probabilistic Models
  • Probability
  • Test Sets
  • Training

Readers

  • Approximation Theory
  • Computational Modeling and Simulation
  • Database Systems and Applications