Incorporating Non-Relevance Information in the Estimation of Query Models

Abstract

The authors describe the participation of the University of Amsterdam's Information and Language Processing Systems (ILPS) group in the Relevance Feedback track at TREC 2008. They introduce a new model that incorporates information from both relevant and nonrelevant documents to improve the estimation of query models. The study addresses three research questions. First, can nonrelevance information be modeled effectively to improve the estimation of a query model? Second, given the proposed model, how does the size of the set of nonrelevant documents relative to the set of relevant documents affect retrieval effectiveness? Third, does explicit nonrelevance information help, and if so, when? In other words, what happens when the estimates on the nonrelevant documents are replaced with more general estimates, such as those from the collection? The proposed model leverages the distance between each relevant document and the set of nonrelevant documents by penalizing terms that occur frequently in the latter, similar to the intuitions described by Wang et al. (2008). Instead of subtracting probabilities, however, the authors take a more principled approach based on the Normalized Log Likelihood Ratio (NLLR). Their main findings are twofold: (1) in terms of statMAP, a larger number of documents judged nonrelevant improves retrieval effectiveness; and (2) on the TREC Terabyte topics, the estimates on the documents judged nonrelevant can be effectively replaced with estimates on the document collection.
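As a rough illustration of the NLLR idea mentioned in the abstract, the following minimal sketch scores a relevant document against a set of nonrelevant documents. It assumes the commonly used form of NLLR, a sum over terms of P(t|source) * log(P(t|target) / P(t|background)), together with Jelinek-Mercer smoothing. The function names, the smoothing weight, and the toy data are all hypothetical and are not taken from the report, whose exact formulation may differ.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def smooth(model, background, lam=0.5):
    """Jelinek-Mercer smoothing (an assumption, not the report's stated choice)."""
    vocab = set(model) | set(background)
    return {
        term: (1 - lam) * model.get(term, 0.0) + lam * background.get(term, 0.0)
        for term in vocab
    }


def nllr(source, target, background):
    """Sum over terms of P(t|source) * log(P(t|target) / P(t|background))."""
    score = 0.0
    for term, p_source in source.items():
        p_target = target.get(term, 0.0)
        p_background = background.get(term, 0.0)
        # Skip terms where the log ratio is undefined.
        if p_source > 0 and p_target > 0 and p_background > 0:
            score += p_source * math.log(p_target / p_background)
    return score


# Toy data, purely illustrative.
background = unigram_model("a b c d a b c d".split())
relevant_doc = unigram_model("a a b".split())
nonrelevant = smooth(unigram_model("c c d a".split()), background)

score = nllr(relevant_doc, nonrelevant, background)
print(score)
```

In this toy setup the score is negative, which indicates that the relevant document's terms are less likely under the nonrelevant model than under the background model. Passing the background model itself as the target yields a score of zero, which loosely mirrors the abstract's question about substituting collection estimates for nonrelevant estimates.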

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2008
Accession Number
ADA512743

Entities

People

  • Edgar Meij
  • Jiyin He
  • Maarten De Rijke
  • Wouter Weerkamp

Organizations

  • University of Amsterdam

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Base Lines
  • Education
  • Feedback
  • Governments
  • Information Operations
  • Information Retrieval
  • Judgment
  • Language
  • Probability
  • Probability Distributions
  • Scientific Research
  • Standards
  • Terabytes

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Information Retrieval
  • Systems Analysis and Design