Incorporating Non-Relevance Information in the Estimation of Query Models

Abstract

The authors describe the participation of the University of Amsterdam's Information and Language Processing Systems (ILPS) group in the Relevance Feedback track at TREC 2008. They introduce a new model that incorporates information from both relevant and nonrelevant documents to improve the estimation of query models. The study addresses three research questions. First, can nonrelevance information be modeled effectively to improve the estimation of a query model? Second, given the proposed model, how does the size of the set of nonrelevant documents, relative to the set of relevant documents, affect retrieval effectiveness? Third, when, if at all, does explicit nonrelevance information help; that is, what happens when the estimates from the nonrelevant documents are replaced with more general estimates, such as those from the collection? The proposed model leverages the distance between each relevant document and the set of nonrelevant documents by penalizing terms that occur frequently in the latter, following intuitions similar to those described by Wang et al. (2008). Instead of subtracting probabilities, however, the authors take a more principled approach based on the Normalized Log Likelihood Ratio (NLLR). Their main findings are twofold: (1) in terms of statMAP, a larger number of documents judged nonrelevant improves retrieval effectiveness; and (2) on the TREC Terabyte topics, the estimates from the documents judged nonrelevant can effectively be replaced with estimates from the document collection.
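
As a rough illustration of the NLLR idea mentioned in the abstract, the following minimal sketch scores a document by how much better a "relevant" term distribution explains it than a "nonrelevant" one. It is not the authors' implementation: the exact formulation, the additive smoothing, the toy token lists, and the names unigram_model and nllr are all assumptions made for this example.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=0.5):
    """Additively smoothed unigram model over a fixed vocabulary.

    The smoothing scheme and alpha value are assumptions for this sketch.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def nllr(doc_model, rel_model, nonrel_model):
    """Sum over terms of P(t|D) * log(P(t|R) / P(t|N)).

    Terms that are frequent in the nonrelevant model contribute negatively,
    so documents dominated by such terms are penalized.
    """
    return sum(
        p * math.log(rel_model[t] / nonrel_model[t])
        for t, p in doc_model.items()
    )


# Hypothetical toy data: one "relevant" and one "nonrelevant" bag of words.
relevant = "query model estimation feedback".split()
nonrelevant = "football match score goal".split()
vocab = set(relevant) | set(nonrelevant)

rel_model = unigram_model(relevant, vocab)
nonrel_model = unigram_model(nonrelevant, vocab)

doc_on_topic = unigram_model("query model feedback".split(), vocab)
doc_off_topic = unigram_model("football goal score".split(), vocab)

score_on = nllr(doc_on_topic, rel_model, nonrel_model)
score_off = nllr(doc_off_topic, rel_model, nonrel_model)
```

In this toy setting the on-topic document receives a positive score and the off-topic document a negative one. Passing a collection-wide model as nonrel_model would correspond, loosely, to the substitution raised in the third research question.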

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2008
Accession Number: ADA512743

Entities

People

Edgar Meij
Jiyin He
Maarten de Rijke
Wouter Weerkamp

Organizations

University of Amsterdam
