Incorporating Non-Relevance Information in the Estimation of Query Models
Abstract
The authors describe the participation of the University of Amsterdam's Information and Language Processing Systems (ILPS) group in the Relevance Feedback track at TREC 2008. They introduce a new model that incorporates information from both relevant and nonrelevant documents to improve the estimation of query models. The study addresses three research questions. First, can nonrelevance information be modeled effectively to improve the estimation of a query model? Second, given this model, how does the size of the set of nonrelevant documents relative to the set of relevant documents affect retrieval effectiveness? Third, does explicit nonrelevance information help, and if so, when? In other words, what happens when the estimates on the nonrelevant documents are replaced with more general estimates, such as those derived from the collection? The proposed model leverages the distance between each relevant document and the set of nonrelevant documents by penalizing terms that occur frequently in the latter, similar to the intuitions described by Wang et al. (2008). Instead of subtracting probabilities, however, it takes a more principled approach based on the Normalized Log-Likelihood Ratio (NLLR). The main findings are twofold: (1) in terms of statMAP, a larger number of (judged to be) nonrelevant documents improves retrieval effectiveness; and (2) on the TREC Terabyte topics, the estimates on the (judged to be) nonrelevant documents can be effectively replaced with estimates on the document collection.
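The idea of penalizing terms that are frequent in the nonrelevant set with a log-likelihood ratio can be illustrated with a small sketch. This is not the authors' estimator, whose exact form is not given in this abstract. The per-term weight P(t|D) · log(P̃(t|D) / P̃(t|N)), the Jelinek-Mercer smoothing, the parameter `lam`, and the helper names `mle`, `jm_smooth`, and `contrast_weights` are all illustrative assumptions. In the standard NLLR, the smoothed document probability is divided by the collection probability; here a contrast model N (the nonrelevant set) takes that role, so terms frequent in N receive lower weights. Passing the collection model as N mimics the substitution studied in the third research question.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model P(t | tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def jm_smooth(model, background, lam):
    """Jelinek-Mercer smoothing: (1 - lam) * P(t|model) + lam * P(t|C).

    The result is defined over the background (collection) vocabulary.
    """
    return {t: (1 - lam) * model.get(t, 0.0) + lam * p
            for t, p in background.items()}


def contrast_weights(rel_tokens, contrast_model, coll_model, lam=0.5):
    """HYPOTHETICAL per-term weights P(t|D) * log(P~(t|D) / P~(t|N)).

    D is one relevant document; N is the contrast model (the nonrelevant
    set, or the collection itself). Terms frequent in N get lower, possibly
    negative, weights. Assumes every term of D occurs in the collection
    and 0 < lam <= 1, so no probability in the ratio is zero.
    """
    p_d = mle(rel_tokens)
    d_s = jm_smooth(p_d, coll_model, lam)
    n_s = jm_smooth(contrast_model, coll_model, lam)
    return {t: p * math.log(d_s[t] / n_s[t]) for t, p in p_d.items()}


# Toy example with made-up tokens (not data from the report).
collection = "a b c d a b".split()
coll = mle(collection)
rel = ["a", "c", "c"]
nonrel = ["a", "a", "b"]

w_nr = contrast_weights(rel, mle(nonrel), coll)  # contrast: nonrelevant set
w_c = contrast_weights(rel, coll, coll)          # contrast: collection instead
```

With the nonrelevant set as contrast, "a" (frequent in the nonrelevant documents) receives a negative weight while "c" is boosted. With the collection substituted, "a" becomes neutral because its smoothed document and collection probabilities coincide in this toy example.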
Document Details
- Document Type: Technical Report
- Publication Date: Nov 01, 2008
- Accession Number: ADA512743
Entities
People
- Edgar Meij
- Jiyin He
- Maarten De Rijke
- Wouter Weerkamp
Organizations
- University of Amsterdam