A Generative Theory of Relevance

Abstract

We present a new theory of relevance for the field of Information Retrieval. Relevance is viewed as a generative process, and we hypothesize that both user queries and relevant documents represent random observations from that process. Based on this view, we develop a formal retrieval model that has direct applications to a wide range of search scenarios. The new model substantially outperforms strong baselines on the tasks of ad-hoc retrieval, cross-language retrieval, handwriting retrieval, automatic image annotation, video retrieval, and topic detection and tracking. The empirical success of our approach is due to a new technique we propose for modeling exchangeable sequences of discrete random variables. The new technique represents an attractive counterpart to existing formulations, such as multinomial mixtures, pLSI and LDA: it is effective, easy to train, and makes no assumptions about the geometric structure of the data.

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2004
Accession Number
ADA440135

Entities

People

  • Victor Lavrenko

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • C4I
  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Computational Complexity
  • Computational Science
  • Data Mining
  • Databases
  • Dimensionality Reduction
  • Information Retrieval
  • Information Science
  • Language
  • Machine Translation
  • Natural Language Processing
  • Predictive Modeling
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Random Variables

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computer Vision
  • Statistical inference

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks