The Bias Problem and Language Models in Adaptive Filtering

Abstract

We used the YFILTER filtering system to run experiments on profile updating and threshold setting. We developed a new language-model-based method for updating profiles that is more focused on selecting informative, discriminative words for the query, and compared it with the well-known Rocchio algorithm. Dissemination thresholds were set by maximum likelihood estimation that models and compensates for the sampling bias inherent in adaptive filtering. Our experimental results suggest that the choice of distribution for modeling the scores of relevant and non-relevant documents is corpus dependent. They also show that the sampling bias in the training data collected during filtering biases the final learned profile.
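
For readers unfamiliar with the Rocchio baseline named in the abstract, the textbook form of the update can be sketched as follows. This is a minimal illustration only, not the report's or YFILTER's implementation: the function name `rocchio_update`, the dictionary representation of term vectors, and the weights `alpha`, `beta`, and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update on sparse term -> weight dictionaries.

    The alpha/beta/gamma values are illustrative assumptions,
    not settings taken from the report.
    """
    updated = defaultdict(float)

    # Keep a scaled copy of the current profile.
    for term, weight in profile.items():
        updated[term] += alpha * weight

    # Move toward the centroid of documents judged relevant.
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)

    # Move away from the centroid of documents judged non-relevant.
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)

    # Common convention: discard terms whose weight is not positive.
    return {term: w for term, w in updated.items() if w > 0}


profile = rocchio_update(
    {"filter": 1.0},
    relevant=[{"filter": 1.0, "adaptive": 2.0}],
    non_relevant=[{"sports": 1.0}],
)
```

In an adaptive filtering setting, the relevant and non-relevant sets contain only documents the system chose to deliver, which is the source of the sampling bias the abstract describes.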

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2006
Accession Number
ADA456239

Entities

People

  • Jamie Callan
  • Yi Zhang

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Air Force Research Laboratories
  • Algorithms
  • Automated Speech Recognition
  • Computer Science
  • English Language
  • Feedback
  • Filtration
  • Information Retrieval
  • Information Science
  • Knowledge Management
  • Language
  • Learning
  • Maximum Likelihood Estimation
  • Natural Languages
  • Sampling
  • Training

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Information Retrieval
  • Regression Analysis.