A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

Abstract

A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learning methods from information retrieval, is presented in a text categorization framework. The analysis results in a probabilistic version of the Rocchio classifier and offers an explanation for the TFIDF word weighting heuristic. The Rocchio classifier, its probabilistic variant and a standard naive Bayes classifier are compared on three text categorization tasks. The results suggest that the probabilistic algorithms are preferable to the heuristic Rocchio classifier.
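
For readers unfamiliar with the heuristic classifier the abstract refers to, the following is a minimal sketch of a Rocchio-style classifier over TFIDF vectors. It is not taken from the report: the function names, the `alpha` and `beta` defaults, the logarithmic IDF form, and the cosine-similarity decision rule are all assumptions chosen for illustration, and the report's own formulation may differ.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Build length-normalized TF-IDF vectors for tokenized documents.

    Assumed weighting: raw term frequency times log(N / document frequency).
    Returns the vectors and the IDF table (needed to embed new documents).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {term: tf[term] * idf[term] for term in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors, idf


def rocchio_prototypes(vectors, labels, alpha=1.0, beta=0.25):
    """One prototype per class: alpha * mean(in-class) - beta * mean(out-of-class).

    alpha and beta are hypothetical defaults, not values from the report.
    """
    prototypes = {}
    for cls in set(labels):
        pos = [v for v, y in zip(vectors, labels) if y == cls]
        neg = [v for v, y in zip(vectors, labels) if y != cls]
        proto = defaultdict(float)
        for vec in pos:
            for term, w in vec.items():
                proto[term] += alpha * w / len(pos)
        for vec in neg:
            for term, w in vec.items():
                proto[term] -= beta * w / len(neg)
        prototypes[cls] = dict(proto)
    return prototypes


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens, prototypes, idf):
    """Assign the class whose prototype is most similar to the document."""
    tf = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    vec = {term: tf[term] * idf[term] for term in tf}
    return max(prototypes, key=lambda cls: cosine(vec, prototypes[cls]))
```

To use the sketch, pass tokenized training documents to `tfidf_vectors`, build the prototypes from the resulting vectors and their labels, and call `classify` on the tokens of a new document.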

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 1996
Accession Number: ADA307731

Entities

People

  • Thorsten Joachims

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Computational Processes
  • Computing-Related Activities
  • Feedback
  • Information Retrieval
  • Learning
  • Machine Learning
  • Standards

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Learning Algorithms