A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
Abstract
A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learning methods from information retrieval, is presented in a text categorization framework. The analysis results in a probabilistic version of the Rocchio classifier and offers an explanation for the TFIDF word weighting heuristic. The Rocchio classifier, its probabilistic variant and a standard naive Bayes classifier are compared on three text categorization tasks. The results suggest that the probabilistic algorithms are preferable to the heuristic Rocchio classifier.
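For readers unfamiliar with the heuristic baseline named in the abstract, the following is a minimal sketch of a Rocchio-style classifier over TFIDF vectors. It is not taken from the report: the weighting (term frequency times log(N/df), length-normalized), the clipping of negative prototype weights, the cosine decision rule, the parameter values `alpha=16.0` and `beta=4.0`, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Map tokenized documents to normalized TF-IDF vectors; also return the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vecs = [normalize({t: tf * idf[t] for t, tf in Counter(doc).items()})
            for doc in docs]
    return vecs, idf


def train_rocchio(docs, labels, alpha=16.0, beta=4.0):
    """Build one prototype per class (alpha and beta are illustrative values)."""
    vecs, idf = tfidf_vectors(docs)
    prototypes = {}
    for c in sorted(set(labels)):
        pos = [v for v, y in zip(vecs, labels) if y == c]
        neg = [v for v, y in zip(vecs, labels) if y != c]
        proto = defaultdict(float)
        for v in pos:  # add the weighted mean of in-class documents
            for t, w in v.items():
                proto[t] += alpha * w / len(pos)
        for v in neg:  # subtract the weighted mean of out-of-class documents
            for t, w in v.items():
                proto[t] -= beta * w / len(neg)
        # Assumption: negative weights are dropped before normalizing.
        prototypes[c] = normalize({t: w for t, w in proto.items() if w > 0})
    return prototypes, idf


def classify(tokens, prototypes, idf):
    """Return the class whose prototype has the highest cosine with the document."""
    query = normalize({t: tf * idf[t]
                       for t, tf in Counter(tokens).items() if t in idf})

    def cosine(proto):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * proto.get(t, 0.0) for t, w in query.items())

    return max(prototypes, key=lambda c: cosine(prototypes[c]))
```

A call such as `train_rocchio(docs, labels)` on a list of token lists and their class labels returns the prototypes and IDF table that `classify` expects. The probabilistic variant and the naive Bayes classifier compared in the report are not sketched here.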
Document Details
- Document Type: Technical Report
- Publication Date: Mar 01, 1996
- Accession Number: ADA307731
Entities
People
- Thorsten Joachims
Organizations
- Carnegie Mellon University