Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
Abstract
Categorization of documents is challenging, as the number of discriminating words can be very large. The authors present a nearest neighbor classification scheme for text categorization in which the importance of discriminating words is learned using mutual information and weight adjustment techniques. The nearest neighbors for a particular document are then computed based on the matching words and their weights. They evaluate their scheme on both synthetic and real-world documents. Experiments with synthetic data sets show that this scheme is robust under different emulated conditions. Empirical results on real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, RIPPER, Rainbow, and PEBLS.
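As an illustration only, the Python sketch below shows the general shape of a nearest-neighbor text classifier whose word weights come from mutual information. It is not the authors' implementation. The function names (`mutual_information`, `weighted_cosine`, `knn_classify`), the dict-of-term-counts document representation, the cosine similarity over weight-scaled vectors, and the similarity-weighted voting are all assumptions made for this example. The weight-adjustment phase mentioned in the abstract is not reproduced: each word's weight is simply its mutual information with the class label.

```python
import math
from collections import Counter

def mutual_information(docs, labels, word):
    """Mutual information between the presence of `word` and the class label.

    Hypothetical helper: documents are dicts mapping word -> term count.
    """
    n = len(docs)
    joint = Counter((word in d, y) for d, y in zip(docs, labels))
    px = Counter(word in d for d in docs)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

def weighted_cosine(a, b, w):
    """Cosine similarity after scaling each term count by its weight.

    Words with no weight (e.g. unseen in training) contribute nothing.
    """
    sa = {t: v * w.get(t, 0.0) for t, v in a.items()}
    sb = {t: v * w.get(t, 0.0) for t, v in b.items()}
    dot = sum(sa[t] * sb[t] for t in sa.keys() & sb.keys())
    na = math.sqrt(sum(x * x for x in sa.values()))
    nb = math.sqrt(sum(x * x for x in sb.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def knn_classify(query, docs, labels, weights, k=3):
    """Label `query` by similarity-weighted voting among its k nearest docs."""
    scored = sorted(
        ((weighted_cosine(query, d, weights), y) for d, y in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, y in scored:
        votes[y] += sim
    return votes.most_common(1)[0][0]
```

A usage example on toy data: build the weights with `weights = {t: mutual_information(docs, labels, t) for t in vocab}` over the training vocabulary, then call `knn_classify(query, docs, labels, weights)`. A word that appears only in documents of one class receives a high weight, so matches on such words dominate the similarity score, which is the intuition the abstract describes.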
Document Details
- Document Type: Technical Report
- Publication Date: May 17, 1999
- Accession Number: ADA439688
Entities
People
- Euihong Han
- George Karypis
- Vipin Kumar
Organizations
- University of Minnesota