Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

Abstract

Categorization of documents is challenging, as the number of discriminating words can be very large. The authors present a nearest neighbor classification scheme for text categorization in which the importance of discriminating words is learned using mutual information and weight adjustment techniques. The nearest neighbors for a particular document are then computed based on the matching words and their weights. They evaluate their scheme on both synthetic and real-world documents. Experiments with synthetic data sets show that this scheme is robust under different emulated conditions. Empirical results on real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, RIPPER, Rainbow, and PEBLS.
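
The nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: it assumes documents are sparse word-count dictionaries, that similarity is a cosine measure with each word scaled by its weight, and that the k nearest neighbors vote with their similarity scores. The function names (`weighted_cosine`, `classify`), the toy data, and the hand-set weights are all hypothetical, and the weight-learning step (mutual information plus weight adjustment) is not shown.

```python
import math
from collections import defaultdict


def weighted_cosine(x, y, w):
    """Cosine similarity of sparse word-count dicts x and y,
    with each word t scaled by its weight w[t] (assumed 0 if missing)."""
    dot = sum(x[t] * y[t] * w.get(t, 0.0) ** 2 for t in x if t in y)
    nx = math.sqrt(sum((v * w.get(t, 0.0)) ** 2 for t, v in x.items()))
    ny = math.sqrt(sum((v * w.get(t, 0.0)) ** 2 for t, v in y.items()))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return dot / (nx * ny)


def classify(doc, training, w, k=3):
    """Label doc by a similarity-weighted vote among its k nearest
    training documents. training is a list of (word_counts, label)."""
    scored = sorted(
        ((weighted_cosine(doc, x, w), label) for x, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in scored:
        votes[label] += sim
    return max(votes, key=votes.get)


# Hypothetical toy data: "game" gets a low weight because it
# appears in both classes and so discriminates poorly.
training = [
    ({"ball": 2, "game": 1}, "sports"),
    ({"ball": 1, "team": 2}, "sports"),
    ({"stock": 2, "market": 1}, "finance"),
    ({"market": 2, "game": 1}, "finance"),
]
weights = {"ball": 1.0, "team": 1.0, "stock": 1.0, "market": 1.0, "game": 0.1}

print(classify({"ball": 1, "game": 3}, training, weights, k=3))
```

In this sketch a word with a small weight contributes little to the similarity, so documents are matched mainly on the words treated as discriminating.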

Document Details

Document Type: Technical Report
Publication Date: May 17, 1999
Accession Number: ADA439688

Entities

People

Euihong Han
George Karypis
Vipin Kumar

Organizations

University of Minnesota
