Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

Abstract

Categorization of documents is challenging, as the number of discriminating words can be very large. The authors present a nearest neighbor classification scheme for text categorization in which the importance of discriminating words is learned using mutual information and weight adjustment techniques. The nearest neighbors for a particular document are then computed based on the matching words and their weights. They evaluate their scheme on both synthetic and real-world documents. Experiments with synthetic data sets show that this scheme is robust under different emulated conditions. Empirical results on real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, RIPPER, Rainbow, and PEBLS.
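
The abstract's core idea, computing nearest neighbors from matching words and their weights, can be illustrated with a minimal sketch. It is not the report's implementation: it assumes documents are sparse term-frequency dictionaries, that similarity is a cosine measure applied after scaling each term by its weight, and that the k neighbors vote by summed similarity. The names `weighted_cosine` and `classify` are hypothetical, and the weights are supplied by hand, whereas the report learns them using mutual information and weight adjustment.

```python
import math
from collections import defaultdict


def weighted_cosine(x, y, w):
    """Cosine similarity of two sparse term-frequency dicts after
    scaling every term by its weight (terms missing from w get 1.0)."""
    dot = sum(x[t] * y[t] * w.get(t, 1.0) ** 2 for t in x if t in y)
    norm_x = math.sqrt(sum((v * w.get(t, 1.0)) ** 2 for t, v in x.items()))
    norm_y = math.sqrt(sum((v * w.get(t, 1.0)) ** 2 for t, v in y.items()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def classify(doc, training, w, k=3):
    """Label doc by a similarity-weighted vote among its k nearest
    training documents. training is a list of (term_dict, label) pairs."""
    scored = sorted(
        ((weighted_cosine(doc, d, w), label) for d, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in scored:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy data (invented for illustration, not from the report).
training = [
    ({"ball": 2, "game": 1}, "sports"),
    ({"ball": 1, "team": 2}, "sports"),
    ({"vote": 2, "game": 1}, "politics"),
    ({"vote": 1, "law": 2}, "politics"),
]
# Hand-set weights: "vote" is treated as highly discriminating, "game" as weak.
weights = {"game": 0.1, "vote": 2.0}
predicted = classify({"game": 1, "vote": 1}, training, weights, k=3)
```

In this sketch, raising the weight of a word makes matches on that word dominate the similarity, which is the effect the learned weights are meant to achieve for discriminating words.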

Document Details

Document Type
Technical Report
Publication Date
May 17, 1999
Accession Number
ADA439688

Entities

People

  • Euihong Han
  • George Karypis
  • Vipin Kumar

Organizations

  • University of Minnesota

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Classification
  • Computational Complexity
  • Computer Science
  • Data Mining
  • Data Sets
  • Feature Selection
  • Information Retrieval
  • Information Science
  • Information Systems
  • Information Theory
  • Judgment
  • Machine Learning
  • Multiplication Factor
  • Network Science
  • Probability
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition