A Simple Probabilistic Approach to Classification and Routing

Abstract

Several classification and routing methods were implemented and compared. The experiments used FBIS documents from four categories. The measures compared were the tf.idf and cosine similarity measures and a maximum likelihood estimate that assumes a multinomial distribution for each topic (population). In addition, the SMART program was run with 'lnc.ltc' weighting and compared to the others. Decisions for both our classification scheme (documents are put into any number of disjoint categories) and our routing scheme (documents are assigned a 'score' and ranked relative to each category) are based on the highest probability of correct classification or routing. All of the techniques described here are fully automatic and use a training set of relevant documents to produce lists of distinguishing terms and weights. All methods, ours and those we compared against, gave excellent results on the classification task, while the method based on the multinomial distribution produced the best results on the routing task.
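
The multinomial idea in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the add-alpha smoothing, the equal category priors, and the toy documents are all assumptions made for illustration. Each category's term probabilities are estimated from training documents, a new document is classified by the category with the highest likelihood, and routing ranks documents by their posterior probability for one category.

```python
import math
from collections import Counter

def train_multinomial(docs_by_category, alpha=1.0):
    """Estimate log term probabilities per category (add-alpha smoothing is an assumption)."""
    vocab = {t for docs in docs_by_category.values() for d in docs for t in d}
    model = {}
    for cat, docs in docs_by_category.items():
        counts = Counter(t for d in docs for t in d)
        total = sum(counts.values()) + alpha * len(vocab)
        model[cat] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return model

def log_likelihood(doc, log_probs):
    """Multinomial log-likelihood of a tokenized document; unseen terms are ignored."""
    return sum(log_probs[t] for t in doc if t in log_probs)

def posterior(doc, model):
    """Posterior probability of each category, assuming equal priors."""
    scores = {c: log_likelihood(doc, lp) for c, lp in model.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

def classify(doc, model):
    """Assign the document to the category with the highest likelihood."""
    return max(model, key=lambda c: log_likelihood(doc, model[c]))

def route(docs, model, category):
    """Rank documents for one category; returns (index, score) pairs, best first."""
    scored = [(i, posterior(d, model)[category]) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy training data, invented for illustration (not FBIS documents).
training = {
    "arms": [["missile", "treaty", "missile"], ["treaty", "arms"]],
    "trade": [["tariff", "export", "trade"], ["trade", "export"]],
}
model = train_multinomial(training)
incoming = [["missile", "treaty"], ["export", "tariff"], ["treaty", "trade"]]
print(classify(incoming[0], model))
print(route(incoming, model, "arms"))
```

Using the posterior rather than the raw likelihood as the routing score is a design choice of this sketch: it keeps scores comparable across documents of different lengths, which a raw likelihood does not.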

Document Details

Document Type
Technical Report
Publication Date
May 01, 1996
Accession Number
ADA631838

Entities

People

  • James Leistensnider
  • Louise Guthrie

Organizations

  • Lockheed Martin

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Arms Control
  • Classification
  • Equations
  • Frequency
  • Information Operations
  • Language
  • Mathematical Models
  • Mathematics
  • Models
  • Probability
  • Stemming
  • Training
  • United States
  • USSR
  • Word Lists

Readers

  • Computer Networking
  • Instructional Design and Training Evaluation
  • Statistical Inference