A Simple Probabilistic Approach to Classification and Routing
Abstract
Several classification and routing methods were implemented and compared. The experiments used FBIS documents from four categories, and the measures used were the tf.idf and Cosine similarity measures, and a maximum likelihood estimate based on assuming a Multinomial Distribution for the various topics (populations). In addition, the SMART program was run with 'lnc.ltc' weighting and compared to the others. Decisions for both our classification scheme (documents are put into any number of disjoint categories) and our routing scheme (documents are assigned a 'score' and ranked relative to each category) are based on the highest probability for correct classification or routing. All of the techniques described here are fully automatic, and use a training set of relevant documents to produce lists of distinguishing terms and weights. All methods (ours and the ones we compared to) gave excellent results for the classification task, while the one based on the Multinomial Distribution produced the best results on the routing task.
Document Details
- Document Type
- Technical Report
- Publication Date
- May 01, 1996
- Accession Number
- ADA631838
Entities
People
- James Leistensnider
- Louise Guthrie
Organizations
- Lockheed Martin