IS AUTOMATIC CLASSIFICATION A REASONABLE APPLICATION OF STATISTICAL ANALYSIS OF TEXT?
Abstract
The crucial question of the quality of automatic classification is treated at considerable length, and empirical data are introduced to support the hypothesis that classification quality improves as more information about each document is used as input to the classification program. Six non-judgmental criteria are used in testing the hypothesis for 100 keyword lists (each list representing a document) for a series of computer runs in which the number of words per document is increased progressively from 12 to 36. Four of the six criteria indicate that the hypothesis holds, and two point to no effect. Finally, the future of automatic classification and some of the practical problems to be faced are outlined.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 31, 1964
- Accession Number: AD0608574
Entities
People
- Lauren B. Doyle
Organizations
- System Development Corporation