IS AUTOMATIC CLASSIFICATION A REASONABLE APPLICATION OF STATISTICAL ANALYSIS OF TEXT?

Abstract

The crucial question of the quality of automatic classification is treated at considerable length, and empirical data are introduced to support the hypothesis that classification quality improves as more information about each document is used as input to the classification program. Six non-judgmental criteria are used in testing the hypothesis for 100 keyword lists (each list representing a document) for a series of computer runs in which the number of words per document is increased progressively from 12 to 36. Four of the six criteria indicate the hypothesis holds, and two point to no effect. Finally, the future of automatic classification and some of the practical problems to be faced are outlined.

Document Details

Document Type
Technical Report
Publication Date
Aug 31, 1964
Accession Number
AD0608574

Entities

People

  • Lauren B. Doyle

Organizations

  • System Development Corporation

Tags

DTIC Thesaurus Topics

  • Automatic
  • Classification
  • Computers
  • Data Science
  • Information Science
  • Keyboards
  • Statistical Analysis

Readers

  • Speech Processing/Speech Recognition
  • Systems Analysis and Design