COMPUTER CLASSIFICATION OF INTELLIGENCE-TYPE DOCUMENTS

Abstract

A computer classification technique was successfully tested on intelligence-type documents. Results of experiments on technical data bases in the English and German languages are also reported. Since the technique is statistical rather than syntactical, it can classify documents in any language without requiring translation. In addition to the usual tests on sample and control data bases, a successful test was performed on an additional data base that had not been used to generate the classification statistics. The statistical technique is based upon multiple discriminant functions, which have the ability to classify into any number of categories; the technique also provides for classification to several levels of detail. A user may select any set of subject categories suiting his need and provide a set of sample documents for each category. A subset of words to form the classification basis is selected from the sample in accordance with their statistical properties. Classification applications are not limited to document retrieval, but may include document routing, screening, or disseminating functions. (Author)
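
The workflow outlined in the abstract (user-supplied sample documents per category, statistical selection of a word subset, then classification by discriminant functions) can be illustrated with a minimal Python sketch. It is not the report's actual procedure: the abstract does not state the word-selection statistic or the form of the discriminant functions, so the sketch assumes words are ranked by the variance of their mean relative frequency across categories, and it assumes a simplified linear discriminant with a pooled per-word variance and equal category priors. All function names, the sample texts, and the category labels are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def tokenize(text):
    # Words are treated as opaque tokens; no syntax or translation is involved.
    return text.lower().split()


def features(doc, vocab):
    # Relative frequency of each vocabulary word in one document.
    tokens = tokenize(doc)
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in vocab]


def column_means(rows):
    return [mean(col) for col in zip(*rows)]


def select_words(samples, k):
    # Assumed selection rule: keep the k words whose mean relative
    # frequency varies most between categories.
    vocab = sorted({w for docs in samples.values() for d in docs for w in tokenize(d)})
    cat_means = [column_means([features(d, vocab) for d in docs])
                 for docs in samples.values()]
    scores = [pvariance(col) for col in zip(*cat_means)]
    ranked = sorted(zip(vocab, scores), key=lambda ws: (-ws[1], ws[0]))
    return [w for w, _ in ranked[:k]]


def train(samples, k=4, eps=1e-6):
    # Per-category mean vectors plus a pooled within-category variance per word.
    vocab = select_words(samples, k)
    means, sq_dev, n_docs = {}, [0.0] * len(vocab), 0
    for cat, docs in samples.items():
        rows = [features(d, vocab) for d in docs]
        mu = column_means(rows)
        means[cat] = mu
        for row in rows:
            for j, x in enumerate(row):
                sq_dev[j] += (x - mu[j]) ** 2
        n_docs += len(rows)
    dof = max(n_docs - len(samples), 1)
    # eps keeps the variance positive when a word never varies within a category.
    variances = [s / dof + eps for s in sq_dev]
    return vocab, means, variances


def classify(doc, model):
    # One linear discriminant score per category; the highest score wins.
    vocab, means, variances = model
    x = features(doc, vocab)

    def score(mu):
        return sum(m * xi / v - 0.5 * m * m / v
                   for m, xi, v in zip(mu, x, variances))

    return max(means, key=lambda cat: score(means[cat]))


# Hypothetical sample documents, two per user-chosen category.
samples = {
    "aviation": ["the aircraft engine failed during flight",
                 "pilot reported engine trouble on the aircraft"],
    "naval": ["the ship left the harbor at dawn",
              "harbor crew repaired the ship hull"],
}
model = train(samples, k=4)
print(classify("engine noise on the aircraft", model))
```

Because the sketch only counts tokens, the same code would run unchanged on German or other text, which mirrors the abstract's point that a statistical approach needs no translation. Adding a category only requires another entry in `samples`.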

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1967
Accession Number
AD0820801

Entities

People

  • John H. Williams Jr.
  • Mathew P. Perriens

Organizations

  • International Business Machines Corporation (Armonk, NY)

Tags

DTIC Thesaurus Topics

  • Classification
  • Computers
  • Databases
  • German Language
  • Information Science
  • Language
  • Statistics
  • Translations
  • Words (Language)

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Library and Information Science
  • Regression Analysis