BREAKING THE COST BARRIER IN AUTOMATIC CLASSIFICATION

Abstract

A low-cost automatic classification method is reported that uses computer time in proportion to N log N, where N is the number of information items and the base of the logarithm is a parameter. Some barriers besides cost are treated briefly in the opening section, including types of intellectual resistance to the idea of doing classification by content-word similarity. The second section explains the basic processes of document grouping by similarity, and discusses the advantages of the reported method over methods commonly experimented with. The operation of an iterative procedure using word profiles to progressively improve the grouping of content-word lists is described. Then some possible applications aside from document classification are enumerated. The final section begins by presenting theoretical underpinnings that explain the form taken by the components of the method. An account of the struggle to make the method work is sketched, followed by a cycle-by-cycle description of a feasibility demonstration. The conclusion states that mere cheapness is not enough and analyzes what researchers and developers might have to do before user acceptance of automatic classification can be assured.

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 1966
Accession Number
AD0636837

Entities

People

  • L. B. Doyle

Organizations

  • System Development Corporation

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Classification
  • Commerce
  • Computational Linguistics
  • Computers
  • Crime
  • Data Processing
  • Health Services
  • Human Behavior
  • Language
  • Libraries
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Psychology
  • Societies
  • Word Lists

Readers

  • Business Analytics
  • Systems Analysis and Design
  • Theoretical Analysis