DOCUMENT FORMAT RECOGNITION.

Abstract

This study is primarily concerned with methods for analyzing the format of pages from technical journals, and with means for automatically processing the textual and graphic material on these pages for input to a computer that performs textual data processing functions such as automatic language translation, automatic abstracting, and automatic indexing. This analysis and processing includes text-graphic separation, location of graphics, and textual analysis and recognition. The overall process is considered to be a Format Recognition and Analysis Program operating on a computer-controlled character recognition device. The study has resulted in general design techniques for Format Recognition and Analysis Programs applicable to any document in which text and graphics are intermixed. Two such programs have been completed, tested, and demonstrated for two technical journals, one Soviet and one U.S., and a third program has been outlined and partly written for another Soviet journal. It has been found that almost any journal can be programmed without serious difficulty, but that new journals require substantially different programs. (Author)

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1965
Accession Number
AD0611632

Entities

People

  • Steven B. Gray

Organizations

  • Sylvania Electric Products

Tags

DTIC Thesaurus Topics

  • Automatic
  • Character Recognition
  • Computers
  • Data Processing
  • Graphics
  • Language
  • Language Translation
  • Materials
  • Personality
  • Recognition
  • Translations

Readers

  • Computational Linguistics
  • Computer Science
  • Political Science / International Relations / European Studies