A Method for the Removal of Redundancy in Printed Text.

Abstract

A class of methods for removing redundancy from printed text, called ID-methods, was developed. ID-methods take into account only the statistics associated with word occurrences in printed text. However, it has been shown by means of models that these methods can be used to encode English text at a cost as low as 1.5 binary digits per character. This figure compares favorably with Shannon's upper bound on the entropy of printed English, which was determined by an experiment that implicitly took into account the syntactic structure and the semantics of English. Shannon's bound was 1.3 bits per character. An encoding experiment was performed, which verified the cost predictions and assessed the complexity of using ID-methods. It was found that text could be encoded at a rate on the order of a few thousand characters per second. An analysis indicates that text encoded using an ID-method could be decoded at a rate of 250,000 characters per second on a computer such as the IBM 360/75. (Author)
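To make the "binary digits per character" cost concrete, the following is a minimal sketch of how a per-character cost can be estimated from word-occurrence statistics alone. It is not the report's ID-method, whose details are not given in this record. It computes the Shannon entropy of the observed word distribution and divides by the average word length. The function name `word_entropy_bits_per_char` and the convention of counting one separator character per word are assumptions made for illustration.

```python
import math
from collections import Counter


def word_entropy_bits_per_char(text: str) -> float:
    """Estimate coding cost in bits per character from word frequencies only.

    Illustrative assumption: each word is coded independently with an ideal
    code matched to the word frequencies observed in `text` itself.
    """
    words = text.split()
    if not words:
        return 0.0

    counts = Counter(words)
    total = len(words)

    # Shannon entropy of the empirical word distribution, in bits per word.
    bits_per_word = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )

    # Average characters per word, counting one separator (space) per word.
    chars_per_word = sum(len(w) + 1 for w in words) / total

    return bits_per_word / chars_per_word


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the rug"
    print(f"{word_entropy_bits_per_char(sample):.3f} bits per character")
```

On a short sample the estimate is unrealistically low, because the word statistics are measured on the same text being scored. A meaningful figure would need frequencies gathered from a large, separate corpus.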

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 1972
Accession Number: AD0751407

Entities

People

Robert Donald Cullum

Organizations

University of Illinois Urbana–Champaign
