A Method for the Removal of Redundancy in Printed Text.
Abstract
A class of methods for removing redundancy from printed text, called ID-methods, was developed. ID-methods take into account only the statistics of word occurrences in printed text. Nevertheless, models show that these methods can encode English text at a cost as low as 1.5 binary digits per character. This figure compares favorably with Shannon's upper bound on the entropy of printed English, 1.3 bits per character, which was determined by an experiment that implicitly took into account the syntactic structure and the semantics of English. An encoding experiment was performed, which verified the cost predictions and assessed the complexity of using ID-methods. It was found that text could be encoded at a rate on the order of a few thousand characters per second. An analysis indicates that text encoded using an ID-method could be decoded at a rate of 250,000 characters per second on a computer such as the IBM 360/75. (Author)
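This record does not describe how ID-methods work, so the following is not the report's method. It is only a minimal sketch of how a bits-per-character cost can be estimated from word-occurrence statistics alone: it computes the entropy of the empirical word distribution and divides by the average word length, assuming one separating space per word. The function name and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter


def word_entropy_bits_per_char(text: str) -> float:
    """Estimate coding cost in bits per character from word frequencies only.

    Illustrative assumption: words are whitespace-separated tokens, and each
    word is charged one extra character for the space that follows it.
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")

    counts = Counter(words)
    total = len(words)

    # Shannon entropy of the empirical word distribution, in bits per word.
    bits_per_word = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )

    # Average characters per word, counting one separating space per word.
    chars_per_word = sum(len(w) + 1 for w in words) / total

    return bits_per_word / chars_per_word
```

On a small sample this estimate is optimistic, because the word frequencies are measured on the same text being scored and the cost of transmitting the word list itself is ignored.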
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 1972
- Accession Number: AD0751407
Entities
People
- Robert Donald Cullum
Organizations
- University of Illinois Urbana–Champaign