Segmentation of Touching English Letters.
Abstract
This paper examines the problem of building a machine to read uncontrolled type fonts set with essentially no space between letters (within words). The consequence of this type of data, which represents the usual format of printed text, is that the data vectors produced by the optical scanner contain multiple letters and/or fragments of letters that cannot be easily separated. An algorithm based on a variant of running cross-correlation between prototype letters and successively 'windowed' fragments of the sentence is employed. The algorithm computes the Euclidean distance between prototypes and the sentence fragment in a filtered Fourier domain. It is shown that appropriate normalization and windowing techniques allow perfect recognition of touching letters within words. This occurs even when no a priori knowledge of letter location within the word is available, provided that suitable prototypes can be established. Multiple alphabet prototypes were then built and used to examine widely differing type fonts. Techniques to set acceptance thresholds were evaluated and the behavior of the resulting recognition system tabulated. A number of false triggers did occur in this case and these were discussed. Recommendations for further improvements in the system are given. (Author)
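The windowed matching described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: it assumes the scanner output is reduced to a 1-D profile, that "filtered" means keeping only the lowest Fourier coefficients, and that normalization means scaling each feature vector to unit length. The names `fourier_features`, `sliding_distances`, and the parameter `keep` are hypothetical.

```python
import numpy as np

def fourier_features(window, keep):
    """Low-pass Fourier features: first `keep` complex DFT coefficients, unit norm.

    Assumption: this truncation and scaling stand in for the report's
    filtering and normalization, which the abstract does not specify.
    """
    coeffs = np.fft.rfft(window)[:keep]
    norm = np.linalg.norm(coeffs)
    return coeffs / norm if norm > 0 else coeffs

def sliding_distances(line, prototype, keep=6):
    """Euclidean distance between the prototype and every window of the line."""
    width = len(prototype)
    proto = fourier_features(prototype, keep)
    return np.array([
        np.linalg.norm(fourier_features(line[i:i + width], keep) - proto)
        for i in range(len(line) - width + 1)
    ])

# Two synthetic "letters" placed side by side with no gap between them.
rng = np.random.default_rng(0)
letter_a = rng.random(16)
letter_b = rng.random(16)
line = np.concatenate([letter_a, letter_b])

# Slide the prototype for letter_b along the line; the smallest distance
# marks the most likely position of that letter.
distances = sliding_distances(line, letter_b)
best_offset = int(np.argmin(distances))
print(best_offset, distances[best_offset])
```

In this toy setting the prototype is located without any prior knowledge of where the letter starts. A practical system would also need an acceptance threshold on the distance, which the abstract mentions but does not quantify.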
Document Details
- Document Type: Technical Report
- Publication Date: Mar 01, 1979
- Accession Number: ADA069298
Entities
People
- Roy Edward Bentkowski
Organizations
- Air Force Institute of Technology