AN INVESTIGATION OF THE USE OF CONTEXT IN CHARACTER RECOGNITION USING GRAPH SEARCHING

Abstract

In this paper, context is introduced into a character recognition system. The recognition system is regarded as reading a text rather than isolated characters. Two types of context are considered: the syntax, or formal grammar, of the text under consideration, and the statistical distribution of the letters in the text. For syntax, a single 'regular' grammar is used. The statistical distribution is approximated by (a) the probabilities of individual letters, (b) the probabilities of letter pairs (digrams), and (c) the probabilities of letter triplets (trigrams). For each input character, the character recognizer outputs a list of alternative decisions with their associated 'confidences'. Both types of context are then applied to the entire input string to make the final decision by means of a graph-searching procedure. Experiments were run on a computer with a simulated character recognizer. The results indicate that the graph-searching formulation of the problem does allow the syntax and the statistical distribution of the letters to be utilized. The error rate of the character recognition system is reduced when digram and trigram statistics are used; their effectiveness varies inversely with the uncertainty in the statistical distribution of the letters in the text. (Author)
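The abstract does not specify the exact graph-search algorithm, so the following is only a minimal sketch of the general idea, not the report's method. It assumes a Viterbi-style dynamic-programming search over a layered graph: each layer holds the recognizer's alternatives for one input position, node weights are log confidences, and edge weights are log digram probabilities (log unigram for the first letter). The optional regular grammar is assumed to be given as a deterministic finite automaton. The names `best_string`, `log_p`, `Alternatives`, and `Grammar`, and the probability floor for unseen events, are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical types for this sketch.
Alternatives = List[Tuple[str, float]]  # (candidate letter, confidence > 0)
Grammar = Tuple[str, Dict[Tuple[str, str], str], FrozenSet[str]]  # (start, delta, accepting)


def log_p(table: Dict, key, floor: float = 1e-6) -> float:
    """Log-probability of key, using a small floor for unseen events (assumption)."""
    return math.log(table.get(key, floor))


def best_string(
    alternatives: List[Alternatives],
    unigram: Dict[str, float],
    digram: Dict[Tuple[str, str], float],
    grammar: Optional[Grammar] = None,
) -> Optional[str]:
    """Return the highest-scoring string, or None if no path satisfies the grammar."""
    start = grammar[0] if grammar else None
    # Frontier maps (last letter, grammar state) -> (best log score, string so far).
    # Keying on the last letter is enough because digram scores look back one letter.
    frontier = {(None, start): (0.0, "")}
    for alts in alternatives:
        nxt: Dict[Tuple[str, Optional[str]], Tuple[float, str]] = {}
        for (prev, state), (score, text) in frontier.items():
            for letter, conf in alts:
                new_state = None
                if grammar:
                    new_state = grammar[1].get((state, letter))
                    if new_state is None:
                        continue  # the grammar forbids this letter here
                # Context term: unigram for the first letter, digram afterwards.
                if prev is None:
                    context = log_p(unigram, letter)
                else:
                    context = log_p(digram, (prev, letter))
                s = score + math.log(conf) + context
                key = (letter, new_state)
                # Keep only the best path into each node.
                if key not in nxt or s > nxt[key][0]:
                    nxt[key] = (s, text + letter)
        frontier = nxt
    # A complete path must end in an accepting grammar state (if a grammar is used).
    finals = [
        (s, t)
        for (_, st), (s, t) in frontier.items()
        if grammar is None or st in grammar[2]
    ]
    return max(finals)[1] if finals else None
```

For example, if the recognizer slightly prefers "o" over "a" in the middle of "c?t", digram statistics in which "ca" and "at" are more frequent than "co" and "ot" can flip the decision to "cat", while a grammar that only admits "cot" forces "cot". Extending the sketch to trigrams would, under the same assumptions, key the frontier on the last two letters instead of one.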

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 1968
Accession Number
AD0707548

Entities

People

  • Carl Spencer Christensen

Organizations

  • Cornell University

Tags

DTIC Thesaurus Topics

  • Character Recognition
  • Computers
  • Data Science
  • Information Science
  • Mathematics
  • Personality
  • Probability
  • Recognition
  • Statistical Distributions
  • Statistics
  • Uncertainty

Readers

  • Computational Linguistics
  • Speech Processing/Speech Recognition
  • Statistical Inference