Adaptive Hindi OCR Using Generalized Hausdorff Image Comparison

Abstract

In this paper, we present an adaptive Hindi OCR using generalized Hausdorff image comparison, implemented as part of a rapidly retargetable language tool effort. The system includes script identification, character segmentation, training sample creation, and character recognition. The OCR design (completed in one month) was applied to a complete Hindi-English bilingual dictionary (1083 pages) and to a collection of ideal images extracted from Hindi documents in PDF format. Experimental results show that recognition accuracy can reach 88% for noisy images and 95% for ideal images, both at the character level. The presented method can also be extended to design OCR systems for other scripts.
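The abstract does not spell out how the image comparison is computed. As a rough illustration only, the sketch below uses the rank-based (partial) Hausdorff distance, a common reading of "generalized Hausdorff": the maximum over nearest-neighbor distances is replaced by a ranked value, so a few noisy pixels do not dominate the score. All function names, the `fraction` parameter, and the nearest-template classifier are assumptions made for this example, not the report's actual implementation.

```python
import math


def foreground_points(image):
    """Return (row, col) coordinates of non-zero pixels in a 2-D list of 0/1 values."""
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]


def directed_partial_hausdorff(a, b, fraction=1.0):
    """Ranked directed distance from point set a to point set b.

    For each point in a, take the distance to its nearest point in b, sort
    those distances, and return the value at the given rank fraction.
    fraction=1.0 gives the classical directed Hausdorff distance (the maximum).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    nearest = sorted(min(math.dist(p, q) for q in b) for p in a)
    k = max(1, math.ceil(fraction * len(nearest)))
    return nearest[k - 1]


def generalized_hausdorff(a, b, fraction=1.0):
    """Symmetric version: the larger of the two ranked directed distances."""
    return max(
        directed_partial_hausdorff(a, b, fraction),
        directed_partial_hausdorff(b, a, fraction),
    )


def classify(glyph, templates, fraction=0.8):
    """Return the label of the template image closest to the glyph (hypothetical helper)."""
    points = foreground_points(glyph)
    return min(
        templates,
        key=lambda label: generalized_hausdorff(
            points, foreground_points(templates[label]), fraction
        ),
    )
```

With `fraction` below 1.0, an isolated noise pixel far from the template no longer determines the distance, which is one plausible reason such a measure suits noisy scanned pages. The brute-force nearest-neighbor search here is quadratic in the number of pixels; a distance transform would be the usual way to speed it up.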

Document Details

Document Type
Technical Report
Publication Date
Aug 19, 2003
Accession Number
ADA455170

Entities

People

  • David S. Doermann
  • Huanfeng Ma

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Aspect Ratio
  • Boundaries
  • Character Recognition
  • Classification
  • Computer Vision
  • Consonants
  • Dictionaries
  • Feature Extraction
  • Identification
  • Language
  • Machine Learning
  • Recognition
  • Symbols
  • Translations
  • Universities
  • Word Recognition

Readers

  • Computational Linguistics
  • Computer Vision