A Statistical Approach to the Generation of a Database for Evaluating OCR Software

Abstract

In this report, we consider a statistical approach to augmenting a limited database of ground truth documents for use in evaluating optical character recognition (OCR) software. We require ground truth documents to assign a performance measure to the OCR component of the Forward Area Language Converter (FALCon) system. For this purpose, we construct surrogate documents using a modified moving-blocks bootstrap procedure; these surrogates prove effective and, in some respects, indistinguishable from ground truth. The proposed method is validated through a rigorous statistical procedure.
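This record does not describe how the authors modified the moving-blocks bootstrap. As a point of reference only, here is a minimal sketch of the standard (unmodified) moving-blocks bootstrap applied to a document. It assumes the document is represented as a list of word tokens. The function name `moving_blocks_bootstrap` and the parameter `block_len` are hypothetical, and this is not the report's procedure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def moving_blocks_bootstrap(seq: Sequence[T], block_len: int, rng: random.Random) -> List[T]:
    """Standard moving-blocks bootstrap (hypothetical helper, not the report's
    modified version): build a resample the same length as `seq` by
    concatenating randomly chosen overlapping blocks of length `block_len`."""
    n = len(seq)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(seq)")
    # Every overlapping block seq[i:i+block_len], i = 0..n-block_len, is a candidate.
    num_blocks = n - block_len + 1
    out: List[T] = []
    while len(out) < n:
        start = rng.randrange(num_blocks)  # pick a block start uniformly at random
        out.extend(seq[start:start + block_len])
    # Truncate the final block so the surrogate matches the original length.
    return out[:n]


# Usage: build one surrogate "document" from a toy ground-truth word sequence.
ground_truth = "the quick brown fox jumps over the lazy dog".split()
surrogate = moving_blocks_bootstrap(ground_truth, block_len=3, rng=random.Random(0))
print(" ".join(surrogate))
```

Resampling contiguous blocks rather than individual words keeps word order intact within each block. This preserves some of the local dependence in the text that an ordinary i.i.d. bootstrap would destroy. The choice of `block_len` trades off how much of that structure is retained against how varied the surrogates can be.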

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2000
Accession Number
ADA380805

Entities

People

  • Ann E. Brodeen
  • Frederick S. Brundick
  • Malcolm S. Taylor

Organizations

  • United States Army Research Laboratory

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Character Recognition
  • Databases
  • Demographic Cohorts
  • Department Of Defense
  • Forward Areas
  • Identification
  • Information Retrieval
  • Language
  • Military Research
  • Probability
  • Random Variables
  • Recognition
  • Recreation
  • Sampling
  • Simulations
  • Test And Evaluation

Fields of Study

  • Computer science

Readers

  • Atmospheric Remote Sensing
  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Regression Analysis