A Statistical Approach to the Generation of a Database for Evaluating OCR Software
Abstract
In this report, we consider a statistical approach to augmenting a limited database of ground truth documents for use in evaluating optical character recognition (OCR) software. We require ground truth documents to assign a performance measure to the OCR component of the Forward Area Language Converter (FALCon) system. A modified moving-blocks bootstrap procedure is used to construct surrogate documents for this purpose; these surrogates prove to serve effectively as, and in some regards to be indistinguishable from, ground truth. The proposed method is validated through a rigorous statistical procedure.
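For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of the standard (unmodified) moving-blocks bootstrap. The modification used in the report is not described in this record, so it is not reproduced here. The function name `moving_blocks_bootstrap`, the choice of words as the resampling unit, and the block length are illustrative assumptions, not details taken from the report.

```python
import random


def moving_blocks_bootstrap(sequence, block_length, rng=None):
    """Build a surrogate sequence from randomly chosen overlapping blocks.

    Standard moving-blocks bootstrap only; this is NOT the modified
    procedure from the report. All names here are hypothetical.
    """
    rng = rng or random.Random()
    n = len(sequence)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(sequence)")

    # Every run of `block_length` consecutive items is a candidate block,
    # which gives n - block_length + 1 possible start positions.
    n_starts = n - block_length + 1

    surrogate = []
    while len(surrogate) < n:
        start = rng.randrange(n_starts)
        surrogate.extend(sequence[start:start + block_length])

    # Trim the last block so the surrogate matches the original length.
    return surrogate[:n]


# Assumed toy input: a short "document" split into words.
words = "the quick brown fox jumps over the lazy dog".split()
surrogate = moving_blocks_bootstrap(words, block_length=3, rng=random.Random(0))
print(" ".join(surrogate))
```

Sampling whole blocks rather than single items preserves short-range dependence within each block, which is the usual motivation for block resampling of sequential data.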
Document Details
- Document Type: Technical Report
- Publication Date: Jul 01, 2000
- Accession Number: ADA380805
Entities
People
- Ann E. Brodeen
- Frederick S. Brundick
- Malcolm S. Taylor
Organizations
- United States Army Research Laboratory