Portable Language-Independent Adaptive Translation From OCR

Abstract

This is the sixth R&D quarterly progress report (QPR) of the BBN-led team under DARPA's MADCAT program. The report is organized by technical task area. The following tasks were performed this quarter:

  • 1.1. Pre-Processing and Page Segmentation: Text Segmentation and Verification; Shape-DNA based Handwritten Text Line Detection; Text Line Detection and Separation.
  • 1.2. Text Recognition: Error Analysis; Training with Phase 2 Data; Unsupervised Scribe Adaptation; Named Entity Detection using Lattices.
  • 1.4. Integration with GALE MT: Recognition Lattices for MT.
  • 1.5. Metadata Extraction: Logo Recognition.

Document Details

Document Type
Technical Report
Publication Date
Apr 15, 2009
Accession Number
ADA499805

Entities

People

  • Prem Natarajan

Organizations

  • BBN Technologies

Tags

DTIC Thesaurus Topics

  • Computer Vision
  • Data Sets
  • Databases
  • Decoding
  • Department Of Defense
  • Detection
  • Error Analysis
  • Errors
  • Governments
  • Language
  • Machine Translation
  • Personality
  • Recognition
  • Technical Information Centers
  • Test Sets
  • Training
  • Translations

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Technical Research and Report Writing