A Method for Correcting Broken Hyphenations in Noisy English Text

Abstract

The problem of rejoining broken hyphenations in processed English text is addressed. A basic algorithm that incorporates a word validation step is developed. Results of running the algorithm over an English military training text are presented and analyzed. Precision and recall scores show that the algorithm works well for correcting broken hyphenations but fails when certain types of noise are encountered in the data.
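
The abstract does not spell out the algorithm's details, so the following is only a minimal Python sketch of the general idea, not the report's method. It assumes line-based input, treats the word validation step as a lookup in a small in-memory lexicon, and scores predicted join positions against gold positions with precision and recall. LEXICON, is_valid_word, rejoin_hyphenations, and precision_recall are hypothetical names introduced for illustration.

```python
import re

# HYPOTHETICAL lexicon standing in for the report's word-validation resource,
# which the abstract does not describe. A real system would use a full dictionary.
LEXICON = {"algorithm", "hyphenation", "military", "training", "validation"}

# A line that ends in a letter run followed by a hyphen, e.g. "The word vali-".
_TRAILING_FRAGMENT = re.compile(r"([A-Za-z]+)-\s*$")
# A line that starts (after optional spaces) with a letter run, e.g. "dation step".
_LEADING_FRAGMENT = re.compile(r"^\s*([A-Za-z]+)(.*)$", re.DOTALL)


def is_valid_word(word, lexicon=LEXICON):
    """Word-validation step (assumed): case-insensitive lexicon lookup."""
    return word.lower() in lexicon


def rejoin_hyphenations(lines, lexicon=LEXICON):
    """Rejoin words hyphenated across line breaks when the joined form validates.

    Returns the corrected lines and the indices of lines where a join was made.
    If the joined form is not a known word (e.g. "well-" + "known"), both lines
    are left unchanged, so genuine hyphenated compounds are not collapsed.
    """
    out = list(lines)
    joins = []
    for i in range(len(out) - 1):
        head = _TRAILING_FRAGMENT.search(out[i])
        tail = _LEADING_FRAGMENT.match(out[i + 1])
        if head is None or tail is None:
            continue
        candidate = head.group(1) + tail.group(1)
        if is_valid_word(candidate, lexicon):
            # Put the completed word on the first line; drop the fragment from the next.
            out[i] = out[i][: head.start()] + candidate
            out[i + 1] = tail.group(2).lstrip()
            joins.append(i)
    return out, joins


def precision_recall(predicted, gold):
    """Precision and recall of predicted join positions against gold positions.

    By convention here, an empty predicted (or gold) set yields 0.0 rather
    than raising ZeroDivisionError.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, given the lines "The word vali-", "dation step is", "a well-", "known idea.", the sketch joins "vali-" and "dation" into "validation" (line index 0) but leaves "well-" and "known" alone because "wellknown" is not in the toy lexicon. The trailing-fragment pattern allows only whitespace after the hyphen, so a stray noise character there would block a join; this is a limitation of the sketch itself, and the types of noise analyzed in the report may differ.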


Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2012
Accession Number
ADA561948

Entities

People

  • Jeffrey C. Micher

Organizations

  • United States Army Research Laboratory

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Arabic Language
  • Artificial Intelligence Software
  • Character Recognition
  • Data Analysis
  • Information Retrieval
  • Information Science
  • Language
  • Military Research
  • Military Training
  • Natural Language Processing
  • Natural Languages
  • Optical Character Recognition
  • Precision
  • Training
  • Validation
