Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization

Abstract

Statistical machine translation (SMT) translates from one natural language (NL) to another using statistical models built from examples of the languages. The translation quality of SMT systems is competitive with that of other premier machine translation (MT) systems, and further improvements are possible. This thesis focuses on improving translation quality by re-ranking the n-best lists generated by modern phrase-based SMT systems. An n-best list contains the n most likely translations of a sentence. The research establishes upper and lower limits on the translation quality achievable through re-ranking. Three methods of generating an n-gram language model (LM) from the n-best lists are proposed. Applying these LMs to re-rank the n-best lists improves the Bilingual Evaluation Understudy (BLEU) score of the translation by up to six percent.
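The abstract does not say how the three LM-generation methods work, so the following is only an illustrative sketch of the general idea: train an n-gram LM on the hypotheses in an n-best list, then re-rank the same list by LM score. The function names (`train_lm`, `log_prob`, `rerank`), the add-one smoothing, and the length normalization are assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, padded with sentence boundary markers."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train_lm(sentences, n=2):
    """Count n-grams and their contexts over whitespace-tokenized sentences."""
    ngram_counts = Counter()
    context_counts = Counter()
    vocab = {"</s>"}  # the end marker is a predictable token
    for sentence in sentences:
        tokens = sentence.split()
        vocab.update(tokens)
        for gram in ngrams(tokens, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts, len(vocab)


def log_prob(sentence, lm, n=2):
    """Log-probability of a sentence under the LM, with add-one smoothing (an assumption)."""
    ngram_counts, context_counts, vocab_size = lm
    total = 0.0
    for gram in ngrams(sentence.split(), n):
        numerator = ngram_counts[gram] + 1
        denominator = context_counts[gram[:-1]] + vocab_size
        total += math.log(numerator / denominator)
    return total


def rerank(nbest, lm, n=2):
    """Sort hypotheses by length-normalized LM log-probability, best first."""
    def score(hypothesis):
        # Divide by the number of predicted tokens (words plus the end marker)
        # so that shorter hypotheses are not favored automatically.
        return log_prob(hypothesis, lm, n) / (len(hypothesis.split()) + 1)
    return sorted(nbest, key=score, reverse=True)


# Hypothetical toy n-best list; a real list would come from the SMT decoder.
nbest = ["b a", "a b", "a b"]
lm = train_lm(nbest, n=2)
reranked = rerank(nbest, lm, n=2)
```

In this toy list the word order shared by two hypotheses outscores the one that appears only once, so `reranked` places both `"a b"` entries ahead of `"b a"`. In practice the LM score would normally be combined with the decoder's own model scores rather than used on its own.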

Document Details

Document Type
Technical Report
Publication Date
Mar 27, 2014
Accession Number
ADA598653

Entities

People

  • Jordan S. Keefer

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Artificial Intelligence
  • Computational Science
  • Computers
  • Department Of Defense
  • Grammars
  • Information Operations
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Operating Systems
  • Trees (Data Structures)
  • United States
  • Word Processors

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks