A Unigram Orientation Model for Statistical Machine Translation

Abstract

In this paper, we present a unigram segmentation model for statistical machine translation where the segmentation units are blocks: pairs of phrases without internal structure. The segmentation model uses a novel orientation component to handle swapping of neighboring blocks. During training, we collect block unigram counts with orientation: we count how often a block occurs to the left or to the right of some predecessor block. The orientation model is shown to improve translation performance over two baseline models: 1) a model in which no block re-ordering is used, and 2) a model in which block swapping is controlled only by a language model. We report experimental results on a standard Arabic-English translation task.
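
The counting step described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the `Block` record, the `orientation` and `count_orientations` helpers, the span-based test for left versus right, and the toy data are all assumptions made for illustration. Blocks are assumed to be listed in target-language order, each carrying the source-word span it covers.

```python
from collections import Counter
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical block record: a phrase pair plus its source span."""
    source: str     # source-language phrase
    target: str     # target-language phrase
    src_start: int  # index of the first source word covered
    src_end: int    # index one past the last source word covered


def orientation(prev: Block, cur: Block) -> str:
    """Classify cur relative to its predecessor by source position.

    Assumption: "right" means cur's source span lies after prev's,
    "left" means it lies before; anything else is labeled "other".
    """
    if cur.src_start >= prev.src_end:
        return "right"
    if cur.src_end <= prev.src_start:
        return "left"
    return "other"


def count_orientations(sentences):
    """Collect block unigram counts keyed by (source, target, orientation)."""
    counts = Counter()
    for blocks in sentences:
        # Pair each block with the block that precedes it in target order.
        for prev, cur in zip(blocks, blocks[1:]):
            counts[(cur.source, cur.target, orientation(prev, cur))] += 1
    return counts


# Toy segmentation in target order; the last two blocks are swapped
# relative to their source order.
toy_sentence = [
    Block("a", "A", 0, 1),
    Block("c", "C", 2, 3),
    Block("b", "B", 1, 2),
]
toy_counts = count_orientations([toy_sentence])
```

In this toy segmentation, block ("c", "C") is counted once with right orientation and block ("b", "B") once with left orientation. The first block of a sentence has no predecessor, so the sketch leaves it uncounted.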

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA460258

Entities

People

  • Christopher Tillmann

Organizations

  • IBM Thomas J. Watson Research Center

Tags

Communities of Interest

  • Air Platforms

DTIC Thesaurus Topics

  • Applied Computer Science
  • Computational Linguistics
  • Computational Science
  • Computer Vision
  • Decoding
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Orientation (Direction)
  • Probability
  • Sequences
  • Test Sets
  • Training
  • Translations

Fields of Study

  • Computer science
  • Mathematics

Readers

  • Computational Linguistics
  • Computer Vision
  • Materials Science and Engineering

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks