Morphological Analysis for Statistical Machine Translation

Abstract

We present a novel morphological analysis technique that induces morphological and syntactic symmetry between two languages with highly asymmetrical morphological structures, in order to improve statistical machine translation quality. The technique presupposes fine-grained segmentation of each word in the morphologically rich language into a sequence of prefix(es)-stem-suffix(es), together with part-of-speech tagging of the parallel corpus. The algorithm identifies morphemes to be merged or deleted in the morphologically rich language to induce the desired morphological and syntactic symmetry. The technique significantly improves Arabic-to-English translation quality when applied to IBM Model 1 and Phrase Translation Models trained on corpora ranging in size from 3,500 to 3.3 million sentence pairs.
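The merge-or-delete step described in the abstract can be illustrated with a minimal sketch. It is not the report's algorithm: the abstract does not say how morphemes are selected, so the sets `delete_tags` and `merge_tags`, the function name `symmetrize`, the tag names, and the transliterated example word are all hypothetical. The sketch only assumes that each word has already been segmented into morphemes carrying part-of-speech tags, and that a morpheme marked for merging is attached to the morpheme before it.

```python
from typing import List, Set, Tuple

# A morpheme is a (surface form, part-of-speech tag) pair.
# Tag names used here are illustrative assumptions, not the report's tag set.
Morpheme = Tuple[str, str]


def symmetrize(morphemes: List[Morpheme],
               delete_tags: Set[str],
               merge_tags: Set[str]) -> List[Morpheme]:
    """Drop morphemes whose tag is in delete_tags and attach morphemes
    whose tag is in merge_tags to the preceding kept morpheme."""
    out: List[Morpheme] = []
    for form, tag in morphemes:
        if tag in delete_tags:
            continue  # morpheme assumed to have no counterpart in the other language
        if tag in merge_tags and out:
            prev_form, prev_tag = out[-1]
            out[-1] = (prev_form + form, prev_tag)  # merged unit keeps the earlier tag
        else:
            # Kept as its own token (also the fallback when nothing precedes it).
            out.append((form, tag))
    return out


# Hypothetical segmented word: conjunction + noun stem + plural suffix + case ending.
segmented = [("w", "CONJ"), ("ktAb", "NOUN"), ("At", "NSUFF"), ("u", "CASE")]
result = symmetrize(segmented, delete_tags={"CASE"}, merge_tags={"NSUFF"})
print(result)
```

In this sketch the choice of which tags to delete or merge is supplied by the caller; in the report that choice is what the algorithm is said to identify from the segmented, tagged parallel corpus.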

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA460276

Entities

People

  • Young-suk Lee

Organizations

  • IBM Thomas J. Watson Research Center

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Applied Computer Science
  • Channel Models
  • Computational Science
  • Computer Vision
  • Decoding
  • Information Operations
  • Language
  • Linguistics
  • Machine Translation
  • Morphology (Linguistics)
  • Probability
  • Red Sea
  • Segmented
  • Symbols
  • Training
  • Translations

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation