Statistical Phrase-Based Translation

Abstract

We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models do not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.
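
To illustrate what "heuristic learning of phrase translations from word-based alignments" can look like, here is a minimal sketch of the commonly used consistency criterion: a phrase pair is kept only if no word inside it is aligned to a word outside it. This is an assumption about the general technique, not the report's exact procedure. The function name `extract_phrase_pairs`, the toy sentence pair, and the default limit of three words (chosen to echo the abstract's finding) are all hypothetical, and the sketch omits the usual extension over unaligned boundary words.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with a word alignment.

    src, tgt  : lists of tokens
    alignment : set of (src_index, tgt_index) links
    max_len   : maximum phrase length on either side (illustrative default)
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any source word in the span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # skip spans with no alignment links
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no target word in [j1, j2] may link
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy example with a one-to-one, monotone alignment (hypothetical data).
src = ["das", "Haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
pairs = extract_phrase_pairs(src, tgt, alignment)
```

In a full system, the extracted pairs would then be counted over a parallel corpus to estimate phrase translation probabilities; that step, and the lexical weighting mentioned in the abstract, are not shown here.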

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA461156

Entities

People

  • Daniel Marcu
  • Franz J. Och
  • Philipp Koehn

Organizations

  • University of Southern California

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Computational Linguistics
  • Computer Programming
  • Computer Science
  • Context Free Grammars
  • Cost Estimates
  • Decoding
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Probability
  • Probability Distributions
  • Sequences
  • Syntax
  • Translations

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation