Extending Phrase-Based Decoding with a Dependency-Based Reordering Model

Abstract

Phrase-based decoding is conceptually simple and straightforward to implement, but at the cost of drastically oversimplified reordering models. Syntactically aware models make it possible to capture linguistically relevant relationships in order to improve word order, but they can be more complex to implement and optimize. In this paper, we explore a new middle ground between phrase-based and syntactically informed statistical MT, in the form of a model that supplements conventional, non-hierarchical phrase-based techniques with linguistically informed reordering based on syntactic dependency trees. The key idea is to exploit linguistically informed hierarchical structures only for those dependencies that cannot be captured within a single flat phrase. For very local dependencies, we leverage the success of conventional phrase-based approaches, which provide a sequence of target-language words that is already appropriately ordered and already carries the appropriate agreement morphology. Working with dependency trees rather than constituency trees allows us to take advantage of the flexibility of phrase-based systems to treat non-constituent fragments as phrases. We do impose one requirement on what can be translated as a phrase: the fragment must be a novel sort of "dependency constituent". This requirement is much weaker than the requirement that phrases be traditional linguistic constituents, which has often proven too restrictive in MT systems.
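The abstract does not define "dependency constituent", so the following is only an illustrative sketch of why a dependency-based condition can be weaker than requiring traditional constituents. It assumes a hypothetical stand-in definition, not the report's own: a contiguous span qualifies if exactly one of its words has its head outside the span. The stricter "complete subtree" test additionally forbids any word outside the span from attaching inside it. The function names, the example sentence, and its head array (0-based indices, -1 for the root) are all hypothetical.

```python
from typing import List

# Hypothetical example parse: head index of each word, -1 marks the root.
WORDS = ["the", "cat", "sat", "on", "the", "mat"]
HEADS = [1, 2, -1, 2, 5, 3]


def external_attachments(heads: List[int], start: int, end: int) -> int:
    """Count words in [start, end) whose head lies outside the span (the root counts as outside)."""
    return sum(1 for i in range(start, end) if not (start <= heads[i] < end))


def is_dependency_constituent(heads: List[int], start: int, end: int) -> bool:
    # ASSUMED, hypothetical definition: a non-empty contiguous span
    # with exactly one word attached to something outside it.
    return end > start and external_attachments(heads, start, end) == 1


def is_complete_subtree(heads: List[int], start: int, end: int) -> bool:
    # Stricter, constituent-like test: the span must also contain every
    # dependent of its words, i.e. no outside word attaches inside the span.
    if not is_dependency_constituent(heads, start, end):
        return False
    return all(
        not (start <= heads[i] < end)
        for i in range(len(heads))
        if not (start <= i < end)
    )


if __name__ == "__main__":
    for start, end in [(0, 2), (1, 3), (2, 4), (3, 5)]:
        span = " ".join(WORDS[start:end])
        print(
            f"{span!r}: dependency constituent={is_dependency_constituent(HEADS, start, end)}, "
            f"complete subtree={is_complete_subtree(HEADS, start, end)}"
        )
```

Under this assumed definition, a fragment such as "cat sat" passes the weaker test even though it is not a complete subtree, while "on the", which has two unrelated attachment points, fails both tests.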


Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2009
Accession Number
ADA522961

Entities

People

  • Philip Resnik
  • Tim Hunter

Organizations

  • University of Maryland

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Computational Linguistics
  • Decoding
  • Information Processing
  • Language
  • Linguistics
  • Machine Translation
  • Mathematics
  • Natural Language Processing
  • Natural Languages
  • Notation
  • Probability
  • Probability Distributions
  • Resilience
  • Sequences
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation