Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation

Abstract

This paper describes a novel approach to handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The translation divergence problem is usually reserved for Transfer and Interlingual MT because it requires a large combination of complex lexical and structural mappings. A major requirement of these approaches is the accessibility of large amounts of explicit symmetrical knowledge for both source and target languages. This limitation renders Transfer and Interlingual approaches ineffective in the face of structurally divergent language pairs with asymmetrical resources. GHMT addresses the more common form of this problem, source-poor/target-rich, by fully exploiting symbolic and statistical target-language resources. This is accomplished by using target-language lexical semantics, categorial variations and subcategorization frames to overgenerate multiple lexico-structural variations from a target-glossed syntactic dependency of the source-language sentence. The symbolic overgeneration, which accounts for different possible translation divergences, is constrained by a statistical target-language model.
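The overgenerate-then-rank idea in the abstract can be illustrated with a minimal sketch. This is not the GHMT implementation: the variant table, the four-sentence corpus, the add-one-smoothed bigram model, and all function names are hypothetical stand-ins chosen for illustration. The symbolic step proposes two target-language realizations of one predicate (a light-verb construction and a single verb reached by categorial variation), and the statistical step keeps the candidate that the target-language model scores highest.

```python
import math
from collections import Counter

# Hypothetical variant table (an assumption, not data from the report):
# one predicate realized either as a light-verb construction or as a single verb.
PREDICATE_VARIANTS = [("gave", "stabs", "to"), ("stabbed",)]

# Toy target-language corpus standing in for a large monolingual resource.
CORPUS = [
    "I stabbed John",
    "she stabbed him",
    "I gave books to John",
    "he stabbed John",
]


def overgenerate(subject, obj):
    """Symbolic step: enumerate lexico-structural variants for one dependency."""
    return [" ".join((subject, *pred, obj)) for pred in PREDICATE_VARIANTS]


def train_bigrams(sentences):
    """Count context unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # contexts only
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """Statistical step: add-one-smoothed bigram log-probability."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def translate(subject, obj, model):
    """Overgenerate candidates, then let the language model pick one."""
    return max(overgenerate(subject, obj), key=lambda c: log_prob(c, model))


MODEL = train_bigrams(CORPUS)
best = translate("I", "John", MODEL)
```

In this toy setting the language model prefers the compact verb form because the light-verb candidate contains bigrams never seen in the corpus. A real system would draw its variants from lexical-semantic and subcategorization resources rather than a hand-written table.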

Document Details

Document Type
Technical Report
Publication Date
May 01, 2002
Accession Number
ADA458925

Entities

People

  • Bonnie J. Dorr
  • Nizar Habash

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Demographic Cohorts
  • Information Operations
  • Language
  • Linguistics
  • Machine Translation
  • Sequences
  • Social Sciences
  • Translations
  • Universities
  • Words (Language)

Fields of Study

  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks