Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation

Abstract

This paper describes a novel approach for handling translation divergences in a Generation- Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target language resources such as word lexical semantics, including in- formation about categorical variations and subcategorization frames. These resources are used to generate multiple structural variations from a target-glossed lexico-syntactic representation of the source language sentence. The multiple structural variations account for different translation divergences. The overgeneration of the approach is constrained by a target-language model using corpus-based statistics. The exploitation of target language resources (symbolic and statistical) to handle a problem usually reserved to Transfer and Interlingual MT is useful for translation from structurally divergent source languages with scarce linguistic resources. A preliminary evaluation on the application of this approach to Spanish-English MT proves this approach extremely promising. The approach however is not limited to MT as it can be extended to monolingual NLG applications such as summarization.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2002
Accession Number
ADA458778

Entities

People

  • Bonnie J. Dorr
  • Nizar Habash

Organizations

  • University of Maryland

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Abstracts
  • Demographic Cohorts
  • Formal Languages
  • Information Operations
  • Language
  • Machine Translation
  • Sequences
  • Social Sciences
  • Translations
  • Universities
  • Words (Language)

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Translation