Improved Word-Level Alignment: Injecting Knowledge about MT Divergences

Abstract

Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language to bear a closer resemblance to that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and can be systematically identified. We describe our techniques for modifying English parse trees to form resulting sentences that share more similarity with the sentences in the other languages; finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence handling can improve word-level alignment.

Document Details

Document Type
Technical Report
Publication Date
Feb 14, 2002
Accession Number
ADA458774

Entities

People

  • Bonnie J. Dorr
  • Lisa Pearl
  • Nizar Habash
  • Rebecca Hwa

Organizations

  • University of Maryland

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Artificial Intelligence
  • Availability
  • Classification
  • Computers
  • Construction
  • Contracts
  • Formal Languages
  • Information Operations
  • Instructions
  • Integrals
  • Language
  • Machine Translation
  • Maryland
  • Monitoring
  • Universities

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation