Learning Domain-Specific Transfer Rules: An Experiment with Korean to English Translation

Abstract

We describe the design of an MT system that employs transfer rules induced from parsed bitexts and present evaluation results. The system learns lexico-structural transfer rules using syntactic pattern matching, statistical co-occurrence and error-driven filtering. In an experiment with domain-specific Korean to English translation, the approach yielded substantial improvements over three baseline systems.

In this paper, we describe the design of an MT system that employs transfer rules induced from parsed bitexts and present evaluation results for Korean to English translation. Our approach is based on lexico-structural transfer (Nasr et al., 1997), and extends recent work on Korean to English transfer in particular (Han et al., 2000). Whereas Han et al. focus on high-quality domain-specific translation using handcrafted transfer rules, in this work we instead focus on automating the acquisition of such rules. The proposed approach is inspired by example-based machine translation (EBMT; Nagao, 1984; Sato and Nagao, 1990; Maruyama and Watanabe, 1992) and is similar to the recent work of Meyers et al. (1998) and Richardson et al. (2001), where transfer rules are also derived after aligning the source and target nodes of corresponding parses. However, while Meyers et al. (1998) and Richardson et al. (2001) only consider parses and rules with lexical labels and syntactic roles, our approach uses parses containing any syntactic information provided by parsers (lexical labels, syntactic roles, tense, number, person, etc.), and derives rules consisting of any source and target tree sub-patterns matching a subset of the parse features. A more detailed description of the differences can be found in (Lavoie et al., 2001).
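The idea of a rule pattern "matching a subset of the parse features" can be illustrated with a minimal sketch. This is not the system's implementation: the `Node` class, the `matches` function, the feature names, and the romanized example lemmas are all hypothetical and chosen only for illustration. A pattern matches a parse node when every feature the pattern specifies is present on the node with the same value, and each pattern child matches some distinct child of the node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: a feature dictionary plus child nodes (hypothetical)."""
    features: dict
    children: list = field(default_factory=list)


def matches(pattern: Node, node: Node) -> bool:
    """Return True if `pattern` matches `node` on a subset of its features.

    Features absent from the pattern are ignored, so the pattern can be
    as specific or as general as needed. Children are matched greedily,
    without backtracking, which is a simplification of this sketch.
    """
    for key, value in pattern.features.items():
        if node.features.get(key) != value:
            return False
    used = set()
    for pattern_child in pattern.children:
        for i, node_child in enumerate(node.children):
            if i not in used and matches(pattern_child, node_child):
                used.add(i)
                break
        else:
            # No unused child of the node matched this pattern child.
            return False
    return True


# Hypothetical source-side parse: a past-tense verb with one noun dependent.
parse = Node(
    {"lemma": "ka", "pos": "verb", "tense": "past"},
    [Node({"lemma": "hakkyo", "pos": "noun", "role": "goal"})],
)

# A general pattern: any past-tense verb with a noun dependent.
general_pattern = Node({"pos": "verb", "tense": "past"}, [Node({"pos": "noun"})])

# A pattern that disagrees with the parse on tense.
present_pattern = Node({"pos": "verb", "tense": "present"})

print(matches(general_pattern, parse))
print(matches(present_pattern, parse))
```

In this sketch the general pattern matches even though it says nothing about lemmas or syntactic roles, which is the sense in which a rule's source side can use only part of the information a parser provides.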

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2002
Accession Number
ADA457732

Entities

People

  • Benoit Lavoie
  • Michael J. White
  • Tanya Korelsky

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Automatic
  • Computational Linguistics
  • Decoding
  • Dictionaries
  • Errors
  • Language
  • Learning
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Precision
  • Test And Evaluation
  • Test Sets
  • Translations

Fields of Study

  • Computer science

Readers

  • Military History
  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks