Domain Adaptation of Translation Models for Multilingual Applications

Abstract

The performance of a statistical translation algorithm in multilingual applications such as cross-lingual information retrieval (CLIR) and machine translation (MT) depends on the quality, quantity and domain match of the training data. Traditionally, manual selection and customization of training resources has been the prevailing approach. Besides being labor-intensive, this approach does not scale to the large quantity of heterogeneous resources that have recently become available, such as parallel text and bilingual thesauri in various domains. More importantly, manual customization offers no efficient or effective way to produce tailored translation models for a mixture of heterogeneous target documents spanning various domains, topics, languages and genres. Translation models trained on a general domain do not work well in technical domains; models trained on written documents are not appropriate for spoken dialogue; models trained on manual transcripts can be sub-optimal for translating noisy transcripts produced by a speech recognizer; and models trained on a mixture of topics are not optimal for any single topic-specific document. We seek to address this challenge by automatically adapting translation models (and, implicitly, parallel training resources) to specific target domains or sub-domains.

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2009
Accession Number
ADA507147

Entities

People

  • Monica Rogati

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Engineered Resilient Systems
  • Ground and Sea Platforms
  • Materials and Manufacturing Processes
  • Weapons Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Analgesia
  • Aneurysm
  • Cardiac Arrhythmias
  • Cardiovascular Physiological Phenomena
  • Cardiovascular Surgery
  • Cardiovascular System
  • Computer Science
  • Computers
  • Data Sets
  • Health Services
  • Information Retrieval
  • Language
  • Myocardial Ischemia
  • Pain
  • Test Sets
  • Translations

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks