Strategies for Building a Farsi-English SMT System from Limited Resources

Abstract

One of the recent tasks for machine translation research has been the development of translation capabilities in a time frame as short as 100 days. Such a task requires developers to consider what can be done with relatively small amounts of data in a short time frame. This inherently limits the type and complexity of the effort that can be devoted to the task. In this paper we focus on the improvements to a Farsi-to-English translation system achieved through algorithmic changes, the addition of raw, domain-unspecific resources, and unsupervised morphological segmentation. The cumulative effect of these measures has been an improvement in BLEU score of about 25% relative on an internal test set.
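
A relative gain is measured against the baseline score, not in absolute BLEU points. Writing B_0 for the baseline BLEU score and B_1 for the score after all changes, the relation is shown below with a purely hypothetical baseline of 20.0. The report's actual scores are not given in this abstract, so the numbers only illustrate the arithmetic:

```latex
\Delta_{\text{rel}} = \frac{B_1 - B_0}{B_0},
\qquad
B_0 = 20.0,\; \Delta_{\text{rel}} = 0.25
\;\Rightarrow\;
B_1 = B_0\,(1 + \Delta_{\text{rel}}) = 25.0
```

On that hypothetical baseline, a 25% relative gain corresponds to 5 absolute BLEU points. The same relative gain on a lower baseline would amount to fewer absolute points.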


Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2008
Accession Number
ADA635746

Entities

People

  • Andreas Kathol
  • Jing Zheng

Organizations

  • SRI International

Tags

DTIC Thesaurus Topics

  • Analyzers
  • Computer Vision
  • Context Free Grammars
  • Data Sets
  • Dictionaries
  • Grammars
  • Information Operations
  • Language
  • Language Translation
  • Linguistics
  • Machine Translation
  • New Mexico
  • Social Sciences
  • Standards
  • Test Sets
  • Training
  • Translations

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation