The Effect of Text Difficulty on Machine Translation Performance -- A Pilot Study with ILR-Rated Texts in Spanish, Farsi, Arabic, Russian and Korean

Abstract

We report on initial experiments examining the relationship between automated measures of machine translation (MT) performance and the Interagency Language Roundtable (ILR) scale of language proficiency and difficulty, which has been in standard use for U.S. government language training and assessment for the past several decades. Our main question is how technology-oriented measures of MT performance relate to ILR difficulty levels, given that a linguist with ILR proficiency level N is expected to understand a document rated at level N but to have increasing difficulty with documents at higher levels. We find that some key aspects of MT performance track ILR difficulty levels, primarily for MT output whose quality is good enough to be readable by humans.

Document Details

Document Type
Technical Report
Publication Date
May 01, 2004
Accession Number
ADA511696

Entities

People

  • Clifford Weinstein
  • Douglas Jones
  • Neil Granoien
  • Ray Clifford
  • Wade Shen

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Applied Computer Science
  • Computational Linguistics
  • Computer Science
  • Foreign Languages
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Newspapers
  • Pilot Studies
  • Standards
  • Statistics
  • Structural Components
  • Test And Evaluation
  • Translations

Readers

  • Aviation Science / Aeronautics
  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation