Normalization for Automated Metrics: English and Arabic Speech Translation

Abstract

The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program has experimented with applying automated metrics to speech translation dialogues. For translations into English, BLEU, TER, and METEOR scores correlate well with human judgments, but scores for translation into Arabic correlate with human judgments less strongly. This paper provides evidence to support the hypothesis that automated measures of Arabic are lower due to variation and inflection in Arabic by demonstrating that normalization operations improve correlation between BLEU scores and Likert-type judgments of semantic adequacy as well as between BLEU scores and human judgments of the successful transfer of the meaning of individual content words from English to Arabic.
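
The abstract does not say which normalization operations were applied. As a minimal sketch of what orthographic normalization of Arabic text before scoring can look like, the Python below strips diacritics and tatweel and collapses common letter variants. The function name `normalize_arabic` and the particular set of rules are assumptions chosen for illustration, not the report's actual procedure.

```python
import re

# Assumed rule set (illustrative only, not taken from the report):
# short-vowel marks, tanween, shadda, sukun, and superscript alef.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Alef with madda, hamza above, or hamza below.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
_TATWEEL = "\u0640"  # elongation character


def normalize_arabic(text: str) -> str:
    """Reduce orthographic variation so equivalent spellings match."""
    text = _DIACRITICS.sub("", text)           # drop diacritics
    text = text.replace(_TATWEEL, "")          # drop elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")    # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")    # ta marbuta -> ha
    return " ".join(text.split())              # collapse whitespace
```

Applying the same function to both system output and reference translations before computing an n-gram metric such as BLEU means that spellings differing only in these features count as matches.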

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2009
Accession Number
AD1125098

Entities

People

  • Alan Rubenstein
  • Beatrice Oshika
  • Christy Doran
  • Dan Parvaz
  • Gregory A. Sanders
  • John Aberdeen
  • Sherri Condon

Organizations

  • MITRE Corporation

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Asymmetry
  • Automated Speech Recognition
  • Automatic
  • Contrast
  • Corporations
  • Information Retrieval
  • Judgment
  • Language
  • Language Translation
  • Machine Translation
  • Medical Screening
  • Military Personnel
  • Morphology (Linguistics)
  • Natural Language Processing
  • Personality
  • Standards
  • Stemming
  • Test And Evaluation
  • Training
  • Translations

Readers

  • Computational Linguistics
  • Psychometric Testing or Psychological Assessment