Normalization for Automated Metrics: English and Arabic Speech Translation
Abstract
The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program has experimented with applying automated metrics to speech translation dialogues. For translations into English, BLEU, TER, and METEOR scores correlate well with human judgments, but scores for translation into Arabic correlate with human judgments less strongly. This paper provides evidence to support the hypothesis that automated measures of Arabic are lower due to variation and inflection in Arabic by demonstrating that normalization operations improve correlation between BLEU scores and Likert-type judgments of semantic adequacy as well as between BLEU scores and human judgments of the successful transfer of the meaning of individual content words from English to Arabic.
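The abstract does not list the specific normalization operations used. As a minimal sketch, assuming common Arabic orthographic normalizations (removing diacritics and tatweel, unifying alef variants, mapping alef maqsura to yeh), the following shows how text might be normalized before scoring and how metric scores could then be compared with human judgments using Pearson correlation. The function names `normalize_arabic` and `pearson` are hypothetical and are not taken from the report.

```python
import math
import re

# Assumed normalization set; the report's actual operations may differ.
# U+064B-U+0652: tanwin, short vowels, shadda, sukun; U+0670: dagger alef.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# U+0622, U+0623, U+0625: alef with madda / hamza above / hamza below.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
_TATWEEL = "\u0640"


def normalize_arabic(text):
    """Reduce orthographic variation before computing an automated metric."""
    text = _DIACRITICS.sub("", text)              # drop diacritics
    text = text.replace(_TATWEEL, "")             # drop elongation character
    text = _ALEF_VARIANTS.sub("\u0627", text)     # unify alef variants to bare alef
    text = text.replace("\u0649", "\u064A")       # alef maqsura -> yeh
    return " ".join(text.split())                 # collapse whitespace


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In such a setup, both system output and reference translations would be passed through `normalize_arabic` before BLEU is computed, and `pearson` would be applied to the resulting scores paired with the human judgments for the same items.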
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2009
- Accession Number: AD1125098
Entities
People
- Alan Rubenstein
- Beatrice Oshika
- Christy Doran
- Dan Parvaz
- Gregory A. Sanders
- John Aberdeen
- Sherri Condon
Organizations
- MITRE Corporation