An Investigation of the Relationship Between Automated Machine Translation Evaluation Metrics and User Performance on an Information Extraction Task

Abstract

This dissertation applies nonparametric statistical techniques to Machine Translation (MT) evaluation, using data from an MT evaluation experiment conducted through a joint Army Research Laboratory (ARL) and Center for the Advanced Study of Language (CASL) project. In particular, it studies the relationship between human performance on an information extraction task with translated documents and the scores that well-known automated translation evaluation metrics (autometrics) assign to those documents. Findings from a correlation analysis of the connection between autometrics and task-based metrics are presented and contrasted with current strategies for evaluating translations. A novel idea for assessing partial rank correlation in the presence of grouping factors is also introduced. Lastly, this dissertation presents a framework for task-based MT evaluation and predictive modeling of task responses. The framework gives new information about the relative predictive strengths of the different autometrics (and re-coded variants of them) within the statistical Generalized Linear Models developed in analyses of the information extraction task data. This work shows that current autometrics are inadequate for predicting task performance, but that near adequacy can be achieved by using re-coded autometrics in a logistic regression setting. As a result, a class of automated metrics best suited to predicting performance is established, and suggestions are offered about how to use metrics to supplement expensive and time-consuming experiments with human participants. Users can now begin to tie the intrinsic automated metrics to the extrinsic metrics for the tasks they perform. The bottom line is that there is a need to average away MT dependence: averaged metrics perform better in overall predictions than the original autometrics. Moreover, combinations of re-coded metrics performed better than any individual metric.
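To illustrate why grouping factors matter when correlating autometric scores with task performance, the following is a minimal sketch in Python. It is not the partial rank correlation method introduced in the dissertation, which the abstract does not specify. It only computes a plain Spearman rank correlation, once over pooled data and once within each group. The function names and the toy values for `metric`, `task`, and `system` are hypothetical and chosen purely for illustration; they are not data from the ARL/CASL experiment.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def within_group_spearman(x, y, groups):
    """Spearman correlation computed separately inside each group."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        out[g] = spearman([x[i] for i in idx], [y[i] for i in idx])
    return out


# Hypothetical toy data: one autometric score and one task score per document,
# with documents translated by two made-up MT systems "A" and "B".
metric = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
task = [0.3, 0.2, 0.1, 0.9, 0.8, 0.7]
system = ["A", "A", "A", "B", "B", "B"]

pooled = spearman(metric, task)
by_system = within_group_spearman(metric, task, system)
print(pooled, by_system)
```

In this toy example the pooled correlation is positive while the correlation within each system is negative, which shows how ignoring a grouping factor such as the MT system can change the apparent relationship between a metric and task performance.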

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2007
Accession Number
AD1007040

Entities

People

  • Calandra R. Tate

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Computational Science
  • Data Analysis
  • Data Mining
  • Data Science
  • Information Processing
  • Information Science
  • Knowledge Management
  • Network Science
  • Predictive Modeling
  • Probability
  • Statistical Algorithms
  • Statistical Analysis
  • Statistics
  • Surveys
  • Task Performance And Analysis
  • Test And Evaluation

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation