Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

Abstract

The PLATO machine translation (MT) evaluation (MTE) research program aims to develop, systematically, a predictive relationship between discrete, well-defined MTE metrics and the specific information processing tasks that can be reliably performed with MT output. At the core of the program are traditional measures of quality informed by the International Standards for Language Engineering (ISLE), namely clarity, coherence, morphology, syntax, general and domain-specific lexical robustness, and named-entity translation, together with a DARPA-inspired measure of adequacy. Robust validation is indispensable for refining the tests and their guidelines, so we conduct tests of inter-rater reliability on the assessments. Here we discuss the development of these tests and report their results, focusing on Clarity and Coherence, the first two assessments in the PLATO suite, and we describe our method for iteratively refining our linguistic metrics and the guidelines for applying them within the PLATO evaluation paradigm. Finally, we discuss reasons why kappa might not be the best measure of inter-rater agreement for our purposes, and suggest directions for future investigation.
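For readers unfamiliar with kappa, the sketch below computes Cohen's kappa for two raters who score the same set of items: observed agreement is corrected by the agreement expected by chance from each rater's label frequencies. This is an illustration only. The report may use a different variant (for example, Fleiss' kappa for more than two raters, or a weighted kappa for ordinal scales), and the function name `cohens_kappa` and the example scores are hypothetical, not PLATO data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label proportions,
    # summed over the labels both raters used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both raters used one identical label for every item, p_e == 1
    # and kappa is undefined (0/0), even though raw agreement is perfect.
    if p_e == 1.0:
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores from two raters on eight MT output segments.
a = [3, 2, 3, 1, 2, 3, 2, 1]
b = [3, 2, 2, 1, 2, 3, 3, 1]
print(round(cohens_kappa(a, b), 3))  # → 0.619
```

Here the raters agree on 6 of 8 items (0.75), but chance agreement is 22/64, so kappa is 13/21. The undefined case for a single shared label is one way in which kappa can behave poorly when ratings cluster on a few categories.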


Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2005
Accession Number
ADA456393

Entities

People

  • Keith J. Miller
  • Michelle Vanni

Organizations

  • MITRE Corporation

Tags

DTIC Thesaurus Topics

  • Accuracy
  • Agreements
  • Automatic
  • Coefficients
  • Computational Linguistics
  • Computations
  • Data Analysis
  • Engineering
  • Information Processing
  • Judgment
  • Language
  • Linguistics
  • Machine Translation
  • Military Research
  • Reliability
  • Standards
  • Test And Evaluation

Readers

  • Computational Linguistics
  • Psychometric Testing or Psychological Assessment
  • Software Engineering

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Translation