Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

Abstract

The PLATO machine translation (MT) evaluation (MTE) research program aims to systematically develop a predictive relationship between discrete, well-defined MTE metrics and the specific information processing tasks that can be reliably performed with MT output. At the core of the program are traditional measures of quality informed by the International Standards for Language Engineering (ISLE), namely clarity, coherence, morphology, syntax, general and domain-specific lexical robustness, and named-entity translation, together with a DARPA-inspired measure of adequacy. Robust validation is indispensable for refining the tests and their guidelines, so we conduct tests of inter-rater reliability on the assessments. Here we describe the development of our inter-rater reliability tests and report their results, focusing on Clarity and Coherence, the first two assessments in the PLATO suite. We also describe our method for iteratively refining our linguistic metrics and the guidelines for applying them within the PLATO evaluation paradigm. Finally, we discuss reasons why kappa might not be the best measure of inter-rater agreement for our purposes, and suggest directions for future investigation.
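For readers unfamiliar with the kappa statistic mentioned above, the following is a minimal sketch of Cohen's kappa for two raters, which compares observed agreement with the agreement expected by chance from each rater's label frequencies. It is an illustration only: the abstract does not say which kappa variant the report uses, and the function name `cohen_kappa`, the score scale, and the sample ratings are all hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labeled the same items (illustrative sketch)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the two raters' marginal label frequencies,
    # summed over labels.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here. Returning 1.0 is an assumption made for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores from two raters on eight MT output segments.
rater_1 = [3, 2, 3, 1, 0, 2, 3, 1]
rater_2 = [3, 2, 2, 1, 0, 2, 3, 0]
kappa = cohen_kappa(rater_1, rater_2)  # observed 0.75, chance 0.25, kappa ≈ 0.667
```

Note that this form treats the scores as unordered categories: a disagreement of one scale point counts the same as a disagreement of three.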

Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2005
Accession Number: ADA456393

Entities

People

Keith J. Miller
Michelle Vanni

Organizations

MITRE Corporation
