Testing the Accuracy of Automated Classification Systems Using Only Expert Ratings That Are Less Accurate Than the System

Abstract

A method is presented to estimate the accuracy of automated classification systems using only expert ratings that may be substantially less accurate than the systems being evaluated. The estimation method begins with multiple expert ratings on test cases, uses the level of inter-rater agreement to estimate rater accuracy, uses Bayesian updating based on estimated rater accuracy to estimate a ground truth probability for each classification, and then estimates system accuracy by comparing the relative frequency with which the system agrees with the most probable classification at different probability levels. A simulation analysis provides evidence that the method is robust and yields reasonable estimates of system accuracy under diverse and predictable conditions.
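The three steps in the abstract can be sketched in Python as follows. This is a minimal illustration that makes several assumptions the abstract does not state: labels are binary, every rater has the same accuracy p on both classes, raters and the system err independently given the true label, and the prior on the true label is uniform. Under these assumptions two raters agree with probability p^2 + (1 - p)^2, which can be inverted to estimate p. If the most probable label is correct with probability q and the system is correct with probability A, the system agrees with that label with probability (1 - q) + A(2q - 1). Pooling the probability levels with a least-squares fit through the origin is a choice made here for brevity and is not necessarily the report's procedure. All function names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def estimate_rater_accuracy(ratings: Sequence[Sequence[int]]) -> float:
    """Step 1 (assumed model): common rater accuracy from pairwise agreement.

    Assumes binary labels and equally accurate, conditionally independent
    raters, so P(two raters agree) = p**2 + (1 - p)**2. Needs >= 2 raters/case.
    """
    agree = total = 0
    for case in ratings:
        for a, b in combinations(case, 2):
            agree += a == b
            total += 1
    rate = agree / total
    # Invert rate = p^2 + (1 - p)^2, keeping the root with p >= 0.5.
    p = 0.5 * (1.0 + math.sqrt(max(0.0, 2.0 * rate - 1.0)))
    return min(p, 0.999)  # cap p so the likelihood ratio stays finite


def posterior_label_one(case: Sequence[int], p: float, prior: float = 0.5) -> float:
    """Step 2: Bayesian update of P(true label = 1) from one case's ratings."""
    log_odds = math.log(prior / (1.0 - prior))
    step = math.log(p / (1.0 - p))  # log likelihood ratio of a single rating
    for r in case:
        log_odds += step if r == 1 else -step
    return 1.0 / (1.0 + math.exp(-log_odds))


def estimate_system_accuracy(ratings: Sequence[Sequence[int]],
                             system: Sequence[int]) -> float:
    """Step 3: fit P(agree | q) = (1 - q) + A * (2q - 1) for A by least squares."""
    p = estimate_rater_accuracy(ratings)
    sxx = sxy = 0.0
    for case, s in zip(ratings, system):
        p1 = posterior_label_one(case, p)
        most_probable = 1 if p1 >= 0.5 else 0
        q = max(p1, 1.0 - p1)  # probability that the most probable label is true
        agrees = 1.0 if s == most_probable else 0.0
        x = 2.0 * q - 1.0      # regressor; x = 0 (a tie) carries no information
        y = agrees - (1.0 - q)
        sxx += x * x
        sxy += x * y
    if sxx == 0.0:
        raise ValueError("ratings carry no information about ground truth")
    return sxy / sxx


def simulate(n_cases: int, n_raters: int, rater_acc: float, system_acc: float,
             seed: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Generate synthetic ratings and system outputs with known accuracies."""
    rng = random.Random(seed)
    ratings, system = [], []
    for _ in range(n_cases):
        truth = rng.randint(0, 1)
        ratings.append([truth if rng.random() < rater_acc else 1 - truth
                        for _ in range(n_raters)])
        system.append(truth if rng.random() < system_acc else 1 - truth)
    return ratings, system


if __name__ == "__main__":
    ratings, system = simulate(20000, 3, rater_acc=0.7, system_acc=0.9)
    print("estimated rater accuracy:", round(estimate_rater_accuracy(ratings), 3))
    print("estimated system accuracy:", round(estimate_system_accuracy(ratings, system), 3))
```

Because the simulated ground truth is known, a run like this can check whether the estimates track the true rater accuracy (0.7) and system accuracy (0.9). In this setup the system is more accurate than any single rater, which is the situation the title describes.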


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2014
Accession Number
AD1107783

Entities

People

  • Paul Lehner

Organizations

  • MITRE Corporation

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Accuracy
  • Agreements
  • Classification
  • Data Set
  • Data Sets
  • Delphi Method
  • Digital Data
  • Errors
  • Frequency
  • Judgment
  • Linear Programming
  • Natural Languages
  • Neurobehavioral Manifestations
  • Optimization
  • Probability
  • Probability Distributions
  • Ratings
  • Simulations
  • Standards
  • Statistical Tests
  • Statistics
  • Test And Evaluation

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Psychometric Testing or Psychological Assessment
  • Statistical inference

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference