Principles for Evaluation of AI/ML Model Performance and Robustness

Abstract

The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs [1, 2]. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust [3]. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field [4].

Document Details

Document Type: Technical Report
Publication Date: Mar 30, 2021
Accession Number: AD1147808

Entities

People

  • A.B. Curtis
  • J.A. Goodwin
  • O.M. Brown

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Artificial Intelligence
  • Best Practices
  • Deep Learning
  • Defense Industry
  • Department of Defense
  • Deployment
  • Governments
  • Information Science
  • Language
  • Machine Learning
  • Measurement
  • National Security
  • Neural Networks
  • Noise
  • Reliability
  • Sampling
  • Security
  • Standards
  • Test and Evaluation
  • Test Sets
  • Training
  • Validation

Readers

  • Defense Acquisition Program Management
  • Geospatial Intelligence and Artificial Intelligence Analytics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Neural Networks