XRL: Explainable Reinforcement Learning for AI Autonomy

Abstract

Understanding the decisions of AI classifiers is fundamental to the reliable and robust application of ML methods across a wide variety of domains and end uses. This report describes work in one specific area of interest conducted under the CMU XAI program: detecting and understanding the ability of adversaries to intentionally poison pre-trained classifiers with malicious triggers that give them full control over the practical use of such systems. We show that, by exploiting the XAI techniques we developed, it is possible to reliably detect and avoid the use of such classifiers, or indeed to create triggers that are equally capable of breaking the systems. In addition, we present a broader survey of several different approaches to XAI, well beyond the scope of the classifier poisoning work, which were also developed over the course of the program.

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2021
Accession Number
AD1150499

Entities

People

  • J. Z. Kolter
  • Pradeep Ravikumar

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computer Vision
  • Data Sets
  • Deep Learning
  • Dimensionality Reduction
  • Game Theory
  • Governments
  • Image Processing
  • Image Recognition
  • Information Processing
  • Information Science
  • Information Systems
  • Kernel Functions
  • Machine Learning
  • Neural Networks
  • Operations Research
  • Probability
  • Signal Processing
  • Standards
  • Training

Fields of Study

  • Computer science

Readers

  • Educational Psychology
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Neural Networks