XRL: Explainable Reinforcement Learning for AI Autonomy

Abstract

Understanding the decisions of AI classifiers is fundamental to the reliable and robust application of ML methods across a wide variety of domains and end uses. This report describes work conducted under the CMU XAI program on one specific area of interest: detecting and understanding the ability of adversaries to intentionally poison pre-trained classifiers with malicious triggers that give them full control over the practical use of such systems. We show that, by exploiting the XAI techniques we developed, it is possible to reliably detect and avoid using such classifiers, or indeed to create triggers that are equally capable of breaking these systems. In addition, we present a broader survey of several different approaches to XAI, well beyond the scope of the classifier-poisoning work, which we also developed over the course of the program.

Document Details

Document Type: Technical Report
Publication Date: Oct 01, 2021
Accession Number: AD1150499

Entities

People

J. Z. Kolter
Pradeep Ravikumar

Organizations

Carnegie Mellon University
