XRL: Explainable Reinforcement Learning for AI Autonomy
Abstract
Understanding the decisions of AI classifiers is fundamental to the reliable and robust application of ML methods across a wide variety of domains and end uses. This report describes work in a specific area of interest conducted under the CMU XAI program: detecting and understanding the ability of adversaries to intentionally poison pre-trained classifiers with malicious triggers that allow them full control over the practical use of such systems. We show that by exploiting our developed XAI techniques, it is possible to reliably detect and avoid the use of such classifiers, or indeed to create triggers that are equally capable of breaking the systems. In addition, we present a broader survey of several different XAI approaches, well beyond the scope of the classifier poisoning work, which were also developed over the course of the program.
Document Details
- Document Type: Technical Report
- Publication Date: Oct 01, 2021
- Accession Number: AD1150499
Entities
People
- J. Z. Kolter
- Pradeep Ravikumar
Organizations
- Carnegie Mellon University