Definitions, methods, and applications in interpretable machine learning
Abstract
The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods which highlights the underappreciated role played by human audiences. Within this framework, methods are organized into 2 classes: model based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce 3 desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies which exemplify our desiderata, and suggest directions for future work.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Oct 16, 2019
- Source ID: 10.1073/pnas.1900654116
Entities
People
- Bin Yu
- Chandan Singh
- Karl Kumbier
- Reza Abbasi-Asl
- W. James Murdoch
Organizations
- Allen Institute for Brain Science
- Army Research Office
- National Science Foundation
- Natural Sciences and Engineering Research Council
- Office of Naval Research
- University of California