Definitions, methods, and applications in interpretable machine learning

Abstract

The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods which highlights the underappreciated role played by human audiences. Within this framework, methods are organized into 2 classes: model based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce 3 desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies which exemplify our desiderata, and suggest directions for future work.

Document Details

Document Type: Pub Defense Publication
Publication Date: Oct 16, 2019
Source ID (DOI): 10.1073/pnas.1900654116

Entities

People

Bin Yu
Chandan Singh
Karl Kumbier
Reza Abbasi-Asl
W. James Murdoch

Organizations

Allen Institute for Brain Science
Army Research Office
National Science Foundation
Natural Sciences and Engineering Research Council
Office of Naval Research
University of California
