Definitions, methods, and applications in interpretable machine learning

Abstract

The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods that highlights the underappreciated role played by human audiences. Within this framework, methods are organized into two classes: model-based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce three desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies that exemplify our desiderata, and suggest directions for future work.

Document Details

Document Type
Pub Defense Publication
Publication Date
Oct 16, 2019
Source ID
10.1073/pnas.1900654116

Entities

People

  • Bin Yu
  • Chandan Singh
  • Karl Kumbier
  • Reza Abbasi-Asl
  • W. James Murdoch

Organizations

  • Allen Institute for Brain Science
  • Army Research Office
  • National Science Foundation
  • Natural Sciences and Engineering Research Council
  • Office of Naval Research
  • University of California

Tags

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Systems Analysis and Design
  • Theoretical Analysis

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks