Beyond Right and Wrong: Validity, Confidence, and Tradeoffs in the Modern Machine Learning Lifecycle

Abstract

To first order, machine learning is wildly successful: growing applications in medicine, better-than-human accuracy in image recognition, uncannily realistic text generation, improved protein structure identification. Yet digging a bit shows that this may be a surface veneer: machine-learned medical systems exhibit degraded performance as soon as one switches hospitals; state-of-the-art image recognition systems change predictions wildly with imperceptible modifications of their inputs and perform poorly on new evaluation datasets; machine-learned text models sometimes simply regurgitate text. Addressing these challenges requires refocusing on what the field targets with statistical machine learning. The traditional view has been to maximize accuracy, with other concerns secondary. This proposal suggests that while this view has yielded extraordinary progress, it is too narrow. To that end, I propose an agenda that allows appropriate and accurate confidence in machine-learned systems, with valid quantification of uncertainty even in the face of changing populations, large-scale models, and new and different datasets. This will require a deeper understanding of the tradeoffs between classical goals (accuracy) and other desiderata, including privacy, robustness, or computation. In this context, this agenda takes three tacks: (i) to develop families of more robust learning algorithms, which provide accuracy and confidence guarantees in their performance even as populations change over time; (ii) to reconceptualize the machine-learning pipeline to a deeper focus on the data (rather than algorithms) itself; and (iii) to bring these ideas into contact with privacy, where a lack of confidence in a system's outputs limits the broader applicability of privacy-preserving methods.

To motivate the work, consider Donoho's "50 Years of Data Science", which argues that the "secret sauce" of machine learning is the centrality of challenge datasets, where researchers compete to improve metrics on different benchmark tasks, such as speech recognition and computer vision. Yet there are issues with this agenda, as machine-learned models fail to generalize outside of the precise settings in which they were trained. As one archetypical example, researchers have attempted to replicate the validation (test) set for the ImageNet benchmark, an image recognition and classification task, using an identical protocol to develop a new ImageNetV2 dataset. Top-performing algorithms exhibit 50-100% error rate increases on the new test set, and similar failures arise in other areas, from medicine to autonomous vehicles. This proposal suggests moving beyond the first-order successes of machine learning, where accuracy is supreme, to what I term second-order statistical machine learning: a new focus on valid measures of uncertainty, confidence in the outputs of machine-learned algorithms and their predictions even in the face of changing environments, and elucidation of tradeoffs with real-world constraints on estimators, such as privacy or robustness. By bringing inferential (uncertainty-quantifying) tools from statistics more carefully to bear on modern machine learning, the research this proposal outlines will broaden the applicability and rigorous trustability of statistical machine learning.

Document Details

Document Type
DoD Grant Award
Publication Date
Aug 05, 2022
Source ID
N000142212669

Entities

People

  • John Duchi

Organizations

  • Office of Naval Research
  • Stanford University
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Autonomy
  • Autonomy - Human-Robot Interaction