Theoretical Foundations of Deep Learning
Abstract
Deep learning has radically advanced the state of the art in machine learning and computer vision. Nevertheless, progress has been driven almost entirely by empirical observations, hacks, and tricks. Because of our lack of understanding, every stage of the classical Design/Build/Test (DBT) engineering methodology is broken. Today's deep learning users must (Design) choose an ad hoc deep network (DN) architecture, (Build) train the DN using an optimizer with hyperparameters chosen by trial and error, and (Test) produce a model that may be fragile to minor domain shifts, training-set outliers, and adversarial perturbations. Without a theoretical foundation, deep learning will continue to produce poorly understood and fragile systems that are not appropriate for a large class of critical applications, particularly DoD use cases in which reliability is of utmost importance. This project will develop a theory of deep learning grounded in rigorous mathematical principles. We propose three interconnected research thrusts that address the foundational issues in the Design/Build/Test pipeline.
Design Thrust: Mathematical issues in deep network design. Today's deep network design process is closer to alchemy than engineering and relies on trial and error. We will address this issue by developing an approximation theory for deep networks, based on spline functions, that is compatible with the overparameterization of current deep networks.
Build Thrust: Mathematical issues in deep network training. Training deep networks requires enormous resources, both for mining datasets and for optimization. Ad hoc training approaches require intractable amounts of labeled data and yield models with unpredictable behavior. Furthermore, the role of implicit regularization in training is a major enigma that has broken our understanding of model fitting. We propose to understand the build process through new theories from partial differential equations that explain how a deep network's architecture interacts with optimizers, statistical methods to understand the implicit bias of stochastic optimizers, and principled methods for learning from less data.
Test Thrust: Mathematical issues in characterizing deep network performance. The life of a deep learning agent does not end when training is over. Deep networks are deployed in complex environments with out-of-sample data, adversarial attacks, and complex high-dimensional inputs. We will develop new methods to verify network performance using formal methods, quantify uncertainty using statistical methods, guarantee robustness to data corruption, and detect manipulated and adversarial inputs.
Validation with 4D (space+time) action recognition. Throughout this research effort, we will motivate and validate our theoretical developments with a range of challenging applications in action recognition and video processing.
Impact on DoD capabilities. Most DoD machine learning systems must be trained on small, label-poor datasets and must operate in safety-critical situations where reliability is paramount. This Multidisciplinary University Research Initiative (MURI) project will enhance DoD capabilities by enabling better learning with less data and by producing a suite of methods to improve and certify the reliability of deep network models.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 31, 2020
- Source ID: N000142012787
Entities
People
- Richard G. Baraniuk
Organizations
- Office of Naval Research
- Rice University
- United States Navy