On Gradient-Based Optimization and Statistical Inference
Abstract
Abstract:Modern statistical machine learning often involves very large data sets and very large parameter spaces. In suchsettings computational efficiency is of paramount importance. Optimization theory has provided both practical andtheoretical support for computationally-efficient data analysis anddecision-making. It has supplied algorithms, as well as analysis tools, that allow rates of convergence to be determinedas explicit functions of problem parameters. The dictum of efficiency has led to a focus on algorithms that are basedprincipally on gradients of objective functions, or onestimates of gradients. Both the theory and practice of gradient-based optimization have deepened significantly inrecent years, and there are now many large-scale applications of gradient-based machine learning in commerce,science and government.There are, however, a number of conceptual and mathematical challenges on the horizon that will block progress ifthey are not addressed. Some of them involve the exploitation of geometric, algebraic and combinatorial structure inmachine learning problems. Indeed, the most widely-used tools from optimization theory focus on Euclidean spaces,but many problems in statistics and artificial intelligence have non-Euclidean structure that should be (or must be)respected, with critical consequences for explainability, efficiency and correctness. Others involve connectionsbetween optimization and statistics. It is essential to link optimization to the stochastic processes induced by sampling,so that one can obtain Bayesian or frequentist confidence intervals. Calibrating the stochasticity to provide correctcoverage remains, however, a significant challenge. In the meantime, most deployments of machine learning computeno error bars, a state of affairs that will be increasingly intolerable as machine learning begins to be used in missioncriticalapplications.We have identified the following four core problems as being unsolved in the literature, as capturing generalmathematical challenges, and as likely to have significant impact if solved:??? How to perform efficient stochastic gradient optimization on general Riemannian manifolds???? Given recent progress in linking gradient-based optimization across continuous time and discrete time, can we extendthese results to the stochastic setting, in particular developing analogs of variance control and acceleration forstochastic diffusions???? Sampling turns a population risk into an empirical risk that is generally rough, with many additional local minima andsaddle points beyond those in the population risk. Can we exploit recent work on gradient-based algorithms forescaping saddle points to give guarantees for empirical-risk minimization?Success on these fronts will provide a solid foundation for the design of robust, large-scale machine learning systems.As such, this work will have direct impact on DoD capabilities for tactical decision-making in the setting of large-scaledata streams, particularly in the real-time setting. Moreover, although our focus in this proposal is theoretical, insynergistic work in Berkeley???s RISE Lab, we are developing the Ray platform, an open-source distributed softwarestack for optimization-based and sampling-based machine learning, where our algorithmic developments will bedeployed.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Jul 27, 2018
- Source ID
- N000141812764
Entities
People
- Michael I. Jordan
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents