Foundations of Scalable Nonconvex Optimization

Abstract

This research program focused on creating a new paradigm for scalable nonconvex optimization. Four principal investigators with backgrounds in optimization, machine learning, and statistics were involved. The project led to development of new theories for understanding the acceleration phenomena in convex and nonconvex optimization in Euclidean and non-Euclidean spaces, with applications of training deep neural networks. The project also led to new theories about limitations and expressivity of neural networks, and to the first complete results for characterization of the optimization landscape of deep linear neural networks, leading to new results that supported the concept that local minima are global. Adding minimal nonlinearities changed the picture, with local optima whose performance are worse than linear classifiers. A first set of complete results on Bayesian optimization was developed, corresponding to settings in which even function evaluations are expensive, as well as gradients and higher order derivatives. Furthermore, the powers of graph neural networks, which have become a popular new framework for modeling large scale data with graph structures, was rigorously characterized. Finally, over fitting and generalization was analyzed, showing that the standard view in machine learning/statistics that interpolation leads to overfitting is not quite accurate in high dimensions, providing an explanation for unreasonable effectiveness of over-parameterized neural networks.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Oct 01, 2019
Accession Number: AD1090512

Entities

People

Alexander Rakhlin
Ali Jadbabaie
Stefanie Jegelka
Suvrit Sra

Organizations

Massachusetts Institute of Technology

Foundations of Scalable Nonconvex Optimization

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas