Foundations of Scalable Nonconvex Optimization
Abstract
This research program focused on creating a new paradigm for scalable nonconvex optimization. Four principal investigators with backgrounds in optimization, machine learning, and statistics were involved. The project led to development of new theories for understanding the acceleration phenomena in convex and nonconvex optimization in Euclidean and non-Euclidean spaces, with applications of training deep neural networks. The project also led to new theories about limitations and expressivity of neural networks, and to the first complete results for characterization of the optimization landscape of deep linear neural networks, leading to new results that supported the concept that local minima are global. Adding minimal nonlinearities changed the picture, with local optima whose performance are worse than linear classifiers. A first set of complete results on Bayesian optimization was developed, corresponding to settings in which even function evaluations are expensive, as well as gradients and higher order derivatives. Furthermore, the powers of graph neural networks, which have become a popular new framework for modeling large scale data with graph structures, was rigorously characterized. Finally, over fitting and generalization was analyzed, showing that the standard view in machine learning/statistics that interpolation leads to overfitting is not quite accurate in high dimensions, providing an explanation for unreasonable effectiveness of over-parameterized neural networks.
Document Details
- Document Type
- Technical Report
- Publication Date
- Oct 01, 2019
- Accession Number
- AD1090512
Entities
People
- Alexander Rakhlin
- Ali Jadbabaie
- Stefanie Jegelka
- Suvrit Sra
Organizations
- Massachusetts Institute of Technology