Foundations of Scalable Nonconvex Optimization

Abstract

This research program focused on creating a new paradigm for scalable nonconvex optimization. Four principal investigators with backgrounds in optimization, machine learning, and statistics were involved. The project led to development of new theories for understanding the acceleration phenomena in convex and nonconvex optimization in Euclidean and non-Euclidean spaces, with applications of training deep neural networks. The project also led to new theories about limitations and expressivity of neural networks, and to the first complete results for characterization of the optimization landscape of deep linear neural networks, leading to new results that supported the concept that local minima are global. Adding minimal nonlinearities changed the picture, with local optima whose performance are worse than linear classifiers. A first set of complete results on Bayesian optimization was developed, corresponding to settings in which even function evaluations are expensive, as well as gradients and higher order derivatives. Furthermore, the powers of graph neural networks, which have become a popular new framework for modeling large scale data with graph structures, was rigorously characterized. Finally, over fitting and generalization was analyzed, showing that the standard view in machine learning/statistics that interpolation leads to overfitting is not quite accurate in high dimensions, providing an explanation for unreasonable effectiveness of over-parameterized neural networks.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2019
Accession Number
AD1090512

Entities

People

  • Alexander Rakhlin
  • Ali Jadbabaie
  • Stefanie Jegelka
  • Suvrit Sra

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Artificial Intelligence
  • Bayesian Networks
  • Computational Science
  • Differential Equations
  • Geometry
  • Information Processing
  • Information Science
  • Information Systems
  • Kernel Functions
  • Machine Learning
  • Military Research
  • Neural Networks
  • Numerical Analysis
  • Standards
  • Statistics

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Finite Element Method (FEM) for solving Partial Differential Equations (PDEs)
  • Regression Analysis.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks
  • Space