Enabling Scalable Hessian Based Methods for Deep Neural Networks Training and Analysis

Abstract

Deep Neural Networks (DNNs) have influenced a wide range of artificial intelligence (AI) and machine learning (ML) applications, especially industrial applications related to computer vision and natural language processing, but their impact in Scientific ML (SciML) has to date been more modest. A large part of the reason is that industries have enormous computational budgets and invest enormous resources in computational infrastructure, aimed at developing models and algorithms that are well suited to their industrial goals. Such methods are typically heavily parameterized variants of first-order stochastic gradient descent (SGD) methods. While appropriate in certain industrial applications, such SGD-based methods are not well suited to many SciML problems, where second-order methods and other Hessian-based methods are much more widely used. Even for industrial AI/ML, SGD-based methods have many well-known problems, e.g., needing extremely expensive hyperparameter sweeps and then resulting in non-robust models. Recent work has provided proof-of-principle demonstrations that many of these problems are solvable by second-order and other Hessian-based methods. These second-order methods are based on developments in Randomized Numerical Linear Algebra (RandNLA), and they are particularly well suited for modern computational environments. Going beyond proof-of-principle demonstrations that Hessian-based methods will lead to transformative advances will require detailed and computationally expensive evaluations, and these will require qualitatively more state-of-the-art computational resources than are available in typical non-industrial research environments. Given those resources, we will be able to provide a detailed evaluation of the trade-offs between heavily parameterized SGD-based methods and Hessian-based methods.
This will enable us to demonstrate the applicability of these novel RandNLA-based methods for: generic model development in industrial AI/ML applications (training, testing, etc.); model development in SciML applications (where domain constraints must often be respected with a much greater level of fidelity); and model validation/evaluation in industrial and SciML applications (to characterize stability, robustness, usefulness in embedded pipelines, etc.). We expect these, and in particular the model validation/evaluation methods, to be transformative, as validation and evaluation can often be the bottleneck to using DNNs in mission-critical applications. In addition to expanding the usefulness of these optimization algorithms in a much broader range of applications, this expansion will feed back and encourage the development of principled new algorithmic methods.

Document Details

Document Type
DoD Grant Award
Publication Date
May 05, 2021
Source ID
N000142112381

Entities

People

  • Michael Mahoney

Organizations

  • Office of Naval Research
  • United States Navy
  • University of California Regents

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning
  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks