Enabling Scalable Hessian Based Methods for Deep Neural Networks Training and Analysis

Abstract

Abstract: Deep Neural Networks (DNNs) have influenced a wide range of artificial intelligence (AI) and machine learning (ML) applications, especially industrial applications related to computer vision and natural language processing, but their impact in Scientific ML (SciML) has to date been more modest. A large part of the reason for this is that industries have enormous computational budgets and invest enormous resources in computational infrastructure, aimed at developing models and algorithms that are well-suited to their industrial goals. Such methods are typically heavily-parameterized variants of first-order stochastic gradient descent (SGD) methods. While appropriate in certain industrial applications, such SGD-based methods are not well-suited to many SciML problems, where second-order methods and other Hessian-based methods are much more widely used. Even for industrial AI/ML, SGD-based methods have many well-known problems, e.g., needing extremely expensive hyperparameter sweeps and then resulting in non-robust models. Recent work has provided proof-of-principle demonstrations that many of these problems are solvable by second-order and other Hessian-based methods. These second-order methods are based on developments in Randomized Numerical Linear Algebra (RandNLA), and they are particularly well-suited for modern computational environments. To go beyond proof-of-principle demonstrations that Hessian-based methods will lead to transformative advances will require detailed and computationally-expensive evaluations, and this will require qualitatively-more state-of-the-art computational resources than are available in typical non-industrial research environments. Given those resources, we will be able to provide a detailed evaluation of the trade-offs between heavily-parameterized SGD-based methods and Hessian-based methods.
This will enable us to demonstrate the applicability of these novel RandNLA-based methods for: generic model development in industrial AI/ML applications (training, testing, etc.); model development in SciML applications (where domain constraints must often be respected with a much greater level of fidelity); and for model validation/evaluation in industrial and SciML applications (to characterize stability, robustness, usefulness in embedded pipelines, etc.). We expect these, and in particular the model validation/evaluation methods, to be transformative, as this can often be the bottleneck to using DNNs in mission-critical applications. In addition to expanding the usefulness of these optimization algorithms in a much broader range of applications, this expansion will feed back and encourage the development of principled new algorithmic methods.

Document Details

Document Type: DoD Grant Award
Publication Date: May 05, 2021
Source ID: N000142112381

Entities

People

Michael Mahoney

Organizations

Office of Naval Research
United States Navy
University of California Regents
