Enabling Scalable Hessian Based Methods for Deep Neural Networks Training and Analysis
Abstract
Deep Neural Networks (DNNs) have influenced a wide range of artificial intelligence (AI) and machine learning (ML) applications, especially industrial applications related to computer vision and natural language processing, but their impact in Scientific ML (SciML) has to date been more modest. A large part of the reason is that industry has enormous computational budgets and invests enormous resources in computational infrastructure aimed at developing models and algorithms that are well suited to its industrial goals. Such methods are typically heavily parameterized variants of first-order stochastic gradient descent (SGD) methods. While appropriate in certain industrial applications, such SGD-based methods are not well suited to many SciML problems, where second-order and other Hessian-based methods are much more widely used. Even for industrial AI/ML, SGD-based methods have many well-known problems, e.g., they need extremely expensive hyperparameter sweeps and then yield non-robust models. Recent work has provided proof-of-principle demonstrations that many of these problems can be solved by second-order and other Hessian-based methods. These second-order methods build on developments in Randomized Numerical Linear Algebra (RandNLA), and they are particularly well suited to modern computational environments.

Going beyond proof-of-principle demonstrations, to show that Hessian-based methods lead to transformative advances, will require detailed and computationally expensive evaluations, and these in turn will require computational resources that are qualitatively more state-of-the-art than those available in typical non-industrial research environments. Given those resources, we will be able to provide a detailed evaluation of the trade-offs between heavily parameterized SGD-based methods and Hessian-based methods.
This will enable us to demonstrate the applicability of these novel RandNLA-based methods for generic model development in industrial AI/ML applications (training, testing, etc.); model development in SciML applications (where domain constraints must often be respected with a much greater level of fidelity); and model validation/evaluation in industrial and SciML applications (to characterize stability, robustness, usefulness in embedded pipelines, etc.). We expect these methods, and in particular the model validation/evaluation methods, to be transformative, since validation and evaluation are often the bottleneck to using DNNs in mission-critical applications. In addition to extending the usefulness of these optimization algorithms to a much broader range of applications, this expansion will feed back into, and encourage, the development of principled new algorithmic methods.
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 05, 2021
- Source ID: N000142112381
Entities
People
- Michael Mahoney
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents