An Information Geometric Understanding of Deep Learning

Abstract

Approved for Public Release

Problem. Deep networks are massively over-parametrized models, trained with rudimentary algorithms on non-convex landscapes in millions of dimensions. They have defied attempts to lay a concrete theoretical foundation to explain their impressive performance. This project develops theoretical tools to study why we can train deep networks easily, and why they can generalize well despite their dimensionality. Deep networks are also quite unlike other machine learning models in that a network trained on one task can be adapted, with relative ease, to predict well on other related tasks, even if the training process does not expli[...]ral representations.

Objectives. We seek to use tools from information geometry and statistical physics to understand the interplay between the weight space where the model is trained and the function space where predictions are made. Our goal is to develop a geometric characterization of the manifold of predictions made by the model for different weight configurations. In our proposed theory, training can be understood as initialization at a point on this manifold and reaching a point that is [...]ation can be understood as volume of different slices of the manifold. Different tasks are different points in the embedding space of the manifold, and questions about transfer can be understood as asking whether these tasks project to nearby points on the manifold.

Approach. This project is structured in three aims. We seek to develop techniques to visualize the model manifold, study its shape, and understand why larger networks train faster than smaller networks (Aim A). To understand why deep networks can generalize well, we seek to study where typical tasks lie on the manifold and develop techniques that exploit the structure of the manifold to estimate the generalization error (Aim B). The flexibility of neural representations suggests that similar tasks may project to nearby locations on the manifold. We seek to understand relationships between tasks using this observation and to formalize and study a "manifold of tasks", which is conceptually the manifold of labelings for different "weights" in Nature's probabilistic model.

Anticipated Outcomes. We will develop principles that govern the training and generalization of deep networks and how deep networks can learn multiple tasks together. This theory will provide testable hypotheses that will be assessed using modern deep networks for a variety of problems, from supervised learning to self-supervised and meta-learning. These experiments will consolidate our understanding of deep learning, help build robust learning systems, and inform new algorithms for learning from multiple tasks.

Future Naval Relevance. This research will inform the mission of the Navy by improving our understanding of autonomous decision-making in unstructured environments which change over time. We seek to explain how the anomalous optimization and generalization properties of deep networks arise from the properties of the data and of typical learning tasks. This has implications for effective decision making under uncertainty in complex environments. We aim to develop an understanding of the space of learning tasks; algorithms that build upon these ideas are expected to demonstrate strong real-world performance for transfer, multi-task and continual learning.

PI. Pratik Chaudhari, Assistant Professor of Electrical and Systems Engineering & Computer and Information Science, University of Pennsylvania.

Funds requested. US $405,845 ($120,000/year for 1/1/2022 - 12/31/2024 and $45,845 for equipment)

Document Details

Document Type
DoD Grant Award
Publication Date
Apr 01, 2022
Source ID
N000142212255

Entities

People

  • Pratik Chaudhari

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Pennsylvania

Tags

Fields of Study

  • Computer science

Readers

  • Graph Algorithms and Convex Optimization.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks
  • Space