Foundations of Deep Learning
Abstract
Neural networks and deep learning are everywhere, powering the current expansion of AI across applications ranging from computer vision, speech recognition, and self-driving cars to search engines, natural language processing systems, health care, and far beyond. Yet a comprehensive theory of deep learning is lacking. Basic questions about measuring the capabilities of a given neural architecture, or measuring the information contained in a training set, have remained open for many decades. We propose to take a significant step in addressing this fundamental technological gap by developing a precise, quantitative theory of neural network capacity and generalization properties. We have assembled an interdisciplinary team led by one mathematician (Dr. Vershynin) and one computer scientist (Dr. Baldi) with a track record of productive collaborations. As a team, we propose to develop a theory and conduct simulations around the fundamental concept of capacity that we have recently introduced. The capacity C(A) of a neural architecture A is defined to be the logarithm base two of the number (or volume) of different functions that can be implemented by A as its synaptic weights are varied. Most importantly, C(A) is the number of bits that must be "communicated" from the training set to the synaptic weights in order to select the proper function in A. Thus the notion of capacity acts as a hinge connecting data and neural architectures. In a series of recent papers, we have shown how capacity can be computed precisely for various neural architectures consisting of linear, or polynomial, threshold gates. In general, capacity is a cubic polynomial in the parameters of the architecture, such as the number of units per layer, where the bottleneck layers play a special role. Based on these and other preliminary results, we propose to extend the theory to other kinds of units and networks to cover most of the cases used in practice.
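To make the definition of capacity concrete, the following is a minimal sketch, not the method used in the project. It assumes the simplest possible architecture, a single linear threshold gate on binary inputs {0,1}^n, and counts the distinct Boolean functions found by brute force over a small grid of integer weights and half-integer biases. The function names, the weight grid, and the parameter max_weight are illustrative assumptions; because the grid is finite, the count is in general only a lower bound on the number of functions reachable with real-valued weights.

```python
import itertools
import math


def count_threshold_functions(n, max_weight):
    """Count distinct Boolean functions on {0,1}^n implemented by one
    linear threshold gate f(x) = [w.x + b > 0], searching only integer
    weights in [-max_weight, max_weight] (an illustrative restriction)."""
    inputs = list(itertools.product((0, 1), repeat=n))
    weights = range(-max_weight, max_weight + 1)
    # w.x is an integer, so half-integer biases never land exactly on zero.
    bound = n * max_weight
    biases = [k + 0.5 for k in range(-bound - 1, bound + 1)]

    functions = set()
    for w in itertools.product(weights, repeat=n):
        for b in biases:
            # The truth table over all inputs identifies the function.
            table = tuple(
                int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
                for x in inputs
            )
            functions.add(table)
    return len(functions)


def capacity_bits(n, max_weight):
    """Capacity as defined above: log base two of the number of functions."""
    return math.log2(count_threshold_functions(n, max_weight))


if __name__ == "__main__":
    print(count_threshold_functions(2, 1), capacity_bits(2, 1))
```

For n = 2 with weights in {-1, 0, 1}, the search finds 14 of the 16 Boolean functions of two variables (XOR and its complement are not linearly separable), giving a capacity of log2(14), about 3.81 bits. Exhaustive enumeration of this kind is only feasible for very small n, which is one reason a closed-form theory of capacity is of interest.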
We also propose to relate capacity to other notions of complexity and to use capacity to derive estimates and bounds on training and generalization errors. In addition, we propose to use capacity to estimate the amount of effective information contained in training data relative to a task T, by identifying the size of the smallest training set that enables an architecture to perform T and relating it to the capacity of the smallest architecture that can perform T. These ideas will be investigated theoretically, by continuing to develop the mathematical theory of capacity, and also through systematic simulations conducted on synthetic data sets as well as standard benchmark data sets. The collaboration will also be developed within a rich environment of real-life applications to problems in the natural sciences, so that theoretical concepts and practical applications are developed together and inform each other. Because deep learning is so widely used and is at the center of AI and its many applications, we believe that a new quantitative theory of deep learning based on the notion of capacity could have important benefits, including additional guidelines for the design of neural architectures for a given task T, new ways of measuring the amount of effective information contained in the data relative to the task T, and predictive estimates of training and generalization error. As has been said many times: "Nothing is more practical than a good theory." Such a theory could provide explanations for the unreasonable success of deep learning, guide future design and applications, and stimulate new ideas and progress in AI, machine learning, and computer science.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 09, 2020
- Source ID: W911NF2010186
Entities
People
- Pierre Baldi
Organizations
- Army Contracting Command
- United States Army
- University of California, Irvine