Understanding Deep Learning Architectures with Information Theory

Abstract

This proposal seeks to improve the training, hyperparameter selection, and generalization ability of learning machines by combining machine learning and information-theoretic concepts. Our ultimate goal is to improve the available quantification tools, giving designers and users alike more confidence and making this class of algorithms more transparent. Quantifying the generalization ability of supervised learning machines is a difficult problem that is tied to their capacity. We propose a different framework that blends machine learning and information theory. We have recently used an information theoretic learning (ITL) framework (proposed by the PI) to train an SAE with an ITL constraint (ITL-AE). In a complementary direction, Tishby proposed the information plane (IP) as a way to explain the dynamics of learning in feedforward networks. Reported applications of the IP estimate entropy and mutual information with approximations that are valid only for some data sets and small networks, and that do not scale to practical deep networks with thousands of units and hundreds of layers. We have recently developed estimators of Rényi's entropy and mutual information based on the Gram matrix, which in preliminary results show great promise for characterizing the IP dynamics directly from data. The purpose of this proposal is to expand on both of these promising results (SAE ITL training and the IP) to systematically design and study the hyperparameters of mappers, learning dynamics, and generalization in CNNs and RNNs. Deep learning practice currently relies on tweaking hyperparameters to achieve the best possible results, without a guiding framework. We believe that ITL can provide the needed mathematical underpinning to quantify information transfer in deep architectures and to seek topologies that generalize best to unseen data drawn from the same distribution as the training set.
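
To make the Gram-matrix idea concrete, the following is a minimal sketch of one common matrix-based formulation of Rényi's α-entropy and mutual information, computed from the eigenvalues of a trace-normalized kernel matrix. The abstract does not specify the estimator, so the Gaussian kernel, the kernel width `sigma`, the order `alpha=2`, and all function names here are assumptions chosen for illustration, not the project's actual implementation.

```python
import numpy as np

def gram_matrix(x, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix of the samples in x (assumed kernel)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat a 1-D input as n samples of dim 1
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)                # guard against tiny negative round-off
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)                  # normalize so that trace(A) = 1

def renyi_entropy(a, alpha=2.0):
    """S_alpha(A) = log2(sum_i lambda_i^alpha) / (1 - alpha), for alpha != 1."""
    eig = np.linalg.eigvalsh(a)             # A is symmetric positive semidefinite
    eig = np.clip(eig, 0.0, None)           # drop small negative numerical noise
    eig = eig / eig.sum()
    return np.log2(np.sum(eig ** alpha)) / (1.0 - alpha)

def joint_entropy(a, b, alpha=2.0):
    """Joint entropy from the trace-normalized Hadamard product of A and B."""
    ab = a * b
    return renyi_entropy(ab / np.trace(ab), alpha)

def mutual_information(a, b, alpha=2.0):
    """I(A; B) = S(A) + S(B) - S(A, B)."""
    return (renyi_entropy(a, alpha) + renyi_entropy(b, alpha)
            - joint_entropy(a, b, alpha))
```

In an information-plane analysis, one would typically build one such matrix from a mini-batch of inputs, one from the corresponding layer activations, and one from the targets, and then track the two mutual information values over training. The estimates depend strongly on `sigma`, which would need to be chosen per layer.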

Document Details

Document Type: DoD Grant Award
Publication Date: Mar 18, 2025
Source ID: N001742010014

Entities

People

José Príncipe

Organizations

United States Navy
University of Florida
