Towards Mathematical Foundations for Foundation Models

Abstract

Approved for Public Release. Foundation models are large neural networks, typically trained in an unsupervised fashion on large amounts of data. They have fundamentally transformed the landscape of machine learning in many domains, including language, images, and scientific modeling. The main driving force behind their impressively improved performance in the last several years has arguably been scale (of both model size and data) rather than algorithmic innovation. This has left our theoretical understanding of both why current models work and how to improve them severely impoverished. In this proposal, we introduce mathematical frameworks to understand and improve several critical aspects of the foundation model pipeline. The overall research philosophy of the proposal is to combine mathematical abstractions that can be theoretically analyzed with thorough and careful experimentation. The resulting insights will then be used as guidance towards identifying algorithmic improvements at scale. We address core statistical, algorithmic, and architectural tradeoffs in the pre-training phase (that is, fitting the model from data), as well as algorithmic machinery to build improved inference procedures (that is, using the trained model, typically via sampling from it).
In the former category, we will address: (1) the statistical efficiency of score-based losses, the de facto way to train diffusion and energy-based generative models; (2) a formalism to understand out-of-distribution generalization through the lens of graphical models and causality, taking into account current practices for training foundation models (namely, data mixes and active collection of data); (3) theoretically guided architecture design for neural models as applied to partial differential equation solvers and unpaired domain translation. In the latter category, we will address: (1) possibilities and limits of using pre-trained diffusion models as "oracles" for sampling from distributions; (2) the benefits and limits of knowledge distillation strategies through a "trajectory" analysis. Positive outcomes for the research objectives are likely to have a practical impact on several technologies of interest to the Office of Naval Research, including "translating" between imagery from different surveillance technologies, more robust behavior of machine learning technologies trained on data collected from sensors deployed in diverse environments, and more efficient methods for simulating fluid dynamics models.

Document Details

Document Type
DoD Grant Award
Publication Date
Jan 13, 2025
Source ID
N000142512124

Entities

People

  • Andrej Risteski

Organizations

  • Carnegie Mellon University
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Neural Networks