Towards Mathematical Foundations for Foundation Models
Abstract
Approved for Public Release.
Foundation models are large neural networks, typically trained in an unsupervised fashion on large amounts of data. They have fundamentally transformed the landscape of machine learning in many domains, including language, images, and scientific modeling. The main driving force behind their improved performance over the last several years has arguably been scale, in both model size and data, rather than algorithmic innovation. This has left our theoretical understanding of both why current models work and how to improve them severely impoverished.
In this proposal, we introduce mathematical frameworks to understand and improve several critical aspects of the foundation model pipeline. The overall research philosophy is to combine mathematical abstractions that can be theoretically analyzed with thorough and careful experimentation. The resulting insights will ultimately be used as guidance for identifying algorithmic improvements at scale. We address core statistical, algorithmic, and architectural tradeoffs in the pre-training phase (that is, fitting the model from data), as well as algorithmic machinery for building improved inference procedures (that is, using the trained model, typically via sampling from it).
In the former category, we will address: (1) the statistical efficiency of score-based losses, the de facto way to train diffusion and energy-based generative models; (2) a formalism to understand out-of-distribution generalization through the lens of graphical models and causality, taking into account current practices for training foundation models (namely, data mixes and active collection of data); (3) theoretically guided architecture design for neural models as applied to partial differential equation solvers and unpaired domain translation.
In the latter category, we will address: (1) possibilities and limits of using pre-trained diffusion models as "oracles" for sampling from distributions; (2) the benefits and limits of knowledge distillation strategies through a "trajectory" analysis.
Positive outcomes for the research objectives are likely to have a practical impact on several technologies of interest to the Office of Naval Research, including "translating" between imagery from different surveillance technologies, more robust behavior of machine learning technologies trained on data collected from sensors deployed in diverse environments, and more efficient methods for simulating fluid dynamics models.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jan 13, 2025
- Source ID: N000142512124
Entities
People
- Andrej Risteski
Organizations
- Carnegie Mellon University
- Office of Naval Research
- United States Navy