Understanding Transformers, State Space Models and Diffusion Models for Dynamical Systems
Abstract
Compared to dense networks and classical kernel machine methods, deep convolutional neural networks (CNNs), transformers, and more recent variants such as state-space models (SSMs) and diffusion models (DMs) have achieved superior performance across various application domains. These include highly challenging problems such as protein folding (e.g., AlphaFold3) and time series forecasting of dynamical systems. More recently, transformers have also been used as neural operators to forecast the future state of fluid flows and other physical and biological dynamical systems. However, it remains unclear why these architectures work so well. Are there common principles at the core of these successful neural networks? A comparative study at both the fundamental level and the performance level will lead to a better understanding of the underlying learning principles. This, in turn, will enable the development of better and more robust architectures, which could benefit a wide range of practical applications of interest to the DoD. We propose to perform this comparison for transformers, state-space models, and diffusion models, focusing on three potentially fundamental principles: sparsity, auto-regressive learning, and multi-scale learning. Our main application area is multi-scale dynamical systems, both because of the importance of this application and because the domain seems ideal for gathering unique insights into foundation models for forecasting the states of complex multi-scale systems.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 06, 2025
- Source ID: FA95502410231
Entities
People
- Mengjia Xu
Organizations
- Air Force Office of Scientific Research
- New Jersey Institute of Technology
- United States Air Force