Optimization and Learning Foundations of Transformer Models
Abstract
Samet Oymak, University of Michigan, Ann Arbor
The transformer is a neural network architecture tailored for processing sequential data. Since its inception, it has led to revolutionary advances in natural language processing, and it underlies large language models (LLMs) such as ChatGPT and Bard. Modern LLMs have demonstrated an unparalleled ability to capture and leverage vast amounts of data, thereby facilitating near human-level performance across a variety of language understanding tasks, such as question answering, code generation, and mathematical reasoning, and their capabilities continue to expand (e.g., fusing multiple modalities, using external tools). Despite this empirical progress, we are far from having a principled understanding of the mechanisms that underlie transformers and LLMs, which is imperative for their trustworthy use as well as their efficient deployment and training. To address this challenge, this project will develop new theoretical foundations for transformers to characterize and advance their computational and statistical capabilities. The project will be carried out in three synergistic research thrusts. Thrust 1 will study the optimization landscape and inductive biases of the transformer, and shed light on the roles of its building blocks: the self-attention mechanism and the multi-layer perceptron. Thrust 2 will investigate the mathematical principles of autoregressive sequence generation in LLMs and develop holistic theories of learning and optimization for generative transformers. Thrust 3 aims to develop theoretically grounded algorithms that substantially accelerate the training of transformer models. These algorithms will harness the outcomes of Thrusts 1 and 2 to tailor the optimization process to the architecture and data.
Overall, this project will bridge the widening gap between our empirical and theoretical understanding of transformers and LLMs, develop novel theories integrating statistical learning, deep learning, and optimization, and harness these theories to drive impactful algorithmic innovations that push the frontier of efficient and trustworthy machine learning.
Approved for Public Release
Document Details
- Document Type: DoD Grant Award
- Publication Date: Apr 11, 2024
- Source ID: N000142412289
Entities
People
- Samet Oymak
Organizations
- Board of Regents of the University of Michigan
- Office of Naval Research
- United States Navy