Optimization and Learning Foundations of Transformer Models

Abstract

Samet Oymak, University of Michigan, Ann Arbor

The transformer is a neural network architecture tailored for processing sequential data. Since its inception, it has led to revolutionary advances in natural language processing, and it underlies large language models (LLMs) such as ChatGPT and Bard. Modern LLMs have demonstrated an unparalleled ability to capture and leverage vast amounts of data, thereby facilitating near human-level performance across a variety of language understanding tasks, such as question answering, code generation, and mathematical reasoning, and their capabilities continue to expand (e.g., fusing multiple modalities, using external tools). Despite this empirical progress, we are far from having a principled understanding of the mechanisms that underlie transformers and LLMs, which is imperative for their trustworthy use as well as for their efficient deployment and training.

To address this challenge, this project will develop new theoretical foundations for transformers to characterize and advance their computational and statistical capabilities. The project will be carried out in three synergistic research thrusts. Thrust 1 will study the optimization landscape and inductive biases of the transformer, and shed light on the roles of its building blocks: the self-attention mechanism and the multi-layer perceptron. Thrust 2 will investigate the mathematical principles of autoregressive sequence generation in LLMs and develop holistic theories of learning and optimization for generative transformers. Thrust 3 aims to develop theoretically grounded algorithms that substantially accelerate the training of transformer models. These algorithms will harness the outcomes of Thrusts 1 and 2 in order to tailor the optimization process to the architecture and data.

Overall, this project will bridge the widening gap between our empirical and theoretical understanding of transformers and LLMs, develop novel theories integrating statistical learning, deep learning, and optimization, and harness these theories to drive impactful algorithmic innovations that push the frontier of efficient and trustworthy machine learning.

Approved for Public Release

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 11, 2024
Source ID: N000142412289

Entities

People

Samet Oymak

Organizations

Board of Regents of the University of Michigan
Office of Naval Research
United States Navy
