Memory, Attention, and Interaction: Learning Language Models with a Long Memory
Abstract
Project Abstract: Approved for Public Release: Learning Language Models with a Long Memory
Sham M. Kakade, University of Washington
The sequential prediction problem is one of the most basic learning tasks and is encountered throughout natural language modeling, speech synthesis, financial forecasting, and a number of other domains that have a sequential or chronological element. The abstract problem has received much attention over the last half century from multiple communities, including theoretical computer science (TCS), machine learning, and coding theory. The fundamental question is: How do we consolidate and reference memories about the past in order to effectively predict (and causally affect) the future? Given the immense practical importance of this prediction problem, there has been an enormous effort to explore different algorithms for storing and referencing information about the sequence, yet many fundamental questions remain open, particularly in how these questions relate to communication.
There have been a number of recent practical advances in language modeling that have resulted in significant improvements on a range of important downstream benchmark tasks; these have been largely due to neural models, such as Long Short-Term Memory (LSTM) networks or transformer models. While these techniques seem to improve on standard metrics and even produce remarkably coherent text, we still have severe gaps in our ability to: generate coherent text (at length); use language models for more sophisticated understanding (e.g., to provide instructions); and have our models recall relevant information from the deep past.
The focus of this proposal is on the mathematical foundations of learning language models (and sequence models) that capture long-term dependencies.
Despite the long history of sequential prediction and its importance in applications, many fundamental questions remain: How can models effectively learn to utilize information from the distant past to make predictions? What data distributions and structural properties of models permit computationally and statistically efficient prediction algorithms?
Currently, nearly all successful results using language models are trained in an entirely unsupervised manner on extremely large (web-scale) corpora. This is perhaps surprising in that one purpose of language is as a means of communication (between humans, or more ambitiously, between autonomous agents or between autonomous agents and humans). A fundamental question here is to what extent interactive learning (as opposed to unsupervised learning) permits more effective learning.
The fundamental mathematical questions raised here have direct relevance to the DoD because of their implications for how automated agents can learn to interact with either other automated agents or with humans. The questions we seek to formalize and address concern the fundamental computational and statistical limits of long-term sequence modeling relevant to building improved models of communication and interaction; we also seek to establish the limits of interactive learning models in language learning.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Sep 07, 2021
- Source ID: N000142112822
Entities
People
- Sham Kakade
Organizations
- Office of Naval Research
- United States Navy
- University of Washington