Memory, Attention, and Interaction: Learning Language Models with a Long Memory

Abstract

Project Abstract: Approved for Public Release

Learning Language Models with a Long Memory
Sham M. Kakade, Harvard University

The sequential prediction problem is one of the most basic learning tasks and is encountered throughout natural language modeling, speech synthesis, financial forecasting, and a number of other domains that have a sequential or chronological element. The abstract problem has received much attention over the last half century from multiple communities, including theoretical computer science (TCS), machine learning, and coding theory. The fundamental question is: How do we consolidate and reference memories about the past in order to effectively predict (and causally affect) the future? Given the immense practical importance of this prediction problem, there has been an enormous effort to explore different algorithms for storing and referencing information about the sequence, yet many fundamental questions remain open, particularly in how these questions relate to communication.

Recently, there have been a number of practical advances in language modeling that have resulted in significant improvements on a range of important downstream benchmark tasks; these have been largely due to neural models, such as Long Short-Term Memory (LSTM) networks or transformer models. While these techniques seem to improve on standard metrics and even produce remarkably coherent text, we still have severe gaps in our ability to: generate coherent text (at length); use language models for more sophisticated understanding (e.g. to provide instructions); have our models recall relevant information from the deep past.

The focus of this proposal is on the mathematical foundations of learning language models (and sequence models) which capture long-term dependencies. Despite the long history of sequential prediction and its importance in applications, many fundamental questions remain: How can models effectively learn to utilize information from the distant past to make predictions? What data distributions and structural properties of models permit computationally and statistically efficient prediction algorithms?

Currently, nearly all successful language models are trained in an entirely unsupervised manner on extremely large (web-scale) corpora. This is perhaps surprising in that one purpose of language is as a means of communication (between humans, or more ambitiously, between autonomous agents or between autonomous agents and humans). A fundamental question here is to what extent interactive learning (as opposed to unsupervised learning) permits more effective learning.

The fundamental mathematical questions raised here have direct relevance to the DoD because of their implications for how automated agents can learn to interact with either other automated agents or with humans. The questions we seek to formalize and address concern the fundamental computational and statistical limits of long-term sequence modeling relevant to building improved models of communication and interaction; we also seek to establish the limits of interactive learning models in language learning.

Document Details

Document Type: DoD Grant Award
Publication Date: May 16, 2022
Source ID: N000142212377

Entities

People

Sham Kakade

Organizations

Office of Naval Research
President and Fellows of Harvard College
United States Navy
