Regaining Control in Reinforcement Learning

Abstract

Along with the sharp increase in visibility of the field, the rate at which new reinforcement learning algorithms are being proposed is at a new peak. While the surge in activity is creating more excitement, there seems to be a gap in the understanding of the fundamental principles that these algorithms must satisfy to be useful in any meaningful application. The goal of this project is to address this gap via two complementary approaches. (1) The vast majority of reinforcement learning algorithms belong to a more fundamental class of learning algorithms known as Stochastic Approximation. One half of the proposed research seeks to build a firm foundation for reinforcement learning algorithm design based on recent and classical results from this field. In particular, it was established in [Borkar & Meyn, 2000] that both stability and convergence of these algorithms can be guaranteed by analyzing the stability of two associated ODEs. Moreover, if the linearized ODE passes a simple eigenvalue test, then an optimal rate of convergence is guaranteed, since the Central Limit Theorem holds in this case. Using these foundational results, the goal is to develop reinforcement learning algorithms that have not only strong convergence guarantees but also optimal rates of convergence. (2) A complementary approach to efficient learning is a Bayesian viewpoint, which represents all uncertainty about the environment through a belief distribution. This offers a coherent framework for integrating prior knowledge as well as richer observations, including forecasts and system responses that go beyond future rewards, and leads to efficient exploration. An obstacle in this area has been developing Bayesian representations that scale to complex environments while remaining computationally tractable. Building on the recent work of [Osband et al., 2017], the second objective is to leverage effective value-function learning algorithms to develop scalable exploration techniques that can lead to efficient learning.
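
As a rough illustration of the eigenvalue test mentioned in the abstract, the sketch below runs a linear stochastic approximation recursion theta_{n+1} = theta_n + (g/n)(A theta_n + b + noise) and checks whether every eigenvalue of g*A has real part below -1/2, which is the standard form of the condition when the step size is g/n. The matrix A, the vector b, the gain g, the noise model, and the function names are illustrative assumptions and are not taken from the award; this is not the project's algorithm.

```python
import numpy as np


def passes_eigenvalue_test(A, gain=1.0):
    """Check Re(lambda) < -1/2 for every eigenvalue of gain * A.

    Assumes step sizes gain / n; A plays the role of the linearization
    of the mean-flow ODE at its equilibrium (hypothetical example).
    """
    eigvals = np.linalg.eigvals(gain * np.asarray(A, dtype=float))
    return bool(np.all(eigvals.real < -0.5))


def run_linear_sa(A, b, gain=1.0, n_steps=20000, noise_std=1.0, seed=0):
    """Run theta <- theta + (gain / n) * (A @ theta + b + noise)."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    theta = np.zeros_like(b)
    for n in range(1, n_steps + 1):
        # i.i.d. Gaussian noise is an assumption made for this sketch
        noise = noise_std * rng.standard_normal(b.shape)
        theta = theta + (gain / n) * (A @ theta + b + noise)
    return theta


# Illustrative problem: the mean-flow ODE d(theta)/dt = A theta + b
# has its equilibrium at theta_star = -A^{-1} b.
A = np.array([[-1.0, 0.0], [0.0, -0.3]])
b = np.array([1.0, 0.3])
theta_star = np.linalg.solve(-A, b)

# With gain 1 the eigenvalue -0.3 fails the test; gain 2 moves it to -0.6.
gain_ok = passes_eigenvalue_test(A, gain=2.0)
theta_hat = run_linear_sa(A, b, gain=2.0)
```

In this toy setting the unit gain fails the test because of the slow eigenvalue -0.3, while doubling the gain satisfies it. Rescaling the gain so that the linearization clears the -1/2 threshold is one simple way to meet the condition.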

Document Details

Document Type: DoD Grant Award
Publication Date: Jul 09, 2020
Source ID: W911NF2010055

Entities

People

Benjamin Van Roy

Organizations

Army Contracting Command
Stanford University
United States Army
