Regaining Control in Reinforcement Learning
Abstract
Along with the sharp increase in the field's visibility, the rate at which new reinforcement learning algorithms are proposed is at a new peak. While this surge in activity is generating excitement, there appear to be gaps in the understanding of the fundamental principles that these algorithms must satisfy to be useful in any meaningful application. The goal of this project is to address these gaps via two orthogonal approaches. (1) The vast majority of reinforcement learning algorithms belong to a more fundamental class of learning algorithms known as stochastic approximation. One half of the proposed research seeks to build a firm foundation for reinforcement learning algorithm design based on recent and classical results from this field. In particular, it was established in [Borkar & Meyn, 2000] that both stability and convergence of these algorithms can be guaranteed by analyzing the stability of two associated ODEs. Moreover, if the linearized ODE passes a simple eigenvalue test, then an optimal rate of convergence is guaranteed, since the Central Limit Theorem holds in this case. Using these foundational results, the goal is to develop reinforcement learning algorithms that have not only strong convergence guarantees but also optimal rates of convergence. (2) A complementary approach to efficient learning is a Bayesian viewpoint, which represents all uncertainty about the environment through a belief distribution. This offers a coherent framework for integrating prior knowledge as well as richer observations, including forecasts and system responses that go beyond future rewards, and leads to efficient exploration. An obstacle in this area has been developing Bayesian representations that scale to complex environments while remaining computationally tractable. Building on the recent work of [Osband et al., 2017], the second objective is to leverage effective value-function learning algorithms to develop scalable exploration techniques that can lead to efficient learning.
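The abstract does not spell out the stochastic approximation recursion or the eigenvalue test it refers to. The following is a minimal sketch under stated assumptions, not the project's method: it assumes a scalar recursion with step size gain/n, a hypothetical linear mean field f_bar(theta) = slope * (theta - root), and the commonly used form of the test in which every eigenvalue of the gain-scaled linearization must have real part below -1/2. The function names and parameters are illustrative only.

```python
import random


def passes_eigenvalue_test(slope: float, gain: float = 1.0) -> bool:
    """Scalar form of the eigenvalue test for step size a_n = gain / n.

    In one dimension the linearization of the mean field at its root is
    just the slope, so the only eigenvalue to check is gain * slope.
    The -1/2 threshold is an assumption taken from the standard theory;
    the abstract itself does not state it.
    """
    return gain * slope < -0.5


def run_sa(slope, root, n_steps, gain=1.0, noise_std=1.0, seed=0, theta0=0.0):
    """Iterate theta_{n+1} = theta_n + (gain / n) * (f_bar(theta_n) + noise)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same path
    theta = theta0
    for n in range(1, n_steps + 1):
        # Hypothetical linear mean field with its root at `root`.
        mean_field = slope * (theta - root)
        # The algorithm only sees a noisy observation of the mean field.
        noisy_obs = mean_field + rng.gauss(0.0, noise_std)
        theta += (gain / n) * noisy_obs
    return theta


if __name__ == "__main__":
    for slope, gain in [(-1.0, 1.0), (-0.25, 1.0), (-0.25, 4.0)]:
        ok = passes_eigenvalue_test(slope, gain)
        estimate = run_sa(slope, root=2.0, n_steps=20000, gain=gain)
        print(f"slope={slope}, gain={gain}, passes test={ok}, estimate={estimate:.4f}")
```

With slope = -1 and gain = 1 the recursion reduces to a running average of noisy observations of the root, which is the simplest case in which the test passes. A shallow slope such as -0.25 fails the test at gain 1 but passes once the gain is raised to 4, which illustrates why gain selection matters for the convergence rate.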
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 09, 2020
- Source ID: W911NF2010055
Entities
People
- Benjamin Van Roy
Organizations
- Army Contracting Command
- Stanford University
- United States Army