Fundamentals of Large Scale Reinforcement Learning
Abstract
Learning and planning in uncertain environments is a central problem in AI and machine learning. Using reinforcement learning and data-driven approaches, recent years have seen major advances in the control of uncertain dynamical systems, with examples ranging from allowing robots to perform more sophisticated controls tasks (such as robotic hand manipulation) to sequential decision making in game domains (e.g., AlphaGo and Atari game playing).Many of these successes have relied on sampling based reinforcement learning algorithms, including the DeepRL and Monte Carlo tree search approaches, where we have little theoretical understanding of their e ciency, either from a statistical or a computational perspective. In contrast, both MDP theory and optimal control theory have a rich body of tools, with provable guarantees, for related sequential decision making problems, particularly those thatinvolve continuous control. The objective of this proposal is to build the fundamental tools for learning and planning in large domains, with an eye towards the brining rigor to the current body of heuristics and developing new provably e cient methods. The questions we seek to address are:* Can we provide a rigorous analysis of widely used local search based methods and gradient descent methods as applied to continuous control problems? Despite the practical success of these local search based methods, even the simplest convergence properties of these methods remain poorly understood due to the nonconvexity. A basic (and unresolved question) is: suppose an agent is in an unknown Markov Decision Process, how should the agent optimally balance the tradeo -s in exploration (learning about the evironment) and exploitation (garnering reward based on the agent s current knowledge)? While this question has been widely studied, the optimal procedure is still not sharply characterized; we desire a computationally e cientmethod, which achieve the best possible information theoretic rate. *Can we provide a rigorous analysis of the Monte Carlo tree search methods and rollout based approaches? These methods are fundamental in the recent Alpha Go successes. The question is related to understanding assumptions under which rollout based methods can provide approximately optimal solutions and why some heuristics for pruning the search tree can be unreasonably e -ective for planning.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Jul 10, 2018
- Source ID
- N000141812247
Entities
People
- Sham Kakade
Organizations
- Office of Naval Research
- United States Navy
- University of Washington