Fundamentals of Large Scale Reinforcement Learning

Abstract

Learning and planning in uncertain environments is a central problem in AI and machine learning. Using reinforcement learning and data-driven approaches, recent years have seen major advances in the control of uncertain dynamical systems, with examples ranging from allowing robots to perform more sophisticated controls tasks (such as robotic hand manipulation) to sequential decision making in game domains (e.g., AlphaGo and Atari game playing).Many of these successes have relied on sampling based reinforcement learning algorithms, including the DeepRL and Monte Carlo tree search approaches, where we have little theoretical understanding of their e ciency, either from a statistical or a computational perspective. In contrast, both MDP theory and optimal control theory have a rich body of tools, with provable guarantees, for related sequential decision making problems, particularly those thatinvolve continuous control. The objective of this proposal is to build the fundamental tools for learning and planning in large domains, with an eye towards the brining rigor to the current body of heuristics and developing new provably e cient methods. The questions we seek to address are:* Can we provide a rigorous analysis of widely used local search based methods and gradient descent methods as applied to continuous control problems? Despite the practical success of these local search based methods, even the simplest convergence properties of these methods remain poorly understood due to the nonconvexity. A basic (and unresolved question) is: suppose an agent is in an unknown Markov Decision Process, how should the agent optimally balance the tradeo -s in exploration (learning about the evironment) and exploitation (garnering reward based on the agent s current knowledge)? While this question has been widely studied, the optimal procedure is still not sharply characterized; we desire a computationally e cientmethod, which achieve the best possible information theoretic rate. *Can we provide a rigorous analysis of the Monte Carlo tree search methods and rollout based approaches? These methods are fundamental in the recent Alpha Go successes. The question is related to understanding assumptions under which rollout based methods can provide approximately optimal solutions and why some heuristics for pruning the search tree can be unreasonably e -ective for planning.

Document Details

Document Type
DoD Grant Award
Publication Date
Jul 10, 2018
Source ID
N000141812247

Entities

People

  • Sham Kakade

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Washington

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning.
  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • Autonomy
  • Autonomy - Autonomous System Control