Beyond Worst-Case Analysis in Reinforcement Learning

Abstract

Many of machine learnings past successes and also its aspirations for the future revolve around reinforcement learning. At its core,, the model is simple: An agent interacts with its environment through a sequence of actions and would like to maximize the cumulati,ve reward. However there are wide gaps between the abstract models where we can give strong provable guarantees, and the settings wh,ere reinforcement learning is deployed in practice.The goal of our research is to expand the frontier in reinforcement learning alon,g several directions: First, we explore frameworks that allow us to circumvent natural statistical and algorithmic barriers. In many, applications of reinforcement learning, we are not aiming to nd an optimal policy from scratch. Rather we have important prior kno,wledge about the instance we would like to solve. Building on recent developments in learning augmented-algorithms, we will study th,e power of advice in the context of reinforcement learning. Moreover in many scenarios the state space is huge and we need to assume, some sort of structure in it. We will explore the assumptions that have been suggested in the literature from the perspective of co,mputational vs. statistical tradeos, and develop an algorithmic theory of what is the appropriate notion of structure.Second, we wi,ll design new algorithms for some of the fundamental problems in reinforce-ment learning. Policy gradient methods are a exible clas,s of algorithms based on nding an appropriate parameterization of the space of policies, and following gradient ascent to improve t,he expected reward. We will explore the power of regularization and give gradient based algorithms that that converge in a polynomia,l number of iterations and using a poly-nomial number of samples. Finally, in many applications we only get partial information abou,t the state of the system in each round. Through natural structural assumptions that allow us to control the rate at which beliefs c,onverge, we will give ecient algorithms for planning and learning.Together, these projects would help bridge the gap between theory, and practice in rein-forcement learning by identifying new structural properties of realistic instances and other classes of models, that enable ecient algorithms. This proposal has the potential to lead to inherently new approaches for reinforcement learning, wh,ich often relies on heuristics with-out provable guarantees. Moreover it cuts across several areas of algorithms, optimization, stat,istics and control theory.Approved for Public Release.

Document Details

Document Type: DoD Grant Award
Publication Date: Apr 01, 2022
Source ID: N000142212339

Entities

People

Ankur Moitra

Organizations

Massachusetts Institute of Technology
Office of Naval Research
United States Navy

Beyond Worst-Case Analysis in Reinforcement Learning

Abstract

Document Details

Entities

People

Organizations

Tags

Fields of Study

Readers

Technology Areas