Beyond Worst-Case Analysis in Reinforcement Learning
Abstract
Many of machine learnings past successes and also its aspirations for the future revolve around reinforcement learning. At its core,, the model is simple: An agent interacts with its environment through a sequence of actions and would like to maximize the cumulati,ve reward. However there are wide gaps between the abstract models where we can give strong provable guarantees, and the settings wh,ere reinforcement learning is deployed in practice.The goal of our research is to expand the frontier in reinforcement learning alon,g several directions: First, we explore frameworks that allow us to circumvent natural statistical and algorithmic barriers. In many, applications of reinforcement learning, we are not aiming to nd an optimal policy from scratch. Rather we have important prior kno,wledge about the instance we would like to solve. Building on recent developments in learning augmented-algorithms, we will study th,e power of advice in the context of reinforcement learning. Moreover in many scenarios the state space is huge and we need to assume, some sort of structure in it. We will explore the assumptions that have been suggested in the literature from the perspective of co,mputational vs. statistical tradeos, and develop an algorithmic theory of what is the appropriate notion of structure.Second, we wi,ll design new algorithms for some of the fundamental problems in reinforce-ment learning. Policy gradient methods are a exible clas,s of algorithms based on nding an appropriate parameterization of the space of policies, and following gradient ascent to improve t,he expected reward. We will explore the power of regularization and give gradient based algorithms that that converge in a polynomia,l number of iterations and using a poly-nomial number of samples. Finally, in many applications we only get partial information abou,t the state of the system in each round. Through natural structural assumptions that allow us to control the rate at which beliefs c,onverge, we will give ecient algorithms for planning and learning.Together, these projects would help bridge the gap between theory, and practice in rein-forcement learning by identifying new structural properties of realistic instances and other classes of models, that enable ecient algorithms. This proposal has the potential to lead to inherently new approaches for reinforcement learning, wh,ich often relies on heuristics with-out provable guarantees. Moreover it cuts across several areas of algorithms, optimization, stat,istics and control theory.Approved for Public Release.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Apr 01, 2022
- Source ID
- N000142212339
Entities
People
- Ankur Moitra
Organizations
- Massachusetts Institute of Technology
- Office of Naval Research
- United States Navy