Non-parametric methods in reinforcement learning: Instance-optimality, adaptivity and data-dependent

Abstract

Research problem and objectives: This research proposal addresses fundamental questions concerning the use of non-parametric methods in reinforcement learning. For the subclass of ADP/RL problems in which the state and action spaces are discrete, taking on finitely many values, our theoretical understanding of algorithms and fundamental limits is fairly complete. However, many RL problems involve state and/or action spaces that are continuous, and apart from certain special cases (e.g., linear-quadratic control), there are fewer theoretical guarantees in such settings. Such problems require the flexibility and richness provided by non-parametric function classes, including kernel-based methods, regression trees, and neural networks, among others. The research will leverage a combination of techniques from high-dimensional statistics and non-parametric and semi-parametric statistics to develop new procedures that are theoretically well-grounded.

Technical tools: In terms of techniques, this research will exploit cutting-edge techniques from empirical process theory, concentration of measure, and high-dimensional statistics in order to obtain sharp upper and lower bounds. It will also make use of randomized algorithms and approximation-theoretic techniques to derive computationally efficient procedures.

Anticipated outcomes: Expected outcomes of this work are fundamental theoretical guarantees for non-parametric methods in application to RL problems, including the problems of fitted value iteration, fitted policy optimization and Q-learning, as well as off-policy versions of these same problems. In addition to theoretical guarantees, this research should lead to computationally efficient procedures.

Impact on DOD capabilities: This is a fundamental research project that is not expected to produce any developmental items. Should any developmental items result from this work, they will have both civilian and military applications. The intended research is theoretical and will not result in any environmental impacts.

Document Details

Document Type: DoD Grant Award
Publication Date: Sep 08, 2022
Source ID: N000142212756

Entities

People

Martin J. Wainwright

Organizations

Massachusetts Institute of Technology
Office of Naval Research
United States Navy
