Modern Statistical Foundation for Resource-Constrained Reinforcement Learning
Abstract
Reinforcement learning (RL), which is frequently modeled as sequential decision making and reasoning in a Markov decision process, has received an explosion of interest in defense-related applications (e.g., robotics, surveillance, autonomous systems) due to its remarkable recent progress in practice. A core objective of RL is to learn how to maximize long-term rewards without having precise descriptions of the complex environments. In contemporary RL and control applications, however, it is increasingly common to encounter environments with prohibitively large state/action spaces and long horizons. The number of data samples available at hand is often severely limited compared to the unprecedented model complexity, particularly in sample-starved and dynamically changing environments. To further complicate matters, natural RL formulations are intrinsically nonconvex, thus leading to severe computational concerns. All this presents a major bottleneck in transferring promising RL paradigms from research to practice, particularly in resource-constrained defense applications that require trustworthy reasoning and time-critical decision making. In light of these challenges, it is pivotal to re-examine the fundamental trade-off among three competing goals (sample efficiency, statistical accuracy, and computational efficiency) in large-scale RL scenarios, and to design algorithms that can effectively cope with practical resource constraints (e.g., constraints in sensing and computational resources).
Motivated by the aforementioned challenges, the overall objective of this research program is to enable and advance resource-efficient RL. We aim to investigate how to tackle sample-starved environments via modern statistical tools, and how to exploit the unreasonable effectiveness of nonconvex optimization when solving RL problems.
The proposed research program consists of the following major thrusts, which seek to systematically assess the strengths and shortcomings of several distinctive yet tightly intertwined RL approaches.
- Thrust 1: breaking the sample size barrier in RL.
- Thrust 2: algorithm-dependent lower bounds and efficient remedies for policy-based RL.
- Thrust 3: demystifying and enhancing sample efficiency of value-based RL.
- Thrust 4: achieving efficiency in constrained and regularized RL.
- Thrust 5: trustworthy uncertainty quantification for RL.
This proposed research program is expected to deliver important practical insights for a diverse array of Navy applications that require reliable reasoning and learning in complex, sample-starved, and dynamically changing environments. The proposed tasks are of critical importance in improving trustworthy real-time decision making, enabling resource-efficient learning and planning, and enhancing automatic situation awareness.
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 16, 2022
- Source ID: N000142212354
Entities
People
- Yuxin Chen
Organizations
- Office of Naval Research
- United States Navy
- University of Pennsylvania