Collaborative Proposal: Feasible, Model-Free Distributionally Robust Policy Learning

Abstract

Research Problem and Objectives

Standard reinforcement learning (RL) assumes that training conditions match deployment conditions. This proposal addresses distributionally robust RL, which aims to provide reliable performance under distribution shifts at test time. The goals are to: develop sample-efficient, model-free algorithms with theoretical guarantees; ensure feasibility without generative models; provide guarantees for episodic and infinite-horizon RL; handle large state spaces and partial observability; and evaluate robustness empirically.

Technical Approaches

The project will design model-free algorithms for episodic and infinite-horizon RL settings. The episodic algorithms will use threshold estimators and the empirical Bernstein inequality. The infinite-horizon algorithms will modify Q-learning and RMAX/UCRL, incorporating constrained MDPs. Function approximation and history-based approximations will be explored.

Anticipated Outcomes

New sample complexity bounds, feasible algorithms, demonstrated robustness to distribution shifts, an open-source release, guidelines for tuning and tradeoffs, insights into robustness versus efficiency, and extensions for generalization and partial observability. The work is intended to significantly advance the theory and practice of safe, reliable RL under uncertainty.

Impact on DoD Capabilities

Enable reliable decision-making and control for autonomous systems, reducing the real-world data needed for sim-to-real transfer. Speed certification and acquisition. Produce adaptable autonomy that safely handles novel scenarios. Provide performance guarantees that increase trust in autonomous systems. Overall, enable robust, reliable, practical AI that can operate under uncertainty across defense applications.

Approved for public release.
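
The abstract names the empirical Bernstein inequality as a tool for the episodic setting but does not say how it would be applied. As a rough illustration only, the sketch below computes the confidence radius from the standard empirical Bernstein bound for i.i.d. samples bounded in an interval of known width. The function name `empirical_bernstein_radius` and its parameters are hypothetical and are not taken from the proposal; it is not the proposal's algorithm.

```python
import math


def empirical_bernstein_radius(samples, delta, value_range=1.0):
    """Confidence radius from the empirical Bernstein bound.

    Assumes i.i.d. samples lying in an interval of width `value_range`.
    With probability at least 1 - delta, the true mean exceeds the sample
    mean by no more than the returned radius. Illustrative sketch only.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")

    mean = sum(samples) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)

    # Variance-dependent term plus a range-dependent correction term.
    return (math.sqrt(2.0 * variance * log_term / n)
            + 7.0 * value_range * log_term / (3.0 * (n - 1)))


if __name__ == "__main__":
    few = [0.0, 1.0] * 5        # 10 samples
    many = [0.0, 1.0] * 500     # 1000 samples
    print(empirical_bernstein_radius(few, delta=0.05))
    print(empirical_bernstein_radius(many, delta=0.05))
```

The radius shrinks with the sample variance, which is why bounds of this kind are often preferred over range-only bounds such as Hoeffding's when returns are concentrated.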

Document Details

Document Type: DoD Grant Award
Publication Date: Nov 09, 2024
Source ID: N000142412655

Entities

People

Jose Blanchet

Organizations

Office of Naval Research
Stanford University
United States Navy
