GRADIENT-BASED MULTI-MDP POLICY OPTIMIZATION FOR GENERALIZATION IN REINFORCEMENT LEARNING
Abstract
Despite significant advances in deep reinforcement learning (RL), trained agents are prone to overfitting to their training environments and fail to generalize to similar but previously unseen environment contexts at test time. Although training a policy on multiple procedurally generated environments has recently been proposed to improve generalization performance, policy training in the multi-Markov Decision Process (MDP) setting is not yet well understood. In this project, we propose a novel gradient-based framework that learns an optimal policy by resolving the conflicting-loss problem that can arise in a multi-MDP setting.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jan 04, 2023
- Source ID: FA23862214010
Entities
People
- Hyun Oh Song
Organizations
- Air Force Office of Scientific Research
- Seoul National University
- United States Air Force