GRADIENT-BASED MULTI-MDP POLICY OPTIMIZATION FOR GENERALIZATION IN REINFORCEMENT LEARNING

Abstract

Despite significant advances in deep reinforcement learning (RL), trained agents are prone to overfitting to their training environments and fail to generalize to similar but previously unseen environment contexts at test time. Training a policy on multiple procedurally generated environments has recently been proposed to improve generalization performance, but training a policy in the multi-Markov Decision Process (multi-MDP) setting is not yet well understood. In this project, we propose a novel gradient-based framework for learning an optimal policy by resolving the conflicting-loss problem that can arise in a multi-MDP setting.

Document Details

Document Type: DoD Grant Award
Publication Date: Jan 04, 2023
Source ID: FA23862214010

Entities

People

Hyun Oh Song

Organizations

Air Force Office of Scientific Research
Seoul National University
United States Air Force
