GRADIENT-BASED MULTI-MDP POLICY OPTIMIZATION FOR GENERALIZATION IN REINFORCEMENT LEARNING

Abstract

Despite significant advances in deep reinforcement learning (RL), trained agents are prone to overfitting to their training environments and fail to generalize to similar but previously unseen environment contexts at test time. While training a policy on multiple procedurally generated environments has recently been proposed to improve generalization performance, training a policy in the multi-Markov Decision Process (MDP) setting is not yet well understood. In this project, we propose a novel gradient-based framework for learning an optimal policy by resolving the conflicting-loss problem that can arise in a multi-MDP setting.

Document Details

Document Type
DoD Grant Award
Publication Date
Jan 04, 2023
Source ID
FA23862214010

Entities

People

  • Hyun Oh Song

Organizations

  • Air Force Office of Scientific Research
  • Seoul National University
  • United States Air Force

Tags

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks