AFOSR YIP ROBUST MAXIMUM ENTROPY PLANNING, LEARNING AND CONTROL IN UNCERTAIN ENVIRONMENTS
Abstract
This work will develop flexible, robust, and efficient methods for sequential decision making in scenarios with significant uncertainty in the environment and reward signal. The work is motivated by the hypothesis that learning accurate models of complex environments is prohibitively difficult, so learning must remain robust even when only low-fidelity models are available. The approach builds on maximum entropy reinforcement learning (MaxEnt RL), which encourages high reward while maintaining policy uncertainty through an entropy term.

The first research effort focuses on developing robust and sample-efficient model-based learning methods that extend the MaxEnt RL approach. The proposed methods simultaneously learn a model representation and a policy while encouraging high policy uncertainty. Additional robustness is obtained by developing a diversity-preserving sampling mechanism that identifies distinct, high-quality trajectories in high-dimensional continuous state-action spaces.

The second research effort addresses random and unknown reward signals by specifying a generative model that includes a prior belief over random reward functions. Efficient variational techniques are developed to marginalize over the unknown rewards. Finally, the proposed work will build on this random-reward model to learn rewards from expert demonstrations, when they are available, performing so-called inverse reinforcement learning (IRL). The proposed approach is referred to as MaxEnt IRL, since it extends the maximum entropy RL framework developed throughout this project.
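The abstract describes MaxEnt RL only at a high level. In its standard form, MaxEnt RL maximizes expected discounted reward plus a weighted policy entropy, J(pi) = E[sum_t gamma^t (r(s_t, a_t) + alpha * H(pi(. | s_t)))], where the temperature alpha sets how strongly policy uncertainty is rewarded. The sketch below illustrates that trade-off with tabular soft value iteration, assuming a tiny discrete MDP with known transitions; this is far simpler than the model-based, continuous-space setting the project targets. The function name `soft_value_iteration` and its parameters are hypothetical and are not taken from the project.

```python
import math


def soft_value_iteration(P, R, gamma=0.9, alpha=0.5, iters=500):
    """Tabular soft (maximum entropy) value iteration; an illustrative sketch.

    P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward.
    alpha is the entropy temperature: larger alpha keeps the policy more uncertain.
    Returns the soft Q-table and the resulting stochastic policy.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Soft Bellman backup: Q(s,a) = r(s,a) + gamma * E_{s'}[V(s')]
        for s in range(n_states):
            for a in range(n_actions):
                Q[s][a] = R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
        # Soft value V(s) = alpha * log sum_a exp(Q(s,a) / alpha), via a stable log-sum-exp
        for s in range(n_states):
            m = max(Q[s])
            V[s] = m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in Q[s]))
    # MaxEnt policy: pi(a|s) = exp((Q(s,a) - V(s)) / alpha), a softmax over Q / alpha
    policy = [[math.exp((Q[s][a] - V[s]) / alpha) for a in range(n_actions)]
              for s in range(n_states)]
    return Q, policy


# Toy example: one state, two self-looping actions with rewards 1.0 and 0.0.
P = [[[1.0], [1.0]]]
R = [[1.0, 0.0]]
Q, pi = soft_value_iteration(P, R, gamma=0.9, alpha=0.5)
print([round(p, 3) for p in pi[0]])  # [0.881, 0.119]
```

Even though the second action earns nothing, the entropy term keeps it in the policy with nonzero probability; shrinking `alpha` toward zero recovers a nearly greedy policy, while raising it pushes the policy toward uniform.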
Document Details
- Document Type: DoD Grant Award
- Publication Date: Mar 07, 2023
- Source ID: FA95502210194
Entities
People
- Jason Pacheco
Organizations
- Air Force Office of Scientific Research
- United States Air Force
- University of Arizona