Hierarchical Reinforcement in Continuous State and Multi-Agent Environments
Abstract
This dissertation investigates the use of hierarchy and abstraction as a means of solving complex sequential decision making problems such as those with continuous state and/or continuous action spaces, and domains with multiple cooperative agents. This thesis develops several novel extensions to hierarchical reinforcement learning (HRL), and designs algorithms that are appropriate for such problems. It has been shown that the average reward optimality criterion is more natural than the more commonly used discounted criterion for continuing tasks. This thesis investigates two formulations of HRL based on the average reward semi-Markov decision process (SMDP) model, both for discrete-time and continuous-time. These formulations correspond to two notions of optimality that have been explored in previous work on HRL: hierarchical optimality and recursive optimality. Novel discrete-time and continuous-time algorithms, termed hierarchically optimal average reward RL (HAR) and recursively optimal average reward RL (RAR) are presented, which learn to find hierarchically and recursively optimal average reward policies. Two automated guided vehicle (AGV) scheduling problems are used as experimental testbeds to empirically study the performance of the proposed algorithms.
Document Details
- Document Type
- Technical Report
- Publication Date
- Sep 01, 2005
- Accession Number
- ADA438689
Entities
People
- Mohammad Ghavamzadeh
Organizations
- University of Massachusetts Amherst