Hierarchical Reinforcement Learning in Continuous State and Multi-Agent Environments

Abstract

This dissertation investigates hierarchy and abstraction as a means of solving complex sequential decision-making problems, such as those with continuous state and/or continuous action spaces and domains with multiple cooperative agents. It develops several novel extensions to hierarchical reinforcement learning (HRL) and designs algorithms appropriate for such problems. It has been shown that the average reward optimality criterion is more natural than the more commonly used discounted criterion for continuing tasks. The thesis investigates two formulations of HRL based on the average reward semi-Markov decision process (SMDP) model, in both discrete time and continuous time. These formulations correspond to two notions of optimality explored in previous work on HRL: hierarchical optimality and recursive optimality. Novel discrete-time and continuous-time algorithms, termed hierarchically optimal average reward RL (HAR) and recursively optimal average reward RL (RAR), are presented; they learn hierarchically and recursively optimal average reward policies, respectively. Two automated guided vehicle (AGV) scheduling problems serve as experimental testbeds for empirically studying the performance of the proposed algorithms.

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2005
Accession Number: ADA438689

Entities

People

Mohammad Ghavamzadeh

Organizations

University of Massachusetts Amherst
