Extending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models

Abstract

Hierarchical reinforcement learning (HRL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work on HRL has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. In this paper we generalize the setting of HRL to average-reward, continuous-time, and multi-agent SMDP models. We also describe experimental results from a large-scale real-world domain, attesting to the benefits of HRL generally, and to our extensions more specifically. Although in principle any HRL framework could suffice, we focus in this paper on the MAXQ framework. We describe three new hierarchical reinforcement learning algorithms: continuous-time discounted reward MAXQ, discrete-time average reward MAXQ, and continuous-time average reward MAXQ. We also investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case, which we term cooperative MAXQ, where each agent uses the same task hierarchy. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. We use two experimental testbeds to study the empirical performance of our proposed extensions. One domain is a simulated robot trash collection task. The other domain is a much larger real-world multi-agent autonomous guided vehicle (MAGV) problem. We compare the performance of our proposed algorithms with each other, as well as with the original MAXQ method and with standard Q-learning. In the MAGV domain, we show that our proposed extensions outperform widely used industrial heuristics, such as "first come first serve", "highest queue first", and "nearest station first".

Document Details

Document Type
Technical Report
Publication Date
Jul 09, 2003
Accession Number
ADA445107

Entities

People

  • Mohammad Ghavamzadeh
  • Rajbala Makar
  • Sridhar Mahadevan

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Automated Guided Vehicles
  • Autonomous Agents
  • Computer Science
  • Information Processing
  • Information Systems
  • Learning
  • Machine Learning
  • Manufacturing
  • Multiagent Systems
  • Navigation
  • Probability
  • Reinforcement Learning
  • Robots
  • Scheduling (Production)
  • Simulations

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Distributed Systems and Data Platform Development
  • Mathematical Modeling and Probability Theory

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • Autonomy
  • Autonomy - Autonomous System Control