Reinforcement Learning of Informed Initial Policies for Decentralized Planning

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, as well as local information. Prevalent solution techniques are centralized and model based—limitations that we address by distributed reinforcement learning (RL). We particularly favor alternate learning, where agents alternately learn best responses to each other, which appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.

Document Details

Document Type
Pub Defense Publication
Publication Date
Dec 08, 2014
Source ID
10.1145/2668130

Entities

People

  • Bikramjit Banerjee
  • Landon Kraemer

Organizations

  • United States Army Research Laboratory
  • University of Southern Mississippi

Tags

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms