Reinforcement Learning of Informed Initial Policies for Decentralized Planning

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, as well as local information. Prevalent solution techniques are centralized and model-based, limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, where agents alternately learn best responses to each other, which appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.

Document Details

Document Type: Pub Defense Publication
Publication Date: Dec 08, 2014
Source ID: 10.1145/2668130

Entities

People

Bikramjit Banerjee
Landon Kraemer

Organizations

United States Army Research Laboratory
University of Southern Mississippi
