Reinforcement Learning of Informed Initial Policies for Decentralized Planning
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators and with only local information. Prevalent solution techniques are centralized and model-based, limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, where agents alternately learn best responses to each other, which appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.
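To make the role of the initial policy concrete, the following is a minimal sketch of alternating best responses. It is not the authors' method: it assumes a one-shot, two-agent cooperative matrix game rather than a Dec-POMDP, and it substitutes exact best-response computation for reinforcement learning. The payoff table `PAYOFF` and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical shared payoff for a one-shot, two-agent cooperative game:
# PAYOFF[a1][a2] is the team reward. This is a toy stand-in, NOT a Dec-POMDP.
PAYOFF: List[List[float]] = [
    [10.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 2.0],
]


def best_response(agent: int, other_action: int) -> int:
    """Action maximizing team reward while the other agent's action is held fixed."""
    if agent == 0:
        return max(range(len(PAYOFF)), key=lambda a: PAYOFF[a][other_action])
    return max(range(len(PAYOFF[0])), key=lambda a: PAYOFF[other_action][a])


def alternate_best_response(initial_a2: int, max_rounds: int = 20) -> Tuple[int, int, float]:
    """Agents take turns responding to each other, seeded by agent 2's initial action."""
    a2 = initial_a2
    a1 = best_response(0, a2)
    for _ in range(max_rounds):
        new_a2 = best_response(1, a1)      # agent 2 responds to agent 1
        new_a1 = best_response(0, new_a2)  # agent 1 responds to agent 2
        if (new_a1, new_a2) == (a1, a2):   # neither agent wants to change
            break
        a1, a2 = new_a1, new_a2
    return a1, a2, PAYOFF[a1][a2]


if __name__ == "__main__":
    print(alternate_best_response(initial_a2=2))  # poorly chosen seed
    print(alternate_best_response(initial_a2=0))  # well-chosen seed
```

In this toy game, both seeds converge immediately, but to different joint actions: the process stops at whichever mutual best response the seed leads to. That is one intuition for why an informed initial policy can matter in alternate learning, under the simplifying assumptions stated above.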
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Dec 08, 2014
- Source ID: 10.1145/2668130
Entities
People
- Bikramjit Banerjee
- Landon Kraemer
Organizations
- United States Army Research Laboratory
- University of Southern Mississippi