Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs

Abstract

In many multiagent tasks, agents face uncertainty about the environment, the outcomes of their actions, and the behaviors of other agents. Dec-POMDPs offer a powerful modeling framework for sequential, cooperative, multiagent tasks under uncertainty. Solution techniques for infinite-horizon Dec-POMDPs have assumed prior knowledge of the model and have required centralized solvers. We propose a method for learning Dec-POMDP solutions in a distributed fashion. We identify the issue of policy synchronization that distributed learners face and propose incorporating rewards into their learned model representations to ameliorate it. Most importantly, we show that even if rewards are not visible to agents during policy execution, exploiting the information contained in reward signals during learning is still beneficial.

Document Details

  • Document Type: Technical Report
  • Publication Date: Jan 01, 2012
  • Accession Number: ADA585093

Entities

People

  • Bikramjit Banerjee
  • Landon Kraemer

Organizations

  • University of Southern Mississippi

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence
  • Computations
  • Computer Science
  • Environment
  • Information Processing
  • Information Systems
  • Learning
  • Machine Learning
  • Multiagent Systems
  • Observation
  • Operations Research
  • Probability
  • Probability Distributions
  • Reinforcement Learning
  • Transitions

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms