Deep Multi-Agent Planning and Reinforcement Learning with Macro-Actions

Abstract

AI systems are being built into everything from thermostats to robots for home, manufacturing and military settings. As this surge continues, AI systems will need to coordinate with other systems (e.g., sensors, robots, autonomous cars), resulting in multi-agent systems. For example, consider the problem of search and rescue with a team of aerial and ground robots. The ground robots have a limited view of the world, but are able to transport people. The aerial vehicles have a wider view of the world, but are unable to carry passengers. With limited communication, all the vehicles must coordinate, reason about each other and seek to share their information in order to rescue people in the most efficient manner. Such multi-agent solutions will drastically improve outcomes in many domains, but require new methods for coordinating the agents in uncertain, unstructured domains. These cooperative problems with uncertainty (possibly in outcomes, sensors and communication) can be modeled as decentralized partially observable Markov decision processes (Dec-POMDPs). Dec-POMDPs have become a common model for multi-agent reinforcement learning and planning under uncertainty. Recent work by the PI has scaled to large domains through the use of macro-actions (i.e., temporally extended actions which may require different amounts of time). Macro-actions allow coordination to take place at a higher level, namely at the level of deciding which macro-actions to execute. Macro-actions also more naturally represent real-world behavior which may require multiple time-steps (e.g., navigation to a waypoint or waiting for another agent).
Unfortunately, while current macro-action planning algorithms can scale to larger horizons and state spaces, they assume the rest of the problem is discrete or low-dimensional. Macro-actions also cannot be used directly with current multi-agent reinforcement learning frameworks (such as powerful new deep reinforcement learning approaches) because they are selected and terminated asynchronously. In addition, current macro-action methods assume the macro-actions are given a priori and cannot be changed. These are critical bottlenecks that prevent such approaches from being used in real-world systems. Therefore, we propose a new set of methods for incorporating macro-action reasoning and deep learning into multi-agent planning and reinforcement learning. When a model of the domain or a simulator is available, we will develop planning methods that scale to large sensor spaces by incorporating deep learning to determine the relevant information to plan over. When a model or simulator of the domain is not available, we will develop deep reinforcement learning methods that learn to coordinate the agents at the macro-action level. We will also develop methods for learning both the policy over macro-actions and the macro-actions themselves, removing the requirement that the correct macro-actions already be available. Specifically, we propose to develop (1) efficient multi-agent sample-based planning methods that can scale to large continuous spaces by combining macro-actions and deep learning methods, (2) deep multi-agent reinforcement learning methods that can learn asynchronously in large problems by using high-level macro-actions, and (3) deep multi-agent reinforcement learning methods that not only learn which macro-actions to execute, but also adapt the macro-actions themselves to improve performance.
The resulting methods will leverage the power of deep learning and deep reinforcement learning and the generality of macro-action-based Dec-POMDPs to allow teams of agents to coordinate in large, realistic domains while being robust to uncertainty in sensors, execution and communication as well as domain changes over time.
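
To make the asynchrony concrete, the following minimal Python sketch shows two agents executing macro-actions of different durations, so that their decision points fall on different time steps. It is an illustration only and not the project's method: the class names, the fixed integer durations, the scripted plans and the macro-action names are all assumptions introduced here (in a real macro-action Dec-POMDP, termination is generally stochastic and the next macro-action is chosen by a policy from the agent's local history).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MacroAction:
    name: str
    duration: int  # assumed fixed length in primitive time steps (hypothetical)


@dataclass
class Agent:
    name: str
    plan: List[MacroAction]  # scripted stand-in for a learned macro-action policy
    current: Optional[MacroAction] = None
    remaining: int = 0
    next_index: int = 0

    def step(self, t: int, log: List[Tuple[int, str, str]]) -> None:
        # A new macro-action is chosen only when the current one has terminated.
        if self.remaining == 0 and self.next_index < len(self.plan):
            self.current = self.plan[self.next_index]
            self.next_index += 1
            self.remaining = self.current.duration
            log.append((t, self.name, self.current.name))
        # Execute one primitive step of the running macro-action, if any.
        if self.remaining > 0:
            self.remaining -= 1


def run(agents: List[Agent], horizon: int) -> List[Tuple[int, str, str]]:
    """Return (time step, agent, macro-action) for every decision point."""
    log: List[Tuple[int, str, str]] = []
    for t in range(horizon):
        for agent in agents:
            agent.step(t, log)
    return log


ground = Agent("ground", [MacroAction("navigate_to_waypoint", 3),
                          MacroAction("wait_for_agent", 1)])
aerial = Agent("aerial", [MacroAction("scan_area", 2),
                          MacroAction("share_information", 2)])

log = run([ground, aerial], horizon=5)
for t, agent_name, macro_name in log:
    print(f"t={t}: {agent_name} starts {macro_name}")
```

After the first step the two agents choose at different times (the aerial agent at t=2, the ground agent at t=3), which is the asynchronous selection and termination that frameworks built around synchronized per-step actions do not handle directly.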

Document Details

Document Type: DoD Grant Award
Publication Date: Jul 09, 2020
Source ID: W911NF2010265

Entities

People

Christopher Amato

Organizations

Army Contracting Command
Northeastern University
United States Army
