Towards Distributed and Online Partially Observable Stochastic Camouflage Games

Abstract

APPROVED FOR PUBLIC RELEASE

The problem of transporting expensive cargo by multiple ships or platforms to a desired destination is ubiquitous. However, in many applications, the transport agent (henceforth, the blue team) may face attack threats from greedy hijackers (henceforth, the red team) who want to take the cargo for themselves. Applications of this sort are particularly common in security domains, such as transporting large amounts of cargo while avoiding pirates' attacks and, in military applications, transporting important materials while avoiding enemies' interception, to name a few. In this project, we propose to systematically study arguably one of the most often used measures to counteract such attacks: using camouflage to hide important platforms from the red team's detection and thus safely transport cargo to the destination. Towards that end, we propose and study a novel class of stochastic games, which we term Resource Allocation Partially Observable Stochastic Camouflage Games (R-POSCG), played between a blue team and a red team. The blue team seeks to route many platforms to a destination, whereas the red team seeks to use sensors to detect these platforms. The game features two phases, corresponding to resource assignment and online execution, respectively, and will be distributed and have partially observable game states.

The project is divided into two major themes: (1) efficiently solving the game in any given known environment; (2) learning the optimal blue team strategy in an unknown game environment. In the first theme, we will address multiple new challenges in the R-POSCG game, including partial observation of camouflage status, learning a compact belief representation to capture the large belief space, and optimizing the red team's resource application by designing and solving a bilevel optimization formulation of R-POSCGs. Moreover, we will study R-POSCG under both centralized and decentralized platforms, and use curriculum learning to gain robustness in adversarial learning. In the second theme, we will develop robust learning approaches to address the instability of equilibrium solutions with respect to game parameters and opponent behaviors, design data-efficient learning algorithms under an enormous action space by melding combinatorial optimization and reinforcement learning, use function approximation to efficiently learn the two-phase player policies of R-POSCG, and tame the non-stationarity of multi-agent reinforcement learning in R-POSCG by resorting to an adversarial bandit learning framework.

With our team's significant experience in security games over the past two decades, particularly Stackelberg Security Games and Green Security Games, as well as our experience with earlier work in decentralized POMDPs, we believe we are in a unique position to address the research questions in R-POSCGs. We expect that addressing these research questions will not only help us design more practical algorithms for solving the proposed R-POSCG game, but also lead to fundamentally novel techniques that could spur more optimization and reinforcement learning research for solving more realistic and complex strategic games. Ultimately, we expect the resulting models and algorithms to be useful in many applications, just as our earlier research in security games led to multiple applications in both the physical and cyber realms.

Document Details

Document Type: DoD Grant Award
Publication Date: Aug 11, 2023
Source ID: N000142312802

Entities

People

  • Milind Tambe

Organizations

  • Office of Naval Research
  • President and Fellows of Harvard College
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Distributed Systems and Data Platform Development
  • Game Theory

Technology Areas

  • AI & ML
  • AI & ML - Autonomous Systems
  • AI & ML - Machine Learning Algorithms
  • Cyber
  • Space