Modularity, Constraints and Multimodality in Learning for Complex, Long-Horizon Sequential Decision-Making

Abstract

The objective of this work is to develop a theoretical and algorithmic foundation for efficiently learning policies for sequential decision-making that are safe during training and deployment, generalize to unforeseen environments, adapt easily to new tasks, and are robust to adversarial perturbations. While these attributes are critical for real-world learning, current algorithms fail to provide such guarantees in a manner that scales beyond simple, short-horizon tasks. The proposed effort seeks to take a significant step towards overcoming these limitations using a range of techniques that draw upon three core organizing principles: (1) using modularity and compositionality to provide guarantees for long-horizon problems, (2) leveraging constraints at learning time, rather than post hoc, and (3) using multimodal, heterogeneous data to triangulate concepts. These synergistic principles collectively support the safety, generalization, adaptability, and adversarial robustness of the resulting learning algorithms.

Document Details

Document Type
DoD Grant Award
Publication Date
Jan 04, 2021
Source ID
W911NF2110009

Entities

People

  • Scott Niekum

Organizations

  • Army Contracting Command
  • Defense Advanced Research Projects Agency
  • University of Texas at Austin

Tags

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Systems Analysis and Design