Modularity, Constraints and Multimodality in Learning for Complex, Long-Horizon Sequential Decision-Making
Abstract
The objective of this work is to develop a theoretical and algorithmic foundation for efficiently learning policies for sequential decision-making that are safe during training and deployment, generalize to unforeseen environments, adapt easily to new tasks, and are robust to adversarial perturbations. While these attributes are critical for real-world learning, current algorithms fail to provide such guarantees in a manner that scales beyond simple, short-horizon tasks. The proposed effort seeks to take a significant step towards overcoming these limitations using a range of techniques that draw upon three core organizing principles: (1) using modularity and compositionality to provide guarantees for long-horizon problems, (2) leveraging constraints at learning time rather than post hoc, and (3) using multimodal, heterogeneous data to triangulate concepts. Together, these synergistic principles support the safety, generalization, adaptability, and adversarial robustness of the resulting learning algorithms.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jan 04, 2021
- Source ID: W911NF2110009
Entities
People
- Scott Niekum
Organizations
- Army Contracting Command
- Defense Advanced Research Projects Agency
- University of Texas at Austin