Learning Control Policies for Robots with Guaranteed Out-of-Distribution Generalization

Abstract

Approved for Public Release

PI: Anirudha Majumdar (Princeton University)

The goal of this project is to develop a principled framework for learning control policies for robots with guaranteed out-of-distribution (OOD) generalization. As an example, consider a micro aerial vehicle (MAV) trained to perform vision-based navigation using a dataset of outdoor environments and deployed in environments with varying weather conditions, lighting, or obstacle densities. Similarly, consider a robotic manipulator tasked with manipulating a new set of objects or an autonomous vehicle deployed in a new city. Current state-of-the-art techniques [...] with such distribution shifts, i.e., when the distribution of environments the robot is tested on is different from the training distribution. The significant consequences of failure for safety-critical robotic systems demand an approach that allows us to make formal guarantees on OOD generalization. The goal of this project is to develop precisely such an approach.

Technical approach. The key technical insight of this project is to leverage and extend ideas from generalization theory, differential privacy, and causal inference [...] environments (e.g., in order to trigger an emergency safety policy). These algorithms are developed using the PAC-Bayes generalization framework and provide guaranteed confidence bounds on OOD detection while only responding to task-relevant variations in the robot's environment. Second, we propose techniques based on differential privacy to learn control policies that are insensitive to a large set of realistic distribution shifts (e.g., as measured by the Wasserstein distance). Third, we propose approaches based on causal inference for learning policies that generalize beyond the support of the training distribution (i.e., generalize to environments that have zero probability of appearing under the training distribution).

Anticipated outcome. The proposed effort targets the fundamental science of OOD generalization for robotic systems. We anticipate that the proposed project will lead to a foundational theoretical framework and practical algorithms for providing guarantees on OOD generalization for robotic systems with rich sensory inputs and neural network-based control policies. An integral part of this effort will be to thoroughly demonstrate and validate our approach on hardware platforms, with a particular focus on aerial inspection and manipulation using MAVs. We hypothesize that our experiments will demonstrate significant gains over state-of-the-art approaches in terms of (i) the ability to detect task-relevant distribution shifts, (ii) OOD generalization performance (as measured by sample efficiency and the amount of distribution shift that can be tolerated), and (iii) the ability to provide strong theoretical guarantees on OOD generalization.

Impact on DoD capabilities. The proposed [...] complex environments. The proposed framework is directly applicable to a broad range of robotic systems and application domains, including mobile manipulators performing infrastructure inspection and repair tasks, MAVs performing reconnaissance missions in cluttered environments, and underwater vehicles operating in contested maritime environments. The proposed approach for learning control policies for robots with guaranteed OOD generalization could overcome the challenges associated with state-of-the-art solutions and allow the U.S. Navy to deploy such systems in settings that were previously beyond reach. The proposed project is well-aligned with ONR programs in machine learning and a [...]

Document Details

Document Type: DoD Grant Award
Publication Date: Aug 20, 2021
Source ID: N000142112803

Entities

People

Anirudha Majumdar

Organizations

Office of Naval Research
Trustees of Princeton University
United States Navy
