Advanced Autonomy for Unmanned Surface Vehicles via Distributional Reinforcement Learning
Abstract
This proposal describes a five-year project that will investigate the potential for distributional reinforcement learning to support advanced autonomy capabilities for unmanned surface vehicles (USVs). Reinforcement learning (RL) has enjoyed many successes in its application thus far to robots and autonomous systems. However, as a tool to support reliable, long-duration autonomy in the real world, the standard process of decision-making based only on expectation offers a limited perspective on the potential outcomes of the heavy-tailed and multi-modal probability distributions that may govern a USV's actions in real-world stochastic environments. Distributional reinforcement learning offers a potential way forward. Compared to traditional RL methods, distributional RL has been shown to provide more stable learning behavior in environments with high uncertainty, because it learns full return distributions rather than only their expected values. In addition, risk measures can be applied to adjust the level of sensitivity to aleatoric uncertainty in the learned distributions and enhance the safety of a distributional RL agent. Our research to date suggests that distributional RL, and specifically our adaptation of Implicit Quantile Networks (IQN) to USV navigation, can permit USVs to navigate safely and efficiently among densely packed obstacles and flow disturbances under minimalistic sensing. Our preliminary results from congested multi-vehicle scenarios also suggest that there is strong potential for distributional RL to support safe multi-vehicle autonomy and complex tasks.

This research project will investigate how distributional reinforcement learning can support advanced autonomy capabilities for USVs. First, we will use distributional RL to learn a diverse portfolio of individual policies capable of safely executing complex tasks in dynamic, unstructured, and uncertain environments characterized by harsh disturbances, minimalistic sensing, and degraded perception. In such settings, distributional RL offers the potential for improved performance and increased safety. Second, one layer above these task execution policies, we will establish the infrastructure for centralized mission planning and task allocation to address complex missions requiring multiple USVs, alongside decentralized task execution compatible with low-bandwidth and intermittent communications. When a deployment of USVs is sufficiently large, our planning layer will also manage team formation, and tasks will be allocated hierarchically to teams of USVs based on their physical locations. Third, to support our development, testing, and validation of the above capabilities, experiments will be performed at different levels of fidelity, using a "Sim2Sim2Real" approach that encompasses a low-fidelity simulator, a high-fidelity simulator, and physical experiments with real USVs.
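
To illustrate how a risk measure can change the action chosen from learned return distributions, the sketch below applies conditional value-at-risk (CVaR) to equally weighted return samples for each action. It is a minimal illustration, not the project's IQN adaptation: the action names, the sample values, and the helper functions `cvar` and `select_action` are all hypothetical, and averaging the lowest fraction of samples is only one simple way to approximate CVaR.

```python
import math
from typing import Dict, List


def cvar(samples: List[float], alpha: float) -> float:
    """Approximate CVaR: mean of the worst alpha-fraction of return samples.

    alpha = 1.0 recovers the ordinary mean (risk-neutral);
    smaller alpha focuses on the lower tail (more risk-averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(samples)                      # worst returns first
    k = max(1, math.ceil(alpha * len(ordered)))    # number of tail samples kept
    return sum(ordered[:k]) / k


def select_action(return_samples: Dict[str, List[float]], alpha: float) -> str:
    """Pick the action whose return distribution has the highest CVaR."""
    return max(return_samples, key=lambda a: cvar(return_samples[a], alpha))


# Hypothetical return samples for two candidate actions (illustrative values only).
example_samples = {
    # Usually good, but with a rare severe outcome (e.g. a collision).
    "hold_course": [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, -50.0],
    # Slightly lower typical return, with no severe tail.
    "detour": [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0],
}

if __name__ == "__main__":
    print(select_action(example_samples, alpha=1.0))   # risk-neutral choice
    print(select_action(example_samples, alpha=0.25))  # risk-averse choice
```

With these illustrative numbers, the expectation-based choice favors the action with the rare severe outcome, while the CVaR-based choice at a small `alpha` favors the action with the more benign lower tail.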
Document Details
- Document Type: DoD Grant Award
- Publication Date: Nov 09, 2024
- Source ID: N000142412522
Entities
People
- Brendan Englot
Organizations
- Office of Naval Research
- Stevens Institute of Technology
- United States Navy