Non-Markovian, Adversarial, and Multi-objective Reinforcement Learning Algorithms

Abstract

Reinforcement learning (RL) has achieved impressive results on many problems, such as natural language processing, games, and vision tasks. In a typical RL formulation, the aim is to learn a control policy that maximizes the cumulative reward when operating in an unknown environment. Although a wide variety of model-based and model-free RL algorithms are now available, controlling physical systems that operate in complex real-life environments requires higher-level reasoning and cognition abilities. Current RL lacks stringent safety and robustness guarantees, as well as the ability to consider the multi-faceted objectives and higher-order memory interactions usually required in real-life applications such as autonomous aerial, ground, and underwater vehicles and cyber-physical systems. By addressing these challenges, this project will push the envelope of RL for controlling autonomous systems in complex, uncertain environments through the following three novel contributions.

(1) To overcome a fundamental limitation of RL, namely its reliance on Markovian assumptions and its adoption of memoryless strategies that ignore the historic experience characteristic of human learning and cognition, we propose a non-Markovian model-based RL framework that exploits concepts from fractional dynamical systems to capture the degree of long-range dependence among the variables and multi-faceted objectives of an RL problem. Instead of the usual RL assumption that the probability distribution over the next state depends only on the current state, we consider that fractional-order derivatives encode the history and long-range memory of the RL variables, and we estimate the fractal coefficients of these fractional operators from data in real time. In addition, these fractional-order derivatives modulate the time-varying dependencies among RL variables and objectives, capturing their degree of nonlinearity. This fractal and fractional calculus formulation of non-Markovian model-based RL allows a more compact representation that not only captures the complexity of various real-world applications but also leads to a more data- and memory-efficient control strategy.

(2) To tackle data complexity, we develop novel design strategies for RL algorithms that are robust to perturbations in the data during the deployment phase. Machine learning in general, and RL in particular, is known to be very fragile against adversary-induced perturbations in training or test data. We will use the system-theoretic notion of dissipativity in the context of RL to ensure that system properties such as stability remain satisfied even when data are perturbed at test time. Further, we will design a control barrier function-based RL algorithm to ensure robust satisfaction of safety constraints in the presence of adversarial perturbations.

(3) Traditional RL strategies maximize a scalar utility function. Autonomous systems, however, must deal with multiple cost functions, and combining them into a single scalar function leads to suboptimal outcomes. To tackle this challenge, we will develop two approaches that allow vector-valued utility functions to be considered in multi-objective RL. For the case when all components of the utility vector can be observed simultaneously, we develop multi-objective RL algorithms based on distributional RL that can optimize the policy for any given preference distribution, leading to Pareto-optimal algorithms. For the case when only a possibly incorrect linear combination of the components of the utility vector can be observed, we develop algorithms with guarantees on suboptimality relative to an RL algorithm that optimizes only one objective at a time.
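As an illustration of how a fractional-order derivative can encode long-range memory, as described in contribution (1), the following minimal sketch uses the standard Grünwald-Letnikov discretization. It is not taken from the project itself; the function names `gl_weights` and `fractional_difference` are hypothetical, and the choice of the Grünwald-Letnikov form is an assumption made only for this example.

```python
def gl_weights(alpha, n):
    """Return the first n Grunwald-Letnikov weights w_j = (-1)^j * C(alpha, j).

    Hypothetical helper for illustration: the weights follow the standard
    recurrence w_0 = 1, w_j = w_{j-1} * (1 - (alpha + 1) / j).
    """
    weights = [1.0]
    for j in range(1, n):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / j))
    return weights


def fractional_difference(series, alpha, step=1.0):
    """Approximate the order-alpha derivative at the most recent sample.

    Every past sample contributes, weighted by gl_weights; the newest
    sample is paired with w_0, the one before it with w_1, and so on.
    """
    weights = gl_weights(alpha, len(series))
    total = sum(w * x for w, x in zip(weights, reversed(series)))
    return total / step ** alpha


if __name__ == "__main__":
    # alpha = 1 recovers the ordinary first difference (memoryless beyond one step).
    print(gl_weights(1.0, 4))
    # alpha = 0.5 keeps nonzero, slowly decaying weights on the whole history.
    print(gl_weights(0.5, 4))
```

For an integer order the weights vanish after a finite number of steps, which corresponds to the usual short-memory difference. For a non-integer order every past sample keeps a nonzero weight, which is the sense in which the fractional order controls how much history the model retains.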

Document Details

Document Type: DoD Grant Award
Publication Date: Mar 08, 2023
Source ID: W911NF2310111

Entities

People

Paul Bogdan

Organizations

Army Contracting Command
United States Army
University of Southern California
