Verifiable Reinforcement Learning in Networked Systems

Abstract

The objective of this proposal is to develop theory and algorithms for verifiably safe reinforcement learning in networked autonomous systems that operate in dynamic, uncertain and possibly adversarial environments. Integration of autonomous systems at scale hinges on factors including (i) how capable they are of delivering complicated missions in dynamic and a priori unknown environments and (ii) how we can establish trust that they will operate safely and correctly. These two factors are partly in conflict: the former requires adapting to, and learning new skills in, unpredictable and dynamic environments, whereas we lack the means to formally characterize, and systematically design for, provably safe and correct behavior of such adapting and learning systems. We address this gap in our ability to develop adaptable yet provably correct autonomous systems through a merger of reinforcement learning and formal methods. Both disciplines have strengths and weaknesses. Formal methods offer clear semantics for reasoning about correctness (e.g., in probabilistic temporal logic), but their design artifacts are built for pre-specified operating contexts and may be fragile under contextual changes. Learning-based algorithms, on the other hand, adapt to contextual changes and to information that is incomplete at design time. The approach we propose will enable reinforcement learning and formal methods to offset each other's weaknesses while retaining their strengths. The proposed approach proceeds in several principled steps. We first formalize safety and correctness as unambiguous and rich specifications in probabilistic temporal logic. We then augment the learning process with a run-time shield that monitors the decisions of the learning algorithm against these specifications and corrects those decisions only when necessary.
Even if the evolution of the system as directed by the learning algorithm alone does not meet the safety and correctness requirements, the evolution of the augmented system will. Furthermore, the augmented system will preserve desirable properties of the underlying learning algorithm, such as convergence and optimality (more precisely, constrained optimality). Additionally, preliminary empirical studies indicate that shielding significantly improves data efficiency in reinforcement learning. The focus of the proposed effort is distributed shielding for reinforcement learning in networked autonomous systems. To this end, we partition the proposed effort into three complementary thrusts along with a cross-cutting demonstration element:

  • Thrust I: Shielding in environments with both stochastic and nondeterministic elements
  • Thrust II: Shielding in networked systems
  • Thrust III: Compositional synthesis of distributed shields
  • Case studies and demonstrations

The expected outcomes have the potential to make timely contributions to the Army's mission by shortening the time needed to deploy autonomous systems in the field and by increasing their mission effectiveness.
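The abstract does not say how the run-time shield is constructed, so the following is only a minimal sketch of the general idea under simplifying assumptions: a small, fully known MDP; a bounded safety specification of the form "avoid the unsafe states for the next H steps" (a simple fragment of probabilistic temporal logic); and a relative threshold that blocks any action whose safety probability falls below a fixed fraction of the best achievable one. The names `safety_values`, `Shield`, `threshold` and the corridor example are hypothetical and chosen for illustration; the sketch does not cover the nondeterministic, networked, or compositional settings of Thrusts I to III.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical MDP encoding: transitions[s][a] lists (probability, next_state) pairs.
Transitions = Dict[int, Dict[int, List[Tuple[float, int]]]]


def safety_values(transitions: Transitions, unsafe: Set[int],
                  horizon: int) -> Dict[int, Dict[int, float]]:
    """q[s][a] = maximal probability that the next `horizon` states all avoid
    `unsafe` after taking action a in state s (finite-horizon value iteration)."""
    assert horizon >= 1
    # v[s]: maximal probability that s and the following k states are all safe.
    v = {s: (0.0 if s in unsafe else 1.0) for s in transitions}
    q: Dict[int, Dict[int, float]] = {}
    for _ in range(horizon):
        q = {s: {a: sum(p * v[t] for p, t in succ)
                 for a, succ in acts.items()}
             for s, acts in transitions.items()}
        v = {s: (0.0 if s in unsafe else max(q[s].values()))
             for s in transitions}
    return q


class Shield:
    """Hypothetical run-time shield: passes the learner's action through unless
    its safety value is below `threshold` times the best value in that state."""

    def __init__(self, q: Dict[int, Dict[int, float]], threshold: float):
        assert 0.0 < threshold <= 1.0
        self.q = q
        self.threshold = threshold
        self.interventions = 0  # number of times the learner was corrected

    def allowed(self, s: int) -> List[int]:
        best = max(self.q[s].values())
        return [a for a, val in self.q[s].items() if val >= self.threshold * best]

    def filter(self, s: int, proposed: int) -> int:
        allowed = self.allowed(s)
        if proposed in allowed:
            return proposed  # no correction needed
        self.interventions += 1
        # Substitute the safest action; the best action is always in `allowed`.
        return max(allowed, key=lambda a: self.q[s][a])


# Toy corridor: states 0..4, state 4 is an absorbing "cliff" (unsafe).
# Action 0 moves left, action 1 moves right; each slips the other way w.p. 0.1.
LEFT, RIGHT, CLIFF = 0, 1, 4
transitions: Transitions = {
    s: {LEFT: [(0.9, max(s - 1, 0)), (0.1, s + 1)],
        RIGHT: [(0.9, s + 1), (0.1, max(s - 1, 0))]}
    for s in range(CLIFF)
}
transitions[CLIFF] = {LEFT: [(1.0, CLIFF)], RIGHT: [(1.0, CLIFF)]}

q = safety_values(transitions, unsafe={CLIFF}, horizon=3)
shield = Shield(q, threshold=0.95)
print(shield.filter(0, RIGHT))  # → 1 (far from the cliff, RIGHT passes through)
print(shield.filter(3, RIGHT))  # → 0 (next to the cliff, corrected to LEFT)
```

In a shielded learning loop, an agent such as a Q-learner would propose an action, call `shield.filter` before executing it, and update on the action actually executed. Because the shield intervenes only when the proposed action is unsafe relative to the alternatives, the learner's own choices are left untouched elsewhere, which mirrors the "correct only when necessary" behavior described above.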

Document Details

Document Type
DoD Grant Award
Publication Date
Jul 09, 2020
Source ID
W911NF2010140

Entities

People

  • Ufuk Topcu

Organizations

  • Army Contracting Command
  • United States Army
  • University of Texas at Austin

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Autonomous Systems
  • Autonomy
  • Autonomy - Autonomous System Control