Towards High-confidence Reinforcement Learning Algorithms for Safety-critical Systems via Dual Proce

Abstract

While recent progress on safe reinforcement learning (RL) algorithms has been promising, the most successful (deep) RL agents are far from accountable algorithms that can hold the promise of precluding undesired and unsafe behaviors in ever-changing safety-critical settings. Safety and performance requirements of RL-based control agents are context dependent: the best control performance that a system can deliver without violating its safety constraints depends on the complexity and novelty of the situation. On the one hand, highly demanding performance specifications in a surprising context could make the RL agent vulnerable. On the other hand, a worst-case, robust-to-the-context safe control design framework may result in poor performance and might even jeopardize the existence of a feasible solution. Systematic integration of proactive-reactive learning mechanisms to co-learn both specifications and control policies is of vital importance to deliver as much performance as possible while respecting the system's safety specifications. Towards accountable RL algorithms, a novel rapprochement between model-based and model-free RL algorithms will be presented to co-design the system's specifications and control policy. Safety-aware system modeling approaches will be proposed that preserve the invariant properties of the actual system. The model will then be leveraged by both a proactive conflict-resolving module to proactively shape the RL agent's specification and a reactive safety certifier to myopically certify the safety of its actions.
Safety certifiers that use coarse dynamics models without invariant-preserving guarantees typically induce controllers that are not accountable when applied to the true system because they either 1) lead to overly conservative and possibly infeasible control design strategies to guarantee safe set invariance under the worst-case uncertainty realization, or 2) behave unsafely due to misrepresentation of geometric properties of the learned system, e.g., its safe level sets. Fully data-based performance certificates will be proposed to turn a large dataset of past samples into performance awareness engines that can then be leveraged to detect abnormal system behavior. The proactive controller will then use this certification to trigger the shaping of performance specifications as scenarios develop and thus provide a formal framework to guarantee provably correct RL algorithms with continued viability assurance despite changes and uncertainties. This ONR award-supported research will pave the way to change the very nature of autonomous control design for safety-critical systems, shifting from extrinsic designer-centered goal assignment to intrinsic controller-centered goal assignment, and will allow the system to increase the range of unforeseen tasks it can perform with high performance and in a safe and viable fashion. If successful, the PIs will take a first but big step in designing accountable RL algorithms, even though it may remain a persistent challenge for the field.
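
The idea of a reactive safety certifier that myopically checks each action can be illustrated with a toy example. The sketch below is not the method proposed in this award; it assumes a one-dimensional integrator x[k+1] = x[k] + dt * u, a safe set x <= x_max, and a one-step discrete-time barrier condition h(x[k+1]) >= (1 - gamma) * h(x[k]) with h(x) = x_max - x. All names (certify_action, step, rollout) and parameter values are hypothetical.

```python
# Toy reactive safety certifier (safety filter) for x[k+1] = x[k] + dt * u.
# Assumption: the safe set is {x : h(x) >= 0} with h(x) = x_max - x.
# Assumption: an action is accepted if h(x_next) >= (1 - gamma) * h(x).


def certify_action(x, u_proposed, x_max=1.0, dt=0.1, gamma=0.5):
    """Return the action closest to u_proposed that meets the one-step condition."""
    h = x_max - x
    # The condition reduces to dt * u <= gamma * h, so u has a simple upper bound.
    u_bound = gamma * h / dt
    return min(u_proposed, u_bound)


def step(x, u, dt=0.1):
    """Advance the toy integrator by one time step."""
    return x + dt * u


def rollout(x0, policy, n_steps=20):
    """Run a policy through the certifier and record the visited states."""
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        u = certify_action(x, policy(x))
        x = step(x, u)
        trajectory.append(x)
    return trajectory


def aggressive_policy(x):
    """Stand-in for an RL policy that always pushes toward the boundary."""
    return 5.0


trajectory = rollout(0.0, aggressive_policy)
```

In this toy setting the certifier leaves modest actions untouched and clips aggressive ones, so the state approaches x_max without crossing it. Because the check looks only one step ahead and relies on an exact model, it also hints at why the abstract stresses models that preserve the invariant properties of the true system.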

Document Details

Document Type
DoD Grant Award
Publication Date
Feb 08, 2022
Source ID
N000142212159

Entities

People

  • Bahare Kiumarsi Khomartash

Organizations

  • Michigan State University
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Cybersecurity
  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Autonomous Systems
  • AI & ML - Machine Learning Algorithms