NeuroSymbolic Hierarchical Reinforcement Learning
Abstract
Recent advances in deep reinforcement learning (DRL) have led to strong performance in many domains, including Atari games, MuJoCo environments, and complex two-player games like Go and chess. Despite these impressive successes, there are substantial obstacles to applying DRL to many real-world applications such as personal robots, medical informatics, logistics, and construction engineering. In this proposal, we identify several limitations of DRL and propose a synergistic integration of DRL with relational and hierarchical representations, and with symbolic planning and inference algorithms, that overcomes these limitations.
One of the biggest limitations of DRL approaches is that they are based on propositional representations of states and actions, i.e., states represented as constant-sized vectors or grids and actions at a single temporal scale. However, most real-world domains involve open-ended sets of objects, relationships between objects, and actions at multiple temporal scales, which are best modeled by relational and hierarchical representations. The lack of expressive representations inhibits generalization, which is one key reason why DRL requires exorbitant amounts of training. While it is possible to provide such training in closed domains like chess and Go, which do not require supervision, open worlds require more sample-efficient approaches that exploit the modularity and hierarchy of tasks.
A related weakness in most existing approaches to DRL is that they produce purely reactive policies. It is unrealistic to think that a purely reactive policy, or universal plan with no run-time inference, could be learned for large and complex domains. The closest systems that exhibit both reactive policies and run-time inference are AlphaGo, AlphaZero, and their derivatives. However, the inference in these systems reduces to look-ahead search in a closed and flat search space, and it neither scales to nor exploits the structure in open worlds where the set of objects and relationships evolves over time.
A final weakness of DRL systems is that their decisions are not interpretable by humans, since they do not share human vocabulary or reasoning methods. This in turn leads to a lack of trust, which is especially problematic in critical applications.
The proposal aims to advance AI to the next level of development by building agents that are more flexible, robust, adaptive, transparent, and trustworthy. Our main vehicle for achieving these ambitious goals is to combine the recent advances in deep neural networks with the relational, symbolic representations pioneered in classical AI to reap the benefits of both paradigms. We will develop multiple approaches to achieve this symbiosis, including deep relational actor-critic, model-augmented deep relational RL, deep relational transfer learning, and relational hierarchical deep RL. In each case, the objective is to learn representations of domain models and control knowledge that can support decision making, planning, transfer across related domains, and explanation in an interpretable language.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Sep 08, 2022
- Source ID: W911NF2210251
Entities
People
- Ronald Parr
Organizations
- Army Contracting Command
- Duke University
- United States Army