Oracle Imitation for Embedded Decision Making
Abstract
Planning is a pervasive problem in artificial intelligence and, together with perception, forms the core of the field. Planning involves finding the sequence of actions for one or more agents in an environment over a time horizon that maximizes some long-term objective. Because planning must consider the consequences of actions many steps into the future, the number of possible futures to search can quickly explode into the millions and beyond, resulting in long wait times for good solutions. While static problems can be solved offline, with the solutions saved and referenced online to avoid planning delays, this practice falls apart for non-static scenarios or where information is gathered online. This project investigates a fundamentally new approach to decision making (planning) via imitation learning. The approach involves imprinting the input-output behaviors of optimal solvers and experts (oracles) into neural networks. The approach hypothesizes that by training a function approximator such as a neural network to mimic prior demonstrations from the oracle, we can learn embeddings of the problem as well as of the oracles' solutions and reproduce optimal solutions across both old and new problem instances. This can be done with transformatively lower run time and computational complexity because the problem has fundamentally changed from an online search problem in high-dimensional spaces to an offline regression problem with an online prediction that requires only a small, fixed amount of computation. This idea of producing behaviors using a neural embedding was motivated by looking at how Nature encodes intelligence: human and animal brains generate behaviors in complex dynamic environments using instinct, especially in environments similar to those encountered before, and not by running internal tree-search algorithms or approximate dynamic programs on the fly.
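The shift from online search to offline regression plus online prediction can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the proposal: the toy quadratic cost, the brute-force `oracle`, the helper names, and the use of a linear least-squares model in place of a neural network.

```python
import numpy as np

# Hypothetical toy setup: pick a 2-D action that moves a state toward a goal
# while penalizing large actions. None of this comes from the proposal.
GRID = np.linspace(-2.0, 2.0, 41)
CANDIDATES = np.array([(ax, ay) for ax in GRID for ay in GRID])  # shape (1681, 2)

def oracle(state, goal, lam=0.5):
    """Slow 'oracle': exhaustive search over every candidate action."""
    costs = (np.sum((state + CANDIDATES - goal) ** 2, axis=1)
             + lam * np.sum(CANDIDATES ** 2, axis=1))
    return CANDIDATES[np.argmin(costs)]

rng = np.random.default_rng(0)

def sample_problems(k):
    """Draw k random (state, goal) problem instances."""
    return rng.uniform(-1, 1, size=(k, 2)), rng.uniform(-1, 1, size=(k, 2))

# Offline phase: collect oracle demonstrations and fit a regressor to them.
S, G = sample_problems(200)
X = np.hstack([S, G, np.ones((len(S), 1))])          # features plus bias term
Y = np.array([oracle(s, g) for s, g in zip(S, G)])   # oracle actions as labels
W, *_ = np.linalg.lstsq(X, Y, rcond=None)            # stand-in for network training

def policy(state, goal):
    """Online phase: one matrix product instead of a search."""
    return np.concatenate([state, goal, [1.0]]) @ W

# Compare the learned policy with the oracle on unseen problem instances.
S_new, G_new = sample_problems(50)
errors = [np.linalg.norm(policy(s, g) - oracle(s, g)) for s, g in zip(S_new, G_new)]
max_error = max(errors)
print(f"largest deviation from the oracle on new instances: {max_error:.3f}")
```

In this toy case the oracle's answer is almost linear in the inputs, so a linear model suffices. The problems targeted by the proposal are combinatorial or high-dimensional, which is why a neural network is used as the function approximator there.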
This proposal investigates three research thrusts.
Research Thrust 1: Oracle networks for discrete graph planning. Here we describe the problem of imitating a combinatorial optimization solver. We make discrete oracle networks invariant to problem size by defining a recurrent latent embedding function for tasks, where each task is defined by discrete sub-tasks in graph form.
Research Thrust 2: Hierarchical planning using oracle imitation. We propose learning a nested hierarchical planner via oracle imitation. In contrast to traditional hierarchical planning, we introduce composition as a technique for solving problems that require a specific combination of sequential tasks and underlying motion primitives.
Research Thrust 3: Learning-to-learn oracle networks. New problem instances always pose a challenge for neural networks, since adapting to them is widely considered data-hungry. We will investigate a meta-oracle approach, which allows oracle networks to adapt to new problem instances and environments with as few as one example from the oracle.
The learned networks will be physically demonstrated on off-the-shelf embedded computing hardware to show the claimed advantages for power-limited mobile robots. This cloud-disconnected, on-board emphasis directly addresses the ever-relevant challenge of autonomous agents making decisions in communications-compromised scenarios.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jun 25, 2021
- Source ID: W911NF2110243
Entities
People
- Michael Yip
Organizations
- Army Contracting Command
- United States Army
- University of California, San Diego