Learning Spatial and Temporal Inference Machines

Abstract

Statement of Work: The PI plans to investigate policy gradient, policy iteration, and value function methods for deep reinforcement learning, with the aim of reducing the large amounts of data currently needed to train deep networks. The work targets robust, tight integration of the perception-action loop for robots. See "Approach" for a more detailed description.

Objective: Develop methods for training deep reinforcement learning systems that need only a modest amount of data, enabling tight integration of the perception-action loop for autonomous agents.

Approach: Reinforcement learning considers the problem of learning control policies through a directed process of trial and error. It has had a moderate number of success stories in robotics. However, the true promise of reinforcement learning is far from realized. Existing success stories required substantial domain expertise to craft good control policy architectures with a relatively small number of free parameters, so that these parameters could be learned from a limited number of trials.

The current status of reinforcement learning for robotic control is strikingly similar to the status of computer vision before deep learning. "Deep" here refers to the fact that a multi-layer architecture is being learned, as opposed to a flat architecture like a support vector machine. Before deep learning, substantial expertise had to go into developing feature descriptors of images (the analog of developing control architectures), which were then fed into a support vector machine to train object classification and detection algorithms. In contrast, deep learning approaches directly optimize a deep, layered mapping from pixels to the predicted class label. Key enablers for the success of deep learning in computer vision were access to large amounts of labeled data and the algorithms and computational cycles to efficiently optimize the billions of free parameters of these deep networks.

Reinforcement learning problems are more challenging because the supervisory signal is less direct: an action taken now may see its payoff (or detrimental effect) only several time steps later. Despite the additional challenges, the PI believes that advances in deep reinforcement learning can be developed over the next few years that will result in a significant leap forward in robotic capabilities, very similar to how deep learning has significantly advanced computer vision and speech recognition.

Preliminary results on deep reinforcement learning have been promising. Recent work by researchers at DeepMind has shown that, with a large amount of simulation time, it is possible to train a deep neural net with tens of thousands of parameters that maps directly from current pixel values to actions in the game.

A promising direction, pursued under the PI's ONR-YIP, is to increase the strength of the supervisory signal through guided policy search, which is applicable whenever it is possible to solve very specific instances of the control problem (e.g., placing a block in a very specific slot). The solutions to a set of these very specific problems can then be used to supervise the training of a neural net control policy, which has the ability to generalize to new situations. Guided policy search is capable of learning neural network controllers with thousands of parameters, a significant advance over the few dozen parameters learned in the aforementioned reinforcement learning success stories. The PI's work extended guided policy search to the case where no reliable simulation model of the environment is available, which is particularly relevant in robotic manipulation, where the complexity of contact forces tends to complicate reliable simulation, especially in the presence of deformation. This updated version of guided policy search has already enabled learning a variety of manipulation primitives, such as screwing caps onto bottles and some simple assembly tasks.
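The supervised step described above, in which solutions to specific instances are used to train a single policy that generalizes, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the award: the 1-D point dynamics, the proportional controller standing in for a per-instance solver, the linear least-squares policy standing in for a neural net, and the names solve_instance, fit_policy, and rollout. It also omits the alternation between trajectory optimization and policy training that full guided policy search methods use.

```python
import numpy as np

def solve_instance(goal, steps=20, gain=0.5):
    """Hypothetical per-instance solver: drive a 1-D point to one specific goal.

    Returns the (state, goal) inputs visited and the actions taken.
    """
    x = 0.0
    inputs, actions = [], []
    for _ in range(steps):
        u = gain * (goal - x)       # proportional control for this one instance
        inputs.append([x, goal])
        actions.append(u)
        x = x + u                   # assumed toy dynamics: position += action
    return np.array(inputs), np.array(actions)

def fit_policy(goals):
    """Supervised step: regress actions on (state, goal) over all solved instances."""
    data = [solve_instance(g) for g in goals]
    X = np.vstack([d[0] for d in data])
    y = np.concatenate([d[1] for d in data])
    # Linear policy as a stand-in for a neural net: action = weights @ [state, goal]
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def rollout(weights, goal, steps=20):
    """Run the learned policy from x = 0 and return the final position."""
    x = 0.0
    for _ in range(steps):
        x = x + weights @ np.array([x, goal])
    return x

weights = fit_policy(goals=[-1.0, 0.5, 2.0])
final = rollout(weights, goal=1.3)  # a goal that was not among the solved instances
print(weights, final)
```

Because the toy solver is exactly linear in the state and goal, the fitted policy reproduces it and reaches the unseen goal. With a neural net policy and real robot dynamics the fit would only be approximate, which is where the constraints and iteration of an actual guided policy search method matter.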
Under the ONR-YIP, the PI is investigating extensions of guided policy

Document Details

Document Type: DoD Grant Award
Publication Date: Jan 25, 2017
Source ID: N000141512730

Entities

People

Pieter Abbeel

Organizations

Office of Naval Research
United States Navy
University of California Regents
