PECASE: Gradients and Hierarchy in Deep Reinforcement Learning
Abstract
Short Work Statement: Develop deep hierarchical reinforcement learning for integration of perception and control in complex, long-duration tasks where rewards are delayed.
Objective: Investigate reinforcement learning for integration of perception and control, and develop algorithms for learning complex, long-duration tasks.
Approach: The PI will investigate reinforcement learning, particularly in cases where rewards have substantial delays. The primary investigation domain is the integration of perception and control. Reinforcement learning for problems that have substantial reward delays is generally intractable. The PI proposes to develop deep hierarchical architectures to make these problems tractable, and will also investigate whether such hierarchical architectures make learning new tasks easier. The other aspect of this proposal is developing methods for computing policy gradients, which are at the core of many learning algorithms, including reinforcement learning, deep neural nets, LSTM networks, and others. The PI will investigate the stochastic computation graphs formalism for computing policy gradients.
Merit/Relevance: This effort is related to ONR's Autonomy focus area, as well as its Information Dominance focus area. This effort will contribute to building capable robots that can perform complex tasks that require long durations. This work is expected to result in tractable algorithms for reinforcement learning for problems with substantial reward delays, and in fast and accurate methods for computing policy gradients used in a wide variety of learning algorithms.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 12, 2016
- Source ID: N000141612723
Entities
People
- Pieter Abbeel
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents