Robots That Learn On The Job
Abstract
End-to-end robotic learning via reinforcement learning can enable robots to autonomously acquire complex skills directly through unsupervised interaction with the real world. This is a very powerful capability: robots that can get better and better through experience can be deployed directly in complex real-world environments and, even if they do not immediately possess the requisite level of competence at the task they are meant to perform, they will acquire that competence on their own. When the environment changes, the robot will adapt. When the task changes, the robot will adapt. And through all this, it will keep getting better and better. Unfortunately, the full promise of robotic reinforcement learning has proven difficult to realize. Although the reinforcement learning process itself is in principle automatic, it requires the user to provide two components manually: a reward function and, more subtly, the ability to reset the environment between trials so that the robot can try a task repeatedly. In practice, both of these components are very difficult to provide for robots that need to learn skills directly in the real world. These challenges represent the primary bottlenecks that prevent robots from achieving the perpetual improvement that should in principle be possible via reinforcement learning. The goal of this project is to alleviate these challenges by making it easier for human users to specify the task (reward function) through intuitive communication modalities, and by removing the requirement for the robot to have access to a manually designed reset mechanism during learning.
Technical Approach. This research project will consist of two main technical thrusts: learning reward functions from different modalities and examples, and automating the reset process via bidirectional reinforcement learning.
These two technical thrusts will be complemented by an experimental thrust aimed at real-world experimental validation of the methods.
Thrust 1: Learning reward functions. We will study (1) learning rewards from demonstrations, (2) learning rewards from outcome examples, and (3) acquiring rewards from language. The aim will be to enable users with minimal training to define learning goals for RL without manual programming.
Thrust 2: Automating the reset process. To remove the need for manually designed reset mechanisms, we will study bidirectional reinforcement learning, where a robot simultaneously learns both to perform a task and to put things back the way they were, thus enabling it to try again.
Thrust 3: Experimental robotics framework. We will complement our algorithmic developments with the development of a real-world robotic learning framework that integrates these methods, evaluated on a range of realistic robotic manipulation and mobility scenarios.
Anticipated Outcome. The proposed research will advance the state of the art in robotic learning, enabling robots to autonomously improve through experience on the job, with minimal additional instrumentation, and using objectives specified by humans via intuitive modalities.
Impact on DoN Capabilities. Autonomous acquisition of generalizable robotic skills enables a range of applications with high DoN impact. Examples include autonomous repair and maintenance of equipment in the field, automation of warehousing, and a variety of other autonomous robotic missions in unstructured natural environments.
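Illustrative sketch. The idea in Thrust 2 of learning a task and its reset together can be pictured with a toy example. The Python below is a minimal sketch under stated assumptions, not the project's method: the environment is a hypothetical six-cell line, tabular Q-learning stands in for whatever learner the project uses, and every name and hyperparameter (`run_episode`, `train`, `q_forward`, `q_reset`, the step limits) is invented for illustration. One learner tries to reach the right end of the line, a second learner tries to return to the left end, and the two alternate so that the state is never reset by hand after the initial placement.

```python
import random

N_STATES = 6          # cells 0..5 on a line; the size is an illustrative assumption
LEFT, RIGHT = 0, 1

def step(state, action):
    """Move one cell left or right, clipped to the ends of the line."""
    delta = 1 if action == RIGHT else -1
    return min(max(state + delta, 0), N_STATES - 1)

def greedy(q, state, rng):
    """Greedy action with random tie-breaking."""
    best = max(q[state])
    return rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])

def run_episode(q, state, goal, rng, max_steps=20, eps=0.2, alpha=0.5, gamma=0.9):
    """One Q-learning episode toward `goal`, starting from wherever the agent is."""
    for _ in range(max_steps):
        if state == goal:
            break
        explore = rng.random() < eps
        action = rng.choice((LEFT, RIGHT)) if explore else greedy(q, state, rng)
        nxt = step(state, action)
        reward = 1.0 if nxt == goal else 0.0
        # The goal is terminal for this episode, so nothing is bootstrapped past it.
        target = reward if nxt == goal else gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return state

def train(rounds=300, seed=0):
    """Alternate task episodes and reset episodes without manual resets."""
    rng = random.Random(seed)
    q_forward = [[0.0, 0.0] for _ in range(N_STATES)]  # task: reach the right end
    q_reset = [[0.0, 0.0] for _ in range(N_STATES)]    # reset: return to the left end
    state = 0  # placed once by hand; never manually reset afterwards
    for _ in range(rounds):
        state = run_episode(q_forward, state, N_STATES - 1, rng)
        state = run_episode(q_reset, state, 0, rng)
    return q_forward, q_reset

if __name__ == "__main__":
    q_forward, q_reset = train()
    print("task policy:", [greedy(q_forward, s, random.Random(0)) for s in range(N_STATES - 1)])
    print("reset policy:", [greedy(q_reset, s, random.Random(0)) for s in range(1, N_STATES)])
```

In this sketch each episode begins wherever the previous one ended, so a failed reset simply gives the task learner a different starting cell. A real robot would face continuous states, imperfect resets, and irreversible outcomes, none of which this toy captures.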
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 08, 2020
- Source ID: N000142012383
Entities
People
- Sergey Levine
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents