Better Reinforcement Learning with Online Representation Discovery and Sample Efficient Learning

Abstract

A fundamental goal of artificial intelligence is to create agents that make good decisions as they interact with a stochastic environment. The opportunity for impact of such agents has never been greater, as telecommunications and the internet of things open the potential for agents that learn and assist us throughout the day and across our lives. Such agents could scale the benefits and powers of individualized tutors to enhance Navy recruit training, provide personalized patient treatment recommendations, and guide infrastructure maintenance. Many of these opportunities come with an associated responsibility: the impact of poor decisions can be severe, causing harm to people, substantial loss of revenue, or damage to expensive equipment. My proposed work tackles some of the key foundational issues required to learn to make good decisions in such environments.

Representation Learning for Effective Reinforcement Learning. Feature selection and representation discovery are a critical part of machine learning and control. In contrast to past efforts, which have focused primarily on accurately estimating the value function of a fixed policy, I will develop new approaches for automatically learning to select features and representations specifically targeted at supporting good policy decisions and high sample efficiency. Our work will center on two subtasks. In the first, we will develop efficient Bayesian clustering methods that group similar atomic states in order to share data and speed reinforcement learning. Compared to generic Bayesian clustering, which is typically extremely expensive, we will introduce an agglomerative method with low computational cost that still guarantees asymptotic optimality. Our second thrust will focus on adaptive, incremental feature selection for feature-valued spaces. To improve sample efficiency, it will construct optimistic estimates of the state-action values from coarser feature spaces and use them as upper bounds for more refined feature spaces, refining only as required to achieve high performance.

Highly Data Efficient Reinforcement Learning. Though slow reinforcement learning methods are forgivable in domains where computer simulation is cheap and many rounds of poor performance are tolerable, this is not tenable in many important applications. Robotics has had some impressive successes with sample-efficient reinforcement learning methods, but such approaches typically require the underlying transitions between states to be deterministic given the selected action. In contrast, it remains very hard to build accurate models of people (such as students), and stochastic models are therefore the norm. I propose to build on these recent advances in robotics, which use Gaussian Processes to efficiently share data among different policies, but to extend these approaches to stochastic domains and make them more sample efficient by leveraging the reward decomposition structure of the tasks.

A key motivation for these challenges comes from my work on computer-supported education, and our proposed task will be to evaluate the algorithms created in the first two thrusts in a self-optimizing tutoring system. In doing so, our work will both contribute to fundamental machine learning knowledge and advance the cutting edge of research in intelligent tutoring systems and educational data mining.

Document Details

Document Type: DoD Grant Award
Publication Date: Jun 03, 2016
Source ID: N000141612241

Entities

People

Emma Brunskill

Organizations

Massachusetts Institute of Technology
Office of Naval Research
United States Navy
