Exploration and Policy Reuse

Abstract

The authors define Policy Reuse as a learning technique guided by past policies, which poses the challenge of balancing three choices: exploiting the policy currently being learned, exploring random actions, and exploring towards the past policies. In this work, they introduce a new exploration strategy, π-reuse, that acts as an intelligent bias for reusing a past policy while learning a new one. The strategy also yields a measure of how similar each past policy in a set is to the new one, and the authors use it to define a π-reuse-based similarity metric between policies. Building on this metric, they introduce a new algorithm that combines the selection and the reuse of past policies. Their empirical results demonstrate both the usefulness of π-reuse as an intelligent bias for reusing past policies and its effectiveness in defining the similarity between policies.
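The abstract describes π-reuse only at a high level. The sketch below is a minimal illustration under assumptions that this record does not spell out. At each step the agent follows the past policy with probability psi. Otherwise it acts epsilon-greedily on the Q-values it is learning, and psi decays by a factor upsilon every step. It also assumes that the similarity between a past policy and the new task is measured as the average return obtained while reusing that policy, and that the next policy to reuse is picked by a softmax over those returns. The corridor task, the parameters (psi, upsilon, epsilon, tau) and every function name are hypothetical. This is not the authors' implementation.

```python
import math
import random

# Sketch under stated assumptions; not the authors' implementation.
# Hypothetical toy task: a 1-D corridor of N_STATES cells. The agent starts
# in cell 0 and receives reward 1.0 on reaching the rightmost cell.
ACTIONS = (-1, +1)  # move left, move right
N_STATES = 6


def step(state, action):
    """Apply an action in the toy corridor; return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def pi_reuse_action(state, q, past_policy, psi, epsilon, rng):
    """Balance the three choices named in the abstract (assumed formulation)."""
    if rng.random() < psi:
        return past_policy(state)              # exploration towards the past policy
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)             # exploration of random actions
    best = max(q.get((state, a), 0.0) for a in ACTIONS)
    greedy = [a for a in ACTIONS if q.get((state, a), 0.0) == best]
    return rng.choice(greedy)                  # exploitation, ties broken randomly


def pi_reuse_episode(q, past_policy, rng, horizon=20, psi=1.0, upsilon=0.95,
                     epsilon=0.1, alpha=0.5, gamma=0.95):
    """Run one episode, updating q in place by Q-learning; return the discounted return."""
    state, ret = 0, 0.0
    for t in range(horizon):
        a = pi_reuse_action(state, q, past_policy, psi, epsilon, rng)
        nxt, r, done = step(state, a)
        target = r if done else r + gamma * max(q.get((nxt, b), 0.0) for b in ACTIONS)
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + alpha * (target - old)
        ret += (gamma ** t) * r
        psi *= upsilon                         # reliance on the past policy decays
        state = nxt
        if done:
            break
    return ret


def reuse_gain(past_policy, episodes=200, seed=0):
    """Assumed similarity proxy: average return obtained while reusing past_policy."""
    rng, q = random.Random(seed), {}
    return sum(pi_reuse_episode(q, past_policy, rng) for _ in range(episodes)) / episodes


def select_past_policy(gains, tau, rng):
    """Assumed softmax (Boltzmann) choice of which past policy to reuse next."""
    top = max(gains)
    weights = [math.exp(tau * (g - top)) for g in gains]  # shift for numerical stability
    return rng.choices(range(len(gains)), weights=weights)[0]


if __name__ == "__main__":
    library = {"go_right": lambda s: +1, "go_left": lambda s: -1}
    gains = {name: reuse_gain(pi) for name, pi in library.items()}
    print(gains)
    names = list(gains)
    choice = select_past_policy([gains[n] for n in names], tau=10.0, rng=random.Random(1))
    print("reuse next:", names[choice])
```

In this toy setting, reusing the hypothetical `go_right` policy reaches the reward sooner than reusing `go_left`, so it earns a higher average return and is ranked as more similar to the new task. The decaying psi is one design choice among several. It lets the agent lean on the past policy early in each episode and then gradually hand control to its own Q-values.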


Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2005
Accession Number
ADA456807

Entities

People

  • Fernando Fernandez
  • Manuela M. Veloso

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Autonomous Navigation
  • Collision Avoidance
  • Computer Science
  • Education
  • Equations
  • Learning
  • Motion Planning
  • Probability
  • Random Variables
  • Reinforcement Learning
  • Robot Navigation
  • Robots
  • Standards
  • Supervisors
  • Transitions

Fields of Study

  • Computer science

Readers

  • Economics
  • Neural Network Machine Learning
  • Systems Analysis and Design