Exploration and Policy Reuse

Abstract

The authors define Policy Reuse as a learning technique that is guided by past policies and that poses the challenge of balancing three choices: exploiting the policy currently being learned, exploring random actions, and exploring towards the past policies. In this work, they introduce a new exploration strategy, pi-reuse, which acts as an intelligent bias for reusing a past policy when learning a new one. This strategy also yields a measure of how similar each policy in a set of past policies is to the new one, and the authors use it to define a pi-reuse-based similarity metric between policies. They introduce a new algorithm that combines the selection and reuse of past policies using this similarity metric. They then present empirical results that demonstrate the usefulness of pi-reuse as an intelligent bias for reusing past policies and its effectiveness in defining the similarity between policies.
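
The three-way balance named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the choices are weighted, so everything here is an assumption made for illustration: the function name choose_action, the parameters psi (probability of following the past policy) and epsilon (probability of a random action otherwise), and the order in which the choices are tried. It is not the authors' pi-reuse algorithm.

```python
import random


def choose_action(q_values, past_policy_action, psi, epsilon, rng=random):
    """Pick an action index for the current state.

    Hypothetical parameters (not taken from the report):
      q_values           -- action-value estimates of the policy being learned
      past_policy_action -- action the reused past policy suggests in this state
      psi                -- assumed probability of following the past policy
      epsilon            -- assumed probability of a random action otherwise
    """
    # Choice 1: exploration towards the past policy.
    if rng.random() < psi:
        return past_policy_action
    # Choice 2: exploration of random actions.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Choice 3: exploitation of the ongoing learned policy (greedy action).
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With psi set to 0 the sketch reduces to ordinary epsilon-greedy action selection, and with psi set to 1 it simply replays the past policy; values in between trade off reuse against learning from scratch.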

Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2005
Accession Number: ADA456807

Entities

People

Fernando Fernandez
Manuela M. Veloso

Organizations

Carnegie Mellon University
