Exploration and Policy Reuse
Abstract
The authors define Policy Reuse as a learning technique guided by past policies, which poses the challenge of balancing three choices: exploiting the ongoing learned policy, exploring with random actions, and exploring toward the past policies. In this work, they introduce a new exploration strategy, pi-reuse, that acts as an intelligent bias for reusing a past policy when learning a new one. This strategy also yields a measure of similarity between each past policy and the new one, so the authors define a pi-reuse-based similarity metric between policies. They then introduce a new algorithm that uses this metric both to select and to reuse past policies. Empirical results demonstrate the usefulness of pi-reuse as an intelligent bias for reusing past policies, as well as its effectiveness in defining the similarity between policies.
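The three-way balance described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. The reuse probability `psi`, its per-step decay factor `v`, the epsilon-greedy fallback, the running-average "reuse gain" used as a similarity score, and the softmax selector over those gains are all assumptions chosen for illustration. Every function name is hypothetical.

```python
import math
import random
from typing import Callable, Dict, Hashable, List

# Hypothetical sketch of pi-reuse-style exploration. Parameter names
# (psi, v, epsilon, tau) are illustrative assumptions, not the report's API.

State = Hashable
Policy = Callable[[State], int]


def pi_reuse_action(state: State, past_policy: Policy,
                    q: Dict[State, List[float]], psi: float,
                    epsilon: float, rng: random.Random) -> int:
    """Pick an action by balancing the three choices named in the abstract."""
    n_actions = len(q[state])
    if rng.random() < psi:
        # Exploration toward the past policy.
        return past_policy(state)
    if rng.random() < epsilon:
        # Exploration with a uniformly random action.
        return rng.randrange(n_actions)
    # Exploitation of the ongoing learned policy (greedy on Q).
    values = q[state]
    return max(range(n_actions), key=lambda a: values[a])


def decayed_psi(psi0: float, v: float, step: int) -> float:
    """Assumed schedule: reuse probability psi0 * v**step within an episode."""
    return psi0 * v ** step


def update_gain(gain: float, count: int, episode_return: float) -> float:
    """Assumed reuse gain: running mean of returns earned while reusing a policy.

    `count` is the number of episodes already averaged into `gain`.
    """
    return gain + (episode_return - gain) / (count + 1)


def select_past_policy(gains: List[float], tau: float,
                       rng: random.Random) -> int:
    """Assumed selector: softmax over reuse gains (higher gain, more likely)."""
    best = max(gains)  # subtract the max for numerical stability
    weights = [math.exp((g - best) / tau) for g in gains]
    return rng.choices(range(len(gains)), weights=weights, k=1)[0]
```

In this sketch, decaying `psi` within an episode shifts the agent from following the past policy toward its own Q-values. A past policy with a higher average gain is treated as more useful and, under the assumption stated above, as more similar to the new task, so the selector reuses it more often.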
Document Details
- Document Type: Technical Report
- Publication Date: Jul 01, 2005
- Accession Number: ADA456807
Entities
People
- Fernando Fernandez
- Manuela M. Veloso
Organizations
- Carnegie Mellon University