Probabilistic Reuse of Past Policies

Abstract

A past policy provides a bias that guides exploration of the environment and speeds up the learning of a new action policy. The success of this bias depends on how similar the past policy is to the policy being learned. In this report, the authors describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked by a similarity metric that estimates how useful it is to reuse each of them. This ranking provides a probabilistic bias for exploration in the new learning process. Several experiments demonstrate that PRQ-Learning finds a balance between exploitation of the policy currently being learned, exploration of random actions, and exploration toward the past policies.
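The three-way balance described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the ranking is turned into probabilities or how the three behaviours are mixed, so everything here is an assumption made for illustration: a softmax over per-policy usefulness scores, a reuse probability `psi`, and an epsilon-greedy fallback. The names `softmax_choice`, `choose_action`, `psi`, `epsilon`, and `temperature` are hypothetical and are not taken from the report.

```python
import math
import random


def softmax_choice(scores, temperature, rng):
    """Pick an index with probability proportional to exp(score / temperature).

    Assumption: a softmax is one plausible way to turn a ranking of past
    policies into a probabilistic bias; the abstract does not specify it.
    """
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point rounding


def choose_action(state, q_values, past_policy, actions, psi, epsilon, rng):
    """Mix exploration toward a past policy, random exploration, and exploitation.

    Assumption: psi and epsilon are illustrative mixing parameters.
    """
    # With probability psi, follow the suggestion of the reused past policy.
    if past_policy is not None and rng.random() < psi:
        return past_policy(state)
    # Otherwise, with probability epsilon, explore a random action.
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Otherwise, exploit the policy currently being learned.
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))
```

In a sketch like this, a past policy would be drawn once per episode with `softmax_choice`, and `choose_action` would then be called at every step of that episode. Lowering `psi` over the course of an episode would shift control from the past policy to the new one, though whether the report does this is not stated in the abstract.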

Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2005
Accession Number: ADA456806

Entities

People

Fernando Fernandez
Manuela M. Veloso

Organizations

Carnegie Mellon University
