Probabilistic Reuse of Past Policies

Abstract

A past policy provides a bias that guides exploration of the environment and speeds up the learning of a new action policy. The success of this bias depends on whether the past policy is "similar" to the policy currently being learned. In this report, the authors describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked by a similarity metric that estimates how useful it is to reuse each of them. This ranking provides a probabilistic bias for exploration in the new learning process. Several experiments demonstrate that PRQ-Learning balances exploitation of the policy being learned, exploration of random actions, and exploration toward the past policies.
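The three-way balance described in the abstract can be illustrated with a short sketch. The abstract does not say how the ranking is turned into probabilities or how actions are mixed, so everything below is an assumption made for illustration: a softmax over per-policy gain estimates, a reuse probability `psi`, an epsilon-greedy fallback, and all function and parameter names are hypothetical rather than taken from the report.

```python
import math
import random


def selection_probabilities(gains, temperature):
    """Softmax over estimated reuse gains (assumed ranking-to-probability rule)."""
    top = max(gains)  # subtract the maximum for numerical stability
    weights = [math.exp(temperature * (g - top)) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


def choose_policy(gains, temperature, rng=random):
    """Sample the index of the policy to follow in the next episode."""
    probs = selection_probabilities(gains, temperature)
    return rng.choices(range(len(gains)), weights=probs, k=1)[0]


def choose_action(state, q_values, past_policy, psi, epsilon, n_actions, rng=random):
    """Mix the three behaviours named in the abstract (hypothetical scheme).

    With probability psi, follow the past policy; otherwise act
    epsilon-greedily on the Q-values of the policy being learned.
    """
    if past_policy is not None and rng.random() < psi:
        return past_policy(state)          # exploration toward a past policy
    if rng.random() < epsilon:
        return rng.randrange(n_actions)    # exploration of random actions
    row = q_values[state]
    return max(range(n_actions), key=lambda a: row[a])  # exploitation


def update_gain(gains, counts, index, episode_return):
    """Update the running average return observed when using policy `index`."""
    counts[index] += 1
    gains[index] += (episode_return - gains[index]) / counts[index]
```

In this sketch, a policy whose episodes yield higher returns accumulates a higher gain estimate and is therefore sampled more often, while a larger `temperature` makes the selection more sharply favor the top-ranked policy.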

Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2005
Accession Number: ADA456806

Entities

People

  • Fernando Fernandez
  • Manuela M. Veloso

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Air Force
  • Algorithms
  • Collision Avoidance
  • Computer Science
  • Education
  • Equations
  • Information Operations
  • Iterations
  • Learning
  • Motion Planning
  • Obstacle Avoidance Systems
  • Personal Information Managers
  • Probability
  • Random Variables
  • Reinforcement Learning
  • Robots

Fields of Study

  • Computer science
