Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration

Abstract

Approximate value iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators used in such methods typically introduce errors in value estimation, which can harm the quality of the learned value functions. We present a new batch-mode, off-policy, approximate value iteration algorithm called Trajectory Fitted Q-Iteration (TFQI). This approach uses the sequential relationship between samples within a trajectory, a set of samples gathered sequentially from the problem domain, to lessen the adverse influence of approximation errors while deriving long-term value. We provide a detailed description of the TFQI approach and an empirical study that analyzes the impact of our method on two well-known RL benchmarks. Our experiments demonstrate that this approach has significant benefits, including better learned policy performance, improved convergence, and somewhat decreased sensitivity to the choice of function approximation.

Document Details

Document Type: Technical Report
Publication Date: Sep 01, 2013
Accession Number: ADA617226

Entities

People

Lei Yu
Philip Dexter
Robert Wright
Steven Loscalzo

Organizations

Air Force Research Laboratory
