Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration

Abstract

Approximate value iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators used in such methods typically introduce errors in value estimation, which can harm the quality of the learned value functions. We present a new batch-mode, off-policy, approximate value iteration algorithm called Trajectory Fitted Q-Iteration (TFQI). This approach uses the sequential relationship between samples within a trajectory, a set of samples gathered sequentially from the problem domain, to lessen the adverse influence of approximation errors while deriving long-term value. We provide a detailed description of the TFQI approach and an empirical study that analyzes the impact of our method on two well-known RL benchmarks. Our experiments demonstrate that this approach has significant benefits, including better learned policy performance, improved convergence, and somewhat decreased sensitivity to the choice of function approximation.

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2013
Accession Number
ADA617226

Entities

People

  • Lei Yu
  • Philip Dexter
  • Robert Wright
  • Steven Loscalzo

Organizations

  • Air Force Research Laboratory

Tags

Communities of Interest

  • Autonomy
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Artificial Intelligence Computing
  • Availability
  • Convergence
  • Data Science
  • Demonstrations
  • Genetic Algorithms
  • Governments
  • Iterations
  • Learning
  • Machine Learning
  • Neural Networks
  • Reinforcement Learning
  • Sensitivity
  • Trajectories

Fields of Study

  • Computer science

Readers

  • Calculus or Mathematical Analysis
  • Computational Modeling and Simulation
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • Space