Advantage Updating

Abstract

A new algorithm for reinforcement learning, advantage updating, is proposed. Advantage updating is a direct learning technique; it does not require a model to be given or learned. It is incremental, requiring only a constant amount of calculation per time step, independent of the number of possible actions, possible outcomes from a given action, or number of states. Analysis and simulation indicate that advantage updating is applicable to reinforcement learning systems working in continuous time (or discrete time with small time steps), for which Q-learning is not applicable. Simulation results are presented indicating that, for a simple linear quadratic regulator (LQR) problem with no noise and large time steps, advantage updating learns slightly faster than Q-learning. Where there is noise or small time steps, advantage updating learns more quickly than Q-learning by a factor of more than 100,000. Convergence properties and implementation issues are discussed. New convergence results are presented for R-learning and algorithms based upon change in value. It is proved that the learning rule for advantage updating converges to the optimal policy with probability one.
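
The abstract's claim about small time steps can be illustrated with a minimal sketch. It is not taken from the report: the scalar problem below (dynamics x' = x + u*dt, reward -(x^2 + u^2)*dt, no discounting), the function names, and the reading of an "advantage" as the Q-value gap divided by dt are all assumptions made for illustration, and the report's own definitions and test problem may differ. Under these assumptions, the gap between the Q-values of a suboptimal and an optimal action shrinks in proportion to dt, so small errors or noise can swamp it, while the rescaled gap stays at a fixed size.

```python
import math


def riccati_p(dt):
    # Assumed toy problem: V*(x) = -p * x**2, where p is the positive root
    # of p**2 = 1 + dt * p (the scalar discrete-time Riccati equation for
    # x' = x + u*dt with reward -(x**2 + u**2) * dt and no discounting).
    return (dt + math.sqrt(dt * dt + 4.0)) / 2.0


def q_value(x, u, dt):
    # Optimal Q-value: one-step reward plus optimal value of the next state.
    p = riccati_p(dt)
    reward = -(x * x + u * u) * dt
    x_next = x + u * dt
    return reward - p * x_next * x_next


def optimal_action(x, dt):
    # Maximizer of q_value over u (q_value is a concave quadratic in u).
    p = riccati_p(dt)
    return -p * x / (1.0 + p * dt)


def q_gap(x, u, dt):
    # How much worse action u is than the optimal action, in Q-value units.
    return q_value(x, u, dt) - q_value(x, optimal_action(x, dt), dt)


def advantage(x, u, dt):
    # Assumed definition for this sketch: the Q-value gap rescaled by 1/dt.
    return q_gap(x, u, dt) / dt


if __name__ == "__main__":
    # Compare "do nothing" (u = 0) with the optimal action at x = 1.
    for dt in (1.0, 0.1, 0.001, 0.00001):
        print(dt, q_gap(1.0, 0.0, dt), advantage(1.0, 0.0, dt))
```

Running the script prints, for each step size, a raw Q-value gap that shrinks along with dt and a rescaled gap that stays close to -1. This is one way to see why a learner that stores only Q-values needs ever finer precision to rank actions as the time step shrinks.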

Document Details

Document Type
Technical Report
Publication Date
Nov 04, 1993
Accession Number
ADA280862

Entities

People

  • Leemon C. Baird III

Organizations

  • Wright Laboratory

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Artificial Intelligence
  • Computational Complexity
  • Computer Programming
  • Computer Science
  • Control Systems
  • Convergence
  • Dynamic Programming
  • Equations
  • Information Processing
  • Machine Learning
  • Probability
  • Regulators
  • Reinforcement Learning
  • Simulations
  • United States

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms