On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Abstract

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. reinforcement learning, Stochastic approximation, Convergence, Dynamic programming.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Aug 06, 1993
Accession Number
ADA276517

Entities

People

  • Michael I. Jordan
  • Satinder P. Singh
  • Tommi S. Jaakkola

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Autonomy
  • C4I
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Cognitive Science
  • Computer Programming
  • Convergence
  • Dynamic Programming
  • Environment
  • Information Processing
  • Information Systems
  • Learning
  • Machine Learning
  • Markov Chains
  • Mathematical Models
  • Operations Research
  • Random Variables
  • Reinforcement Learning
  • Stochastic Processes

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.
  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks