On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Abstract

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. reinforcement learning, Stochastic approximation, Convergence, Dynamic programming.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Aug 06, 1993
Accession Number: ADA276517

Entities

People

Michael I. Jordan
Satinder P. Singh
Tommi S. Jaakkola

Organizations

Massachusetts Institute of Technology

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas