Convergence Behavior of Temporal Difference Learning

Abstract

Temporal difference learning is an important class of incremental learning procedures that learn to predict the outcomes of sequential processes through experience. Although these algorithms have been used in a variety of notable intelligent systems, such as Samuel's checker player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper briefly summarizes the theoretical basis for these algorithms and documents observed convergence performance in a variety of experiments. The implications of these results are also briefly discussed.
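
For readers unfamiliar with the technique, the following is a minimal illustrative sketch of tabular TD(0) prediction on a small symmetric random walk. It is not taken from the report: the function name `td0_random_walk`, the five-state chain, the step size `alpha`, and the reward of 1 on exiting to the right are all assumptions chosen for illustration. Under those assumptions the true value of state i (counting from 1) is i / (n_states + 1), which gives a reference point for judging convergence.

```python
import random


def td0_random_walk(n_states=5, episodes=1000, alpha=0.05, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(0).

    Illustrative assumptions (not from the report): the walk starts in the
    middle state, moves left or right with equal probability, ends when it
    steps off either end, and pays reward 1 only when it exits to the right.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial guess for every non-terminal state

    for _ in range(episodes):
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                # Exited on the left: no reward, terminal value is 0.
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                # Exited on the right: reward 1, terminal value is 0.
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # TD(0) update: move the estimate toward the one-step target
            # (undiscounted, so the target is reward + next_value).
            values[state] += alpha * (reward + next_value - values[state])

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    estimates = td0_random_walk()
    for i, v in enumerate(estimates, start=1):
        print(f"state {i}: estimate {v:.3f}, true {i / 6:.3f}")
```

With a constant step size the estimates do not settle exactly; they fluctuate around the true values, and a smaller `alpha` trades slower learning for less fluctuation.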

Document Details

Document Type
Technical Report
Publication Date
May 01, 1996
Accession Number
ADA318671

Entities

People

  • Raj P. Malhotra

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Air Force
  • Algorithms
  • Artificial Intelligence
  • Availability
  • Classification
  • Computations
  • Convergence
  • Intelligent Systems
  • Learning
  • Machine Learning
  • Markov Processes
  • Probability
  • Random Variables
  • Random Walk
  • Reinforcement Learning
  • Supervised Machine Learning

Readers

  • Neural Network Machine Learning
  • Theoretical Analysis