Convergence Behavior of Temporal Difference Learning
Abstract
Temporal difference learning is an important class of incremental learning procedures that learn to predict outcomes of sequential processes through experience. Although these algorithms have been used in a variety of notable intelligent systems, such as Samuel's checkers player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper briefly summarizes the theoretical basis for these algorithms and documents the convergence performance observed in a variety of experiments. The implications of these results are also briefly discussed.
Document Details
- Document Type: Technical Report
- Publication Date: May 01, 1996
- Accession Number: ADA318671
Entities
People
- Raj P. Malhotra