Multi-Agent Residual Advantage Learning with General Function Approximation.

Abstract

A new algorithm advantage learning, is presented that improves on advantage updating by requiring that a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning, while advantage updating requires two different types of updates, a learning update and a normalization update. The reinforcement learning system uses the residual form of advantage learning. An application of reinforcement learning to a Markov game is presented. The test-bed has continuous states and nonlinear dynamics. The advantage function is stored in a single-hidden-layer sigmoidal network. Speed of learning is increased by a new algorithm, Incremental Delta-Delta (IDD), which extends Jacob's (1988) Delta-Delta for use in incremental training, and differs from Sutton's Incremental Delta-Bar-Delta (1992) in that it does not require the use of a trace and is amenable for use with general function approximation systems. To our knowledge, this is the first time an approximate second order method has been used with residual algorithms. Empirical results are presented comparing convergence rates with and without the use of lDD for the reinforcement learning test-bed and for a supervised learning test-bed.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Apr 03, 1996
Accession Number: ADA309171

Entities

People

Leemon C. Baird Iii
Mance E. Harmon

Organizations

Wright Laboratory

Multi-Agent Residual Advantage Learning with General Function Approximation.

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers

Technology Areas