Using Strong Convergence to Accelerate Value Iteration.

Abstract

Convergence of the relative value function (total value function less total value of a base state) and the optimal policy in length of the horizon for value iteration (markov decision programming) have been recently shown to be geometric with factor alpha beta. Here alpha is the discount factor, and beta < or = 1.0. The case beta < 1.0 is termed 'strong convergence'. It is suggested in this paper that bounds on the convergence rate be estimated computationally during the value iteration process, giving bounds directly on the extrapolated infinite horizon relative value function. Such an extrapolation has two purposes. First, large numbers of iterations in value iteration can be skipped by continuing computation directly with the estimated infinite horizon relative value function (this is directly analogous to quadratic acceleration procedures in non-linear programming). Second, existing procedures for the elimination of non-optimal actions are greatly strengthened, since actions can be eliminated permanently once bounds on the infinite horizon relative value function become sufficiently tight.

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 1976
Accession Number: ADA025604

Entities

People

Thomas E. Morton

Organizations

Carnegie Mellon University

Using Strong Convergence to Accelerate Value Iteration.

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Readers