Using Strong Convergence to Accelerate Value Iteration.

Abstract

Convergence of the relative value function (total value function less total value of a base state) and the optimal policy in length of the horizon for value iteration (markov decision programming) have been recently shown to be geometric with factor alpha beta. Here alpha is the discount factor, and beta < or = 1.0. The case beta < 1.0 is termed 'strong convergence'. It is suggested in this paper that bounds on the convergence rate be estimated computationally during the value iteration process, giving bounds directly on the extrapolated infinite horizon relative value function. Such an extrapolation has two purposes. First, large numbers of iterations in value iteration can be skipped by continuing computation directly with the estimated infinite horizon relative value function (this is directly analogous to quadratic acceleration procedures in non-linear programming). Second, existing procedures for the elimination of non-optimal actions are greatly strengthened, since actions can be eliminated permanently once bounds on the infinite horizon relative value function become sufficiently tight.

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 1976
Accession Number
ADA025604

Entities

People

  • Thomas E. Morton

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Computations
  • Computer Programming
  • Convergence
  • Elimination
  • Extrapolation
  • Iterations
  • Linear Programming
  • Mathematical Analysis
  • Mathematics
  • Numerical Analysis

Readers

  • Finite Element Method (FEM) for solving Partial Differential Equations (PDEs)
  • Mathematical Modeling and Probability Theory.
  • Statistical inference.